26-028 Embedded Attention: Vision Transformers & Micro-LLMs for Space Systems

  • Post-doctoral position, 24 months
  • Full time
  • Less than 2 years of experience
  • Doctorate (Bac+8)
  • Digital technologies for remote sensing

Mission

Your application must include a recommendation letter from your Ph.D. supervisor, a detailed CV (including university education and work experience), a list of publications, and a 2-page description of the work undertaken during your Ph.D.

For more information, contact the Directeur de Recherche: guillaume.oller@irt-saintexupery.com

Submit the complete application online (Apply) before March 13th, 2026, midnight Paris time.

===================

Attention has become a core building block of modern AI across language, vision, and generative models. In imaging, Vision Transformers (ViT) demonstrated that a “pure” transformer on image patches can match or surpass convolutional neural networks (CNNs) when pre-trained at scale, making attention a unifying alternative to convolution for tasks such as classification, detection, and segmentation [1]. This project aims to adapt attention-based modules for embedded architectures destined for space deployment, where power and memory are tightly constrained. Porting attention preserves compatibility with future attention-centric backbones for image processing and, eventually, with other on-board applications such as large language models (LLMs).
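As a minimal illustration of the mechanism described above (a NumPy sketch, not the project's implementation), the following splits an image into ViT-style patches and runs standard scaled dot-product attention over them. The patch size, embedding dimension, and random projections are arbitrary placeholders; the point is the (N, N) score matrix, which is the quadratic term the acceleration methods below seek to tame.

```python
import numpy as np

def patchify(image, patch):
    """Split an HxWxC image into flattened non-overlapping patches (ViT-style)."""
    H, W, C = image.shape
    rows = [image[i:i + patch, j:j + patch].reshape(-1)
            for i in range(0, H, patch) for j in range(0, W, patch)]
    return np.stack(rows)                       # (num_patches, patch*patch*C)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Standard scaled dot-product attention: O(N^2) score matrix for N patches."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])     # (N, N) -- the quadratic term
    return softmax(scores) @ V

rng = np.random.default_rng(0)
img = rng.standard_normal((32, 32, 3))
tokens = patchify(img, patch=8)                 # 16 patches of dimension 192
d = 64                                          # placeholder embedding size
Wq, Wk, Wv = (rng.standard_normal((tokens.shape[1], d)) for _ in range(3))
out = attention(tokens @ Wq, tokens @ Wk, tokens @ Wv)
print(out.shape)                                # one attended vector per patch
```

A real ViT adds learned patch embeddings, positional encodings, multiple heads, and MLP blocks; this sketch keeps only the attention core relevant to the embedding constraints discussed here.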

Enabling attention on board extends Earth-observation (EO) processing beyond CNN-only models. ViTs can be used for tasks such as object detection, scene classification, or event segmentation, reducing downlink needs and prioritizing valuable data. Compact LLMs can further support spacecraft operations by summarizing telemetry, prioritizing events (e.g., acquisitions, downlink queues), and highlighting anomalies. The benefits are smaller data volumes, faster decisions between ground contacts, and increased autonomy under degraded communications.

Deploying attention in this context remains difficult. Standard attention has quadratic cost in sequence length (or number of patches) and is often constrained by heavy memory use, challenging strict latency and energy budgets. Current embedded toolchains also do not always support the operators required to exploit recent advances, limiting practical efficiency. Several strategies address these issues. IO-aware exact attention, such as FlashAttention, streams Q/K/V blocks through fast memory without materializing the full score matrix, reducing off-chip transfers and improving throughput [2]. Linear or approximate alternatives such as Performer replace softmax with kernel features (FAVOR+), achieving linear complexity with acceptable accuracy trade-offs [3]. For LLMs, activation-aware weight-only quantization (AWQ) compresses weights to 4-bit precision with limited accuracy loss, making micro-LLMs feasible on constrained hardware [4]. To ease memory pressure, PagedAttention introduces page-level allocation and reuse of the K/V cache, enabling longer contexts under tight RAM budgets [5]. Together, these methods—FlashAttention or windowed attention for ViTs, linear attention where suitable, AWQ for LLMs, and paged or quantized K/V caches—provide a practical path toward space-grade deployment.
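The IO-aware idea behind FlashAttention can be sketched in a few lines: stream K/V blocks past the queries while maintaining a running row-maximum and normalizer (an online softmax), so the full N x N score matrix is never materialized. The NumPy sketch below reproduces only the arithmetic of this scheme, not the memory-hierarchy scheduling and fused kernels that give the real implementation its speed; block size and shapes are illustrative.

```python
import numpy as np

def naive_attention(Q, K, V):
    """Reference: materializes the full (N, N) score matrix."""
    S = Q @ K.T / np.sqrt(Q.shape[-1])
    P = np.exp(S - S.max(axis=-1, keepdims=True))
    return (P / P.sum(axis=-1, keepdims=True)) @ V

def blockwise_attention(Q, K, V, block=16):
    """Exact attention computed over K/V tiles with an online softmax,
    never forming the full N x N score matrix."""
    N, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.zeros((N, V.shape[1]))
    m = np.full((N, 1), -np.inf)        # running row max
    l = np.zeros((N, 1))                # running softmax normalizer
    for start in range(0, N, block):
        Kb, Vb = K[start:start + block], V[start:start + block]
        S = Q @ Kb.T * scale            # only an (N, block) tile in memory
        m_new = np.maximum(m, S.max(axis=-1, keepdims=True))
        P = np.exp(S - m_new)
        correction = np.exp(m - m_new)  # rescale previous partial results
        l = l * correction + P.sum(axis=-1, keepdims=True)
        out = out * correction + P @ Vb
        m = m_new
    return out / l

rng = np.random.default_rng(1)
Q, K, V = (rng.standard_normal((64, 32)) for _ in range(3))
assert np.allclose(blockwise_attention(Q, K, V), naive_attention(Q, K, V))
```

Because the result is exact (unlike Performer's kernel approximation), the trade-off is purely one of memory traffic and scheduling, which is why this family of methods suits accelerators with small fast memories.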

------------------------

The proposed work plan is as follows:

• State of the art. Survey barriers to embedded attention: unsupported operators, RAM/bandwidth pressure, and limited runtime optimizations. Catalog acceleration methods—windowed/local attention, quantization (fp8/int8/int4), IO-aware attention (FlashAttention), and linear/approximate attention (Performer)—along with deployment chains (TensorRT-LLM, Vitis AI, ONNX Runtime, Arm Compute Library, Apache TVM) and existing attention models on similar hardware.

• Space use cases & model selection. Select space-relevant use cases: an embedded ViT for EO applications (e.g., object detection, scene classification, or event segmentation) and, optionally, a micro-LLM for telemetry (summarization, downlink prioritization). Implement optimizations consistent with mission constraints and toolchain support.

• Implementation & porting. Map the models to embedded targets (Jetson Orin GPU, Versal FPGA, Arm CPU) and adapt them for compatibility with each platform. Align operator sets, tensor layouts, and precisions with toolchains, adding only minimal plugins or kernels where required.

• Evaluation & validation. Measure both algorithmic and hardware performance, focusing on RAM usage, power consumption, throughput, and latency. Assess utility (mIoU/F1 for EO; precision/recall for telemetry triage) and compare against CNN or non-attention baselines. Conduct ablations to evaluate the impact of each optimization technique.

• Demonstration & dissemination. Present results through scientific conferences and peer-reviewed publications. If possible, extend this dissemination with an in-orbit demonstration (e.g., OPS-SAT 2), showcasing the feasibility of embedded attention under real space conditions.
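As a toy illustration of the weight-only quantization cataloged in the state-of-the-art step, the sketch below applies per-output-channel symmetric round-to-nearest quantization. AWQ [4] additionally rescales salient channels using activation statistics before rounding; that step is deliberately omitted here, so this is only the baseline scheme it improves upon.

```python
import numpy as np

def quantize_weights(W, bits=4):
    """Per-output-channel symmetric round-to-nearest weight quantization.
    Stores integer codes plus one float scale per output channel."""
    qmax = 2 ** (bits - 1) - 1                        # e.g. 7 for int4
    scale = np.abs(W).max(axis=1, keepdims=True) / qmax
    Wq = np.clip(np.round(W / scale), -qmax - 1, qmax).astype(np.int8)
    return Wq, scale

def dequantize(Wq, scale):
    return Wq.astype(np.float32) * scale

rng = np.random.default_rng(2)
W = rng.standard_normal((8, 16)).astype(np.float32)   # placeholder weight matrix
Wq, s = quantize_weights(W, bits=4)
err = np.abs(W - dequantize(Wq, s)).max()
print(f"max abs reconstruction error: {err:.3f}")
```

With round-to-nearest, the per-element error is bounded by half the channel scale; activation-aware rescaling exists precisely because this uniform bound treats all channels as equally important, which they are not.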
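For the evaluation step, a minimal latency/throughput harness along the following lines could complement platform-specific power and RAM counters (tool names such as tegrastats on Jetson are an assumption about likely tooling, not part of the plan). The matmul stands in for one model layer.

```python
import statistics
import time

import numpy as np

def benchmark(fn, *args, warmup=5, iters=50):
    """Return median/p95 latency and mean throughput for fn(*args)."""
    for _ in range(warmup):                 # discard cold-start effects
        fn(*args)
    times = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn(*args)
        times.append(time.perf_counter() - t0)
    return {
        "p50_ms": statistics.median(times) * 1e3,
        "p95_ms": statistics.quantiles(times, n=20)[18] * 1e3,
        "throughput_per_s": 1.0 / statistics.mean(times),
    }

# Example: time a matmul standing in for one attention layer.
A = np.random.default_rng(3).standard_normal((256, 256))
stats = benchmark(lambda x: x @ x, A)
print(stats)
```

Reporting p95 alongside the median matters on embedded targets, where scheduler jitter and thermal throttling show up in the tail rather than the center of the distribution.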

------------------------

References:

[1] Dosovitskiy, A., et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv:2010.11929 (2020).

[2] Dao, T., et al. FlashAttention: Fast and memory-efficient exact attention with IO-awareness. NeurIPS 35 (2022).

[3] Choromanski, K., et al. Rethinking attention with performers. arXiv:2009.14794 (2020).

[4] Lin, J., et al. AWQ: Activation-aware weight quantization for on-device LLM compression and acceleration. MLSys 6 (2024).

[5] Kwon, W., et al. Efficient memory management for large language model serving with PagedAttention. SOSP (2023).

Profil

PhD holder in Embedded Systems, Artificial Intelligence, or Computer Vision. Experience in Earth Observation image processing is a valuable asset.