26-023 Sensor-agnostic fusion and representation learning for Satellite Images

  • Ph.D., 36 months
  • Full-time
  • Experience: no preference
  • MBA
  • Digital technologies for remote sensing

Mission

Global coverage of the Earth's surface with Satellite Image Time Series (SITS) is a critical component of the continuous monitoring of our planet's Essential Climate Variables (ECVs) and Essential Biodiversity Variables (EBVs). The Landsat series alone provides 50 years of such continuous monitoring, and for ten years now Sentinel-2 has complemented the Landsat archive with higher-revisit, higher-spatial-resolution images. The spatial and temporal resolutions of SITS are important parameters that define the reachable scale of analysis in both space and time. The revisit time is even more critical for passive optical sensors, where cloud occurrences obliterate a vast proportion of the observations. On the other hand, there is an increasing number of orbiting sensors with global coverage and systematic revisit that could complement each other. Currently flying missions include Landsat-8/9, Sentinel-2, and PlanetScope, while upcoming missions such as Trishna or Land Surface Temperature Monitoring (LSTM) will complement them by 2030. There is therefore a growing need for methods that can jointly leverage these continuous sources of Earth monitoring.

Fusing the information of multiple SITS from different sensors can be seen as an image fusion problem, whose result is an ideal SITS with improved spatio-spectro-temporal resolution. In 2025, Michel et al. proposed an original, complete formulation of the SITS fusion problem, along with a spatial residual Convolutional Neural Network and Temporal Attention Network model called Temporal Attention Multi-Resolution Fusion (TAMRF). This model is trained by means of an advanced Self-Supervised Learning (SSL) strategy and original loss terms favoring spatial resolution and robustness to invalid input data such as cloudy pixels. The result is a versatile model that can process SITS of any length and provide all input spectral bands at the best spatial resolution and at any requested acquisition times.
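As an illustration of the kind of robustness-oriented loss term mentioned above, the following minimal PyTorch sketch shows a masked reconstruction loss that simply ignores invalid pixels such as clouds. This is not the actual TAMRF loss; all names and tensor shapes are assumptions.

    import torch

    def masked_l1_loss(pred: torch.Tensor,
                       target: torch.Tensor,
                       valid_mask: torch.Tensor) -> torch.Tensor:
        # Illustrative masked reconstruction loss (not the actual TAMRF loss).
        # pred, target: (batch, time, bands, height, width) reflectances.
        # valid_mask:   same shape, 1.0 for valid pixels, 0.0 for clouds/no-data.
        abs_err = (pred - target).abs() * valid_mask
        # Average over valid pixels only; the clamp avoids division by zero
        # when a sample is fully masked.
        return abs_err.sum() / valid_mask.sum().clamp(min=1.0)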

Another way to benefit from several complementary sensors is the Foundation Model (FM) paradigm, which directly extracts joint features from a set of input SITS from different sensors by means of a large Deep Learning model. These features can then be mapped to the target variable by a shallow classification or regression head (a minimal sketch is given below). FMs are still limited in terms of supported sensors, and their superiority over ad-hoc models is still debatable. An interesting middle ground is explored in the RELEO (REpresentation Learning for Earth Observation, see https://groupes.renater.fr/wiki/envia/_media/pres_releo_aniti_svalero_ai4env_4sept25.pdf) project. The aim of RELEO is to aggregate every available observation for a given time span and location into a set of features that can be directly decoded into forecasts of ECVs and EBVs.
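To make the shallow-head idea concrete, here is a minimal sketch of a regression head on top of frozen FM features; the encoder call and all shapes are hypothetical:

    import torch
    import torch.nn as nn

    class ShallowRegressionHead(nn.Module):
        # Illustrative shallow head mapping frozen FM features to a target
        # variable (e.g., an ECV). All dimensions are hypothetical.
        def __init__(self, feature_dim: int, n_targets: int = 1):
            super().__init__()
            self.head = nn.Sequential(
                nn.Linear(feature_dim, 128),
                nn.ReLU(),
                nn.Linear(128, n_targets),
            )

        def forward(self, features: torch.Tensor) -> torch.Tensor:
            # features: (batch, tokens, feature_dim) from a frozen encoder.
            return self.head(features)

    # Usage sketch: only the head is trained, the FM encoder stays frozen.
    # with torch.no_grad():
    #     features = foundation_model(sits_batch)  # hypothetical encoder call
    # prediction = ShallowRegressionHead(feature_dim=256)(features)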

The whole system is self-supervised by using radiative transfer models that map the ECVs and EBVs back to the input data, following the work of Zerah et al. A cornerstone of RELEO is the use of the Perceiver-IO architecture, a fully attention-based architecture that removes all inductive biases on the data by considering input SITS as sequences of individual measurements, each with its own spatial location, spatial resolution, spectral bandwidth, and acquisition time. This paradigm allows Perceiver-IO to naturally integrate heterogeneous input data comprising images of different spatial resolutions and sizes, SITS of different lengths, and even measurements that do not come from images (such as ICESat spots, for instance). While RELEO is still at an early stage of development, promising results have been obtained in the frame of two ongoing Ph.D. theses in the project.
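The following simplified sketch illustrates this measurement-as-token idea: each measurement is embedded together with its metadata and read into a fixed-size latent array through a single cross-attention step. This is a strong simplification of Perceiver-IO, not the RELEO implementation, and all names and dimensions are assumptions.

    import torch
    import torch.nn as nn

    class MeasurementEncoder(nn.Module):
        # Simplified Perceiver-style encoder: each measurement is a token of
        # (value, lat, lon, time, wavelength, resolution), cross-attended
        # into a fixed-size latent array. Illustrative only.
        def __init__(self, n_latents: int = 256, dim: int = 128):
            super().__init__()
            self.embed = nn.Linear(6, dim)  # value + 5 metadata fields
            self.latents = nn.Parameter(torch.randn(n_latents, dim))
            self.cross_attn = nn.MultiheadAttention(dim, num_heads=4,
                                                    batch_first=True)

        def forward(self, tokens: torch.Tensor) -> torch.Tensor:
            # tokens: (batch, n_measurements, 6); measurements may come from
            # any sensor, any resolution, even non-image sources.
            x = self.embed(tokens)
            q = self.latents.expand(tokens.shape[0], -1, -1)
            latent, _ = self.cross_attn(q, x, x)  # latents attend to inputs
            return latent  # (batch, n_latents, dim), fixed size by design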

In this context, this Ph.D. aims at bridging the gap between the TAMRF model and the RELEO model. The objectives of the Ph.D. are as follows:

1) Extend the TAMRF architecture and training toward a fully attention-based architecture using Perceiver-IO: support additional sensors such as MODIS or Sentinel-3, generalize the spatial-resolution and spectral representations of inputs and outputs into a single encoder and decoder for all sensors, and add conditioning by exogenous variables in the encoder and decoder.

2) Contribute to the definition and implementation of the RELEO training strategy by providing complex pretext tasks inspired by TAMRF, and contribute to the validation and extension of the RELEO encoder and decoder architectures (especially regarding meteorological conditioning; one possible mechanism is sketched after this list) based on the findings on TAMRF.
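As a purely illustrative example of exogenous conditioning, a FiLM-style modulation (one option among many, not necessarily what the thesis will adopt) could inject meteorological variables into intermediate features as follows:

    import torch
    import torch.nn as nn

    class FiLMConditioning(nn.Module):
        # Illustrative FiLM-style conditioning: exogenous variables (e.g.,
        # meteorological data) modulate intermediate features with a learned
        # per-channel scale and shift. One possible design, not the thesis plan.
        def __init__(self, n_exogenous: int, n_channels: int):
            super().__init__()
            self.to_scale_shift = nn.Linear(n_exogenous, 2 * n_channels)

        def forward(self, features: torch.Tensor,
                    exogenous: torch.Tensor) -> torch.Tensor:
            # features:  (batch, channels, height, width)
            # exogenous: (batch, n_exogenous), e.g. temperature, precipitation.
            scale, shift = self.to_scale_shift(exogenous).chunk(2, dim=-1)
            scale = scale[:, :, None, None]
            shift = shift[:, :, None, None]
            return features * (1.0 + scale) + shift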

The expected outcomes of the Ph.D. are:

- A well-validated foundation model for the fusion of any set of sensors,

- Several contributions to the RELEO project in terms of training strategies and architectures.

=================

For more information about the topic and the co-financing partner (found by the lab!), contact the thesis director: julien.michel@cnes.fr

Then prepare a resume, a recent transcript, and a reference letter from your M2 supervisor or engineering school director, and you will be ready to apply online before March 13th, 2026, midnight Paris time!

Profile

The candidate should have a strong background in several of the following subjects: Python scientific programming (PyTorch, Lightning, Hydra), Machine Learning and Deep Learning, Applied Mathematics, and Physics.

Laboratory

CESBIO

Message from PhD Team

More details on the CNES website: https://cnes.fr/fr/theses-post-doctorats