GeoPython 2021

Geopythonic processing of massive high resolution Copernicus Sentinel data streams on cloud infrastructure
2021-04-22, 09:15–09:45, Track 1

We demonstrate the use of geopython solutions to address Big Data Analytics requirements in cloud-based processing of massive high resolution Copernicus Sentinel data streams in a European agricultural use context.


The European Union's Copernicus Sentinel sensors produce large-volume Earth Observation data streams, which are available under a full, free and open license. The Copernicus program also supports the establishment of Data and Information Access Services (DIAS), which are cloud-based processing solutions, some of them federated in the European Open Science Cloud (EOSC). DIAS platforms closely couple the provision of compute resources with access to very large S3 object storage for Sentinel data archives, which include high resolution Sentinel-1 and -2 sensor data (10 m resolution) with frequent revisit (5-6 days) and continental coverage.

We demonstrate how we use a combination of geopython modules (GDAL, rasterio, geopandas) with PostgreSQL/PostGIS spatial databases to manage the processing of deep time series data stacks with very large vector data sets that outline agricultural parcels in selected EU Member States. Accelerated processing is supported by integration of Numba and by orchestration across multiple VMs on the cloud platform using customized Docker containers. Our client interfaces make use of Flask RESTful services and Jupyter Notebooks to support analytical tasks, which can include SciPy-based image analysis. Time series can also be integrated into machine learning frameworks like TensorFlow and PyTorch. We will demonstrate how our modular setup facilitates its use in the monitoring tasks that are required in the Common Agricultural Policy context.
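As a rough illustration of the parcel-based time series extraction described above, the following minimal sketch computes a mean value per parcel and per acquisition date. It is not the project's actual code. It assumes that the image stack has already been read into a NumPy array (for example with rasterio) and that the parcel outlines have already been rasterized into an integer label grid on the same pixel grid. The function name `parcel_time_series` and the toy arrays are hypothetical.

```python
import numpy as np


def parcel_time_series(stack, labels):
    """Mean pixel value per parcel and per date (hypothetical helper).

    stack  : array of shape (T, H, W), one image per acquisition date
    labels : integer array of shape (H, W); 0 is background,
             1..N are parcel identifiers (assumed pre-rasterized)
    Returns an array of shape (N, T); parcels without pixels yield NaN.
    """
    stack = np.asarray(stack, dtype=np.float64)
    flat_labels = np.asarray(labels).ravel()
    n_parcels = int(flat_labels.max())

    # Pixel count per label; index 0 is the background class.
    counts = np.bincount(flat_labels, minlength=n_parcels + 1).astype(np.float64)

    means = np.full((n_parcels, stack.shape[0]), np.nan)
    for t in range(stack.shape[0]):
        # Sum of pixel values per label for this date.
        sums = np.bincount(
            flat_labels, weights=stack[t].ravel(), minlength=n_parcels + 1
        )
        with np.errstate(invalid="ignore", divide="ignore"):
            means[:, t] = sums[1:] / counts[1:]
    return means


# Toy example: two dates, a 2 x 2 pixel grid, two parcels.
demo_stack = np.array(
    [[[1.0, 2.0], [3.0, 4.0]],
     [[10.0, 20.0], [30.0, 40.0]]]
)
demo_labels = np.array([[1, 1], [2, 0]])
series = parcel_time_series(demo_stack, demo_labels)
```

Each row of `series` is the time series of one parcel, which is the kind of compact per-parcel signal that can be stored in a database table or passed on to a machine learning framework. The loop over dates is also the sort of kernel that could be handed to Numba for acceleration, although that step is not shown here.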

In the course of our presentation, we'll outline specific processing needs and how we intend to integrate more advanced hardware solutions, such as GPU-based processing (in CuPy, Numba), which is still surprisingly sparsely used in the geospatial domain. The relevance of our initiative in the context of European programs such as Destination Earth and European Data Spaces will be briefly addressed as well. Finally, we'll introduce the public GitHub repository where we document our current and ongoing developments.