How to structure EO data for ML workflows
2019-06-25, 09:45–10:15, Room 1

The availability of open Earth observation (EO) data represents an unprecedented resource for many EO applications. The value hidden within open-access satellite imagery can be revealed not only by looking at spatial context but also by taking into account the temporal evolution of a pixel or an area within an image. We found that the available data structures are not well suited to the automatic extraction of complex patterns from such spatio-temporal data. In this talk we will present lessons learned from working with optical imagery from the Sentinel-2 mission, which has a five-day global revisit time. The value-extraction pipelines, which rely on external machine learning and deep learning frameworks, are streamlined with the eo-learn library, in which the EOPatch plays a central role as a data container.

eo-learn is a collection of open source Python packages that makes extracting valuable information from satellite imagery as easy as defining a sequence of operations to be performed on it. It acts as a bridge between the EO and remote sensing (RS) fields and the Python ecosystem for data science and machine learning, easing entry into RS for non-experts while bringing the state-of-the-art computer vision, machine learning, and deep learning tools of the Python ecosystem to RS experts.
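To make the idea of "a sequence of operations" concrete, here is a minimal sketch in plain NumPy. It does not use the eo-learn API: the dictionary-based patch, the band names `red` and `nir`, and the helpers `compute_ndvi`, `temporal_mean`, and `run_pipeline` are all hypothetical names chosen for illustration, and the input values are random stand-ins for real reflectances.

```python
import numpy as np


def compute_ndvi(patch):
    """Add an NDVI layer computed from the (hypothetical) 'red' and 'nir' bands."""
    red, nir = patch["red"], patch["nir"]
    out = dict(patch)  # shallow copy so the input patch is left untouched
    out["ndvi"] = (nir - red) / (nir + red + 1e-9)  # small epsilon avoids 0/0
    return out


def temporal_mean(patch):
    """Collapse the time axis (axis 0) of the NDVI layer into a per-pixel mean."""
    out = dict(patch)
    out["ndvi_mean"] = patch["ndvi"].mean(axis=0)
    return out


def run_pipeline(patch, steps):
    """Apply each step in order, feeding the output of one into the next."""
    for step in steps:
        patch = step(patch)
    return patch


# Synthetic stand-in for imagery: 6 acquisition dates, 4 x 4 pixels per band.
rng = np.random.default_rng(0)
patch = {
    "red": rng.uniform(0.05, 0.3, size=(6, 4, 4)),
    "nir": rng.uniform(0.2, 0.6, size=(6, 4, 4)),
}

result = run_pipeline(patch, [compute_ndvi, temporal_mean])
print(result["ndvi"].shape, result["ndvi_mean"].shape)
```

Each step takes a patch and returns a new one, so steps can be reordered or reused without touching the data-loading code.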

We will present how we use NumPy arrays to store and handle RS data, and GeoPandas for vector and attribute data, in data containers called EOPatches, along with the benefits of this approach and the problems we ran into in our typical use cases. A comparison with formats optimised for cloud-based access (e.g. NetCDF, cloud-optimised GeoTIFF) will be presented. We will show how land cover prediction at the level of a (small) country can be implemented on your laptop and then scaled to run on a cluster by splitting the area into a grid of EOPatches and using EOExecutor to handle execution and monitoring of the workflow. Future design and improvements of eo-learn, particularly regarding the EOPatch structure, will also be discussed.
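The container and grid-splitting ideas can be sketched roughly as follows. This is not the actual EOPatch implementation or the eo-learn API: `PatchSketch` and `split_bbox` are hypothetical names, the `(time, height, width, channels)` array layout is an assumption made for the example, the bounding-box coordinates are arbitrary, and the vector data that the talk stores with GeoPandas is left out to keep the sketch dependency-free.

```python
from dataclasses import dataclass, field

import numpy as np


@dataclass
class PatchSketch:
    """Hypothetical stand-in for a patch: a bounding box plus named raster arrays."""

    bbox: tuple  # (min_x, min_y, max_x, max_y)
    # Layer name -> array, assumed here to be (time, height, width, channels).
    data: dict = field(default_factory=dict)


def split_bbox(bbox, n_cols, n_rows):
    """Split a bounding box into a regular n_cols x n_rows grid of smaller boxes."""
    min_x, min_y, max_x, max_y = bbox
    xs = np.linspace(min_x, max_x, n_cols + 1)
    ys = np.linspace(min_y, max_y, n_rows + 1)
    return [
        (float(xs[i]), float(ys[j]), float(xs[i + 1]), float(ys[j + 1]))
        for j in range(n_rows)
        for i in range(n_cols)
    ]


# Arbitrary area of interest, split into a 4 x 2 grid of tiles.
area = (13.0, 45.0, 17.0, 47.0)
tiles = split_bbox(area, n_cols=4, n_rows=2)

# One patch per tile: 5 dates, 8 x 8 pixels, 3 bands of placeholder zeros.
patches = [
    PatchSketch(bbox=tile, data={"bands": np.zeros((5, 8, 8, 3), dtype=np.float32)})
    for tile in tiles
]
print(len(patches), patches[0].bbox)
```

Because each patch is independent of its neighbours, the same per-patch workflow can be run on one tile on a laptop or mapped over the whole grid on a cluster.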