GeoPython 2021

Universal geospatial data storage with TileDB: No more file formats
2021-04-22, 15:00–15:30, Track 2

This talk will describe the open-source TileDB Embedded library and its integrations in the geospatial domain. We will give examples of its use for point cloud, SAR and weather data with partners such as Capella Space and exactEarth, and emphasize the need to move away from file formats and focus on universal, end-to-end solutions instead.


TileDB Embedded is an open-source, universal storage engine that integrates with many existing Python tools such as Dask, xarray and Pandas, with geospatial-specific frameworks such as Rasterio, Python-PDAL and GeoPandas, and with our own open-source library for netCDF and HDF-like data, TileDB-CF-Py.

TileDB Embedded is ideal for geospatial data as it is based on sparse and dense multi-dimensional arrays, implementing indexes such as R-trees and Hilbert curve orderings. It is cloud-native and can encompass multiple geospatial domains. I’ll make the case for universal geospatial data storage in the following parts:

Analysis-ready geospatial data

No more files. A universal format can cover all geospatial data types as sparse or dense arrays, allowing rapid slicing and arbitrary metadata, with versioning and time traveling built in. Here, I’ll examine the shared structure of geospatial data that makes it best suited for array-based storage.
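
To make the Hilbert curve ordering mentioned above more concrete, here is a minimal pure-Python sketch of how 2D cell coordinates can be mapped to a 1D position along a Hilbert curve, so that cells close together in space tend to be stored close together. It follows the classic textbook algorithm and is not TileDB's implementation; the names `hilbert_index` and `_rotate` are hypothetical, and the grid side `n` is assumed to be a power of two.

```python
def _rotate(n, x, y, rx, ry):
    """Rotate/flip a quadrant so the curve keeps a consistent orientation."""
    if ry == 0:
        if rx == 1:
            x = n - 1 - x
            y = n - 1 - y
        x, y = y, x
    return x, y


def hilbert_index(n, x, y):
    """Map cell (x, y) on an n x n grid to its position along a Hilbert curve.

    Assumes n is a power of two and 0 <= x, y < n.
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        x, y = _rotate(n, x, y, rx, ry)
        s //= 2
    return d


# Order a few illustrative points (made-up coordinates) along the curve.
points = [(5, 2), (0, 0), (7, 7), (1, 1), (4, 3)]
ordered = sorted(points, key=lambda p: hilbert_index(8, p[0], p[1]))
print(ordered)
```

Sorting by this index before writing is one way a storage engine can keep spatially neighbouring points in the same region of storage, which is what makes range slicing over sparse data such as point clouds efficient.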

Superior interoperability

The tools don’t change. I’ll look at how TileDB arrays work within the Python ecosystem, and how to leverage PDAL, GDAL and existing tools such as Dask, xarray and Pandas to perform geospatial analysis.

Solution focused

We focus on end-to-end solutions, not format standards. Despite defining a powerful open-spec data format for all geospatial data, our goal is to deliver unprecedented speed for analytics queries and integration with numerous computational tools, via the well-defined APIs of our TileDB Embedded storage engine.

Proven

TileDB Embedded is successfully used by high-profile users and customers to store SAR, hyperspectral, weather and seismic data, as well as point cloud data from SONAR and LiDAR sensors, all within a universal data engine that can be used seamlessly from Python. The talk concludes with a co-presented example from TileDB user Capella Space.