GeoPython 2022

Quantifying agricultural soil carbon stocks at continent scale using a modern Python big data and ML framework
2022-06-20, 16:00–16:30, Room 1

We have developed a Python-based modeling and spatial prediction framework to accurately estimate soil carbon content at large geographic scales. These methods can provide cost-effective carbon accounting for regenerative farming operations.


Adopting regenerative management of farmland has the potential to sequester large amounts of atmospheric carbon in the soil while generating income for growers through the sale of carbon credits. However, a major obstacle is the absence of an accurate, scalable method for verifying the amount of carbon sequestered. Traditional verification methods rely solely on in-situ soil samples or farm-specific data, which are expensive and labor-intensive to collect.
To address this, we have developed a spatial modeling framework that uses large quantities of geophysical and remotely sensed data to estimate soil carbon stocks across the United States. A geospatial asset catalog and a distributed Python pipeline curate 20+ data sources, including optical remote sensing, climatological, and geological features that are relevant to soil carbon. These data are joined with thousands of in-situ soil samples collected from 12 US states, and models are trained using the xgboost Python library. Predictions are converted to the quantity of interest, soil carbon stock, on a site-by-site basis using a Python-based pipeline orchestrated via Apache Airflow. Once validated, the model can be applied with no further in-situ sampling, which allows it to be scaled to large geographic areas. Using this framework, we have created the highest-ever soil organic carbon map spanning the entire continental USA. This talk will discuss the geospatial data architecture, the predictive model architecture, and the scaled deployment of the model, as well as background on the emerging field of carbon sequestration estimation.
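
The abstract does not say how model predictions are converted to soil carbon stock. As an illustration only, the sketch below uses the widely used fixed-depth formula, in which stock in Mg C/ha equals carbon concentration (%) times bulk density (g/cm³) times layer depth (cm), reduced by the coarse-fragment fraction. The function name, field names, and site values are hypothetical and are not taken from the talk; the predicted concentration is assumed to come from a regression model such as the one described above.

```python
def soc_stock_mg_per_ha(soc_percent, bulk_density_g_cm3, depth_cm, coarse_fraction=0.0):
    """Convert a soil organic carbon concentration to a stock in Mg C/ha.

    Assumed fixed-depth formula (not confirmed by the abstract):
        stock = SOC% * bulk density (g/cm^3) * depth (cm) * (1 - coarse fraction)
    """
    if not 0.0 <= coarse_fraction < 1.0:
        raise ValueError("coarse_fraction must be in [0, 1)")
    if soc_percent < 0 or bulk_density_g_cm3 <= 0 or depth_cm <= 0:
        raise ValueError("inputs must be positive (soc_percent may be zero)")
    return soc_percent * bulk_density_g_cm3 * depth_cm * (1.0 - coarse_fraction)


# Hypothetical per-site inputs: predicted SOC concentration plus soil properties.
sites = [
    {"site": "A", "soc_percent": 2.0, "bulk_density_g_cm3": 1.3,
     "depth_cm": 30.0, "coarse_fraction": 0.0},
    {"site": "B", "soc_percent": 1.5, "bulk_density_g_cm3": 1.4,
     "depth_cm": 30.0, "coarse_fraction": 0.1},
]

# Site-by-site conversion, keyed by site identifier.
stocks = {
    s["site"]: soc_stock_mg_per_ha(
        s["soc_percent"], s["bulk_density_g_cm3"], s["depth_cm"], s["coarse_fraction"]
    )
    for s in sites
}

for site, stock in stocks.items():
    print(f"site {site}: {stock:.1f} Mg C/ha")
```

Keeping the conversion as a small pure function makes it straightforward to call from a scheduled task, for example one step of an Airflow pipeline, though how the authors structure that step is not described here.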