GeoPython 2022

A python-based pipeline for large-scale land cover information extraction from cloud-based historical topographic map collections
2022-06-20, 09:45–10:15, Room 1

We leverage open-source Python tools to extract historical land cover information (1890-1950) from the United States Geological Survey (USGS) Historical Topographic Map Collection (HTMC). Using Python packages for image processing, machine learning, and geospatial analysis, we extracted historical road networks, urban areas, and forest extents to improve our understanding of historical landscape evolution in the United States.


How have our landscapes evolved over the last 120 years? While data on the contemporary and recent states of the Earth’s surface are available at high spatial, temporal, and semantic resolution, spatially explicit data on land use / land cover prior to the 1980s are rare. To overcome this data gap, we develop methods to extract retrospective geographic information from historical map archives, such as the United States Geological Survey (USGS) Historical Topographic Map Collection (HTMC), which holds almost 200,000 individual maps published between 1884 and 2006.
The HTMC is a cloud-based collection of scanned map images with rich metadata. It represents the only country-level geographic data resource created by surveying and manual orthophoto interpretation in the era prior to digital cartography and remote sensing. We developed a Python-based pipeline that retrieves the scanned image files of relevant map sheets, automatically removes irrelevant data (i.e., map collars), aggregates the color information found in the scanned map sheets to a desired target resolution, and generates composites of large numbers of individual map sheets at the country scale. The aggregation step may consist of simple raster resampling, but it may also involve encoding map tiles using texture and feature descriptors. Such a framework makes cloud-based, georeferenced image sources more accessible for computer vision applications and for analysis in local GIS environments. It also enables the seamless integration of information harvested from historical maps with other geospatial datasets, such as earth observation data or road network data.
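As an illustration of the simplest form of the aggregation step, the following minimal sketch averages the colors of a scanned map sheet over square pixel blocks to reach a coarser resolution. It is not taken from the map processor itself: the function name `aggregate_colors`, the integer `factor` parameter, and the choice of a plain block mean are assumptions made for this example, and the input is assumed to be an RGB array from which the map collar has already been removed.

```python
import numpy as np


def aggregate_colors(rgb, factor):
    """Block-average an (H, W, bands) image by an integer factor.

    Hypothetical helper for illustration only; the actual pipeline
    may use a different resampling method.
    """
    h, w, bands = rgb.shape
    # Trim edge pixels that do not fill a complete block.
    h_trim = h - h % factor
    w_trim = w - w % factor
    # Split rows and columns into (n_blocks, factor) pairs.
    blocks = rgb[:h_trim, :w_trim].reshape(
        h_trim // factor, factor, w_trim // factor, factor, bands
    )
    # Average over the two within-block axes, keeping the bands.
    return blocks.astype(float).mean(axis=(1, 3))


# Usage on a synthetic 5 x 7 pixel tile standing in for a scanned sheet.
tile = np.zeros((5, 7, 3), dtype=np.uint8)
coarse = aggregate_colors(tile, factor=2)
print(coarse.shape)
```

A block mean keeps the example dependency-free beyond NumPy. For georeferenced rasters, a resampling routine from a geospatial library would also update the affine transform, which this sketch does not do.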
We call this pipeline the “USGS HTMC map processor” and make the Python code publicly available at https://github.com/johannesuhl/mapprocessor. We applied the map processor in several applications to extract historical urban areas, road networks, and forest extents for large parts of the United States between 1890 and 1950, using open-source Python packages for image processing, geospatial analysis, and machine learning. In these applications, we used contemporary open geospatial data (e.g., OpenStreetMap, Global Human Settlement Layer, Landfire vegetation data) as auxiliary data. Based on these methodological contributions, we can derive quantitative insights into long-term land cover change in the United States.
This effort contributes to the availability of fine-grained, spatial-historical open data by providing a reusable pipeline that facilitates information extraction from historical maps and helps to digitally preserve the historical geographic information about our past landscapes.