GeoPython2019

Automating the definition and optimization of census sampling areas
2019-06-25, 17:00–17:30, Room 1

Traditionally, census sampling areas are defined by manually digitizing small geographic units on high-resolution satellite imagery or by physically walking the boundaries of sample areas, methods that are highly time-, cost- and labour-intensive. This presentation focuses on a method implemented in Python to automate the definition and optimization of census sampling areas using digitized natural and man-made geographic features and high-resolution gridded population estimates.


Creating and updating census enumeration areas (EAs) is one of the most challenging, but essential, tasks in the preparation of a national census. Traditionally, this is done by manually digitizing small geographic units on high-resolution satellite imagery or by physically walking the boundaries of EAs, techniques that are highly time-, cost- and labour-intensive. In addition, creating EAs requires taking into account the population and area of each unit, whilst ensuring that a person collecting data can easily identify the unit's boundaries on the ground. This is an optimization problem that can best be solved by a computer. To respond to this challenge, we have developed an automatic technique to define census EAs and national population sampling frames, which governments can customize to meet their data collection criteria. The technique is based on high-resolution gridded population and settlement datasets and uses publicly available natural, man-made and administrative boundaries, alongside geospatial processing techniques and graph theory implementations. The result is a national sampling frame in which the whole country is divided into regions that follow geographic features on the ground and meet the government’s sampling criteria.
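
To make the graph-theory aspect concrete, the following is a minimal sketch of one way such an optimization could be framed. It is not the method presented in the talk. It assumes the country has already been split into small units bounded by geographic features, that each unit's population has been summed from the gridded population estimates, and that neighbouring units are recorded in an adjacency graph. The function name merge_units, the thresholds min_pop and max_pop, and the sample data are all hypothetical.

```python
def merge_units(population, adjacency, min_pop, max_pop):
    """Greedily merge under-populated units into adjacent units.

    population: dict mapping unit id -> population count
    adjacency:  dict mapping unit id -> set of neighbouring unit ids
    Returns (members, totals): for each surviving unit id, the set of
    original units it contains and its combined population.
    """
    pop = dict(population)
    adj = {u: set(n) for u, n in adjacency.items()}
    members = {u: {u} for u in population}

    while True:
        # Units still below the minimum, smallest first (id breaks ties).
        small = sorted(
            (u for u in pop if pop[u] < min_pop and adj[u]),
            key=lambda u: (pop[u], u),
        )
        merged = False
        for u in small:
            # Only consider neighbours that keep the result within max_pop.
            options = [v for v in adj[u] if pop[u] + pop[v] <= max_pop]
            if not options:
                continue
            v = min(options, key=lambda x: (pop[x], x))
            # Merge u into v and rewire u's neighbours to point at v.
            pop[v] += pop[u]
            members[v] |= members[u]
            for w in adj[u]:
                adj[w].discard(u)
                if w != v:
                    adj[w].add(v)
                    adj[v].add(w)
            del pop[u], adj[u], members[u]
            merged = True
            break
        if not merged:
            # No further merge is possible without exceeding max_pop.
            return members, pop


# Hypothetical example: five units in a row, A-B-C-D-E.
population = {"A": 40, "B": 30, "C": 120, "D": 20, "E": 90}
adjacency = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"C", "E"},
    "E": {"D"},
}
areas, totals = merge_units(population, adjacency, min_pop=100, max_pop=200)
for unit in sorted(areas):
    print(unit, sorted(areas[unit]), totals[unit])
```

Because units are only ever merged with their neighbours, every resulting area stays contiguous and its outer boundary still follows the original geographic features. A greedy pass like this does not guarantee an optimal partition, and units can remain below min_pop when every possible merge would exceed max_pop.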