IFPRI Blog : Research Post

1.15 Million Children, 122,472 communities, 57 countries, 30 years: New dataset combines Demographic and Health Surveys with geographic information systems

February 22, 2021
by Zhe Guo, Wahid Quabili, Liangzhi You, and Derek Headey
Open Access | CC-BY-4.0

Research on the linkages between agriculture, nutrition, and health is severely hindered by a lack of data: health and nutrition surveys generally contain little information on agriculture, and agricultural surveys contain little information on health or nutrition. Yet we know that there are important connections between these domains, as well as with related issues such as climate, infrastructure, and ecological conditions.

To close these data gaps, the Advancing Research on Nutrition and Agriculture (ARENA) project—funded by the Bill & Melinda Gates Foundation and the IFPRI-led CGIAR Research Program on Agriculture for Nutrition and Health (A4NH)—constructed a large multi-level, multi-country dataset combining individual and household level nutrition, health, and socioeconomic data from the Demographic and Health Surveys with a wide variety of community-level geographic information systems (GIS) data on agricultural production, agroecology, climate, demography, and infrastructure.

The scope of this dataset is massive, which makes it a useful tool for monitoring and evaluation in population, health, and nutrition research, and especially for studying agriculture-nutrition linkages at large scales. The dataset includes 1.15 million children, 764,000 mothers, and 122,472 rural and urban communities across 57 countries from 1990 to 2019 (see map below).

The spatial variables in the DHS-GIS database are grouped into four data categories: Agriculture, Agroecology, Demographics, and Markets. Each category contains several sub-categories, and each sub-category groups a large number of data layers.

DHS has collected and disseminated accurate and representative data through hundreds of highly standardized surveys for several decades. These are particularly strong on nutrition, health, demographics, and gender, but also contain very valuable information on household wealth (consumer durables, housing characteristics), access to services (water, sanitation, health facilities), and education. However, they are weak on specific agricultural activities (except for basic occupational classifications), as well as infrastructure.

Using the location information of the surveyed clusters from DHS, the ARENA team worked on linking DHS data with spatially explicit data such as population, biophysical conditions (rainfall, temperature, soil quality, topography, and water body access), agricultural production estimates, transport infrastructure, and night lighting, all through geospatial data processing. Integrating GIS/spatial measures on agriculture, biophysical conditions, and infrastructure with DHS datasets produces a more comprehensive set of variables for a large number of countries. This could facilitate both cross-country and country-level research on a vast array of topics. Some examples of research using these datasets can be found here.

While DHS data have been extensively linked to GIS indicators before (see the DHS GIS webpage), doing so is not straightforward: the published locations of DHS survey clusters are randomly displaced by as much as 10 km to protect respondents' confidentiality.

For the ARENA DHS-GIS dataset, we used a series of geoprocessing procedures to address this issue. The impact of cluster point displacement can be moderated by generating covariates that represent average values within a neighborhood buffer around each cluster. Because the displacement range differs between rural and urban clusters, we used standardized, area-weighted buffer zones with a radius of 10 km for rural clusters and 5 km for urban clusters. This approach reduces measurement error, which can be critical for assessing linkages between household-level and community-level indicators.
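To make the buffer idea concrete, here is a minimal Python sketch of an area-weighted buffer mean. It is an illustration under simplifying assumptions, not the ARENA team's actual geoprocessing workflow: the grid is synthetic, coordinates are planar and measured in km, and the share of each grid cell inside the buffer is approximated by sub-cell sampling. The function names and the `BUFFER_RADIUS_KM` lookup are hypothetical; only the 10 km and 5 km radii come from the text.

```python
import math

# Buffer radii from the text: 10 km for rural clusters, 5 km for urban clusters.
BUFFER_RADIUS_KM = {"rural": 10.0, "urban": 5.0}


def cell_fraction_in_buffer(cell_x, cell_y, cell_size, cx, cy, radius, n=10):
    """Approximate the share of a square cell's area inside a circular buffer
    by testing an n x n lattice of sample points within the cell.
    (cell_x, cell_y) is the cell's lower-left corner; (cx, cy) is the cluster."""
    inside = 0
    step = cell_size / n
    for i in range(n):
        for j in range(n):
            px = cell_x + (i + 0.5) * step
            py = cell_y + (j + 0.5) * step
            if math.hypot(px - cx, py - cy) <= radius:
                inside += 1
    return inside / (n * n)


def buffer_mean(grid, cell_size, cx, cy, cluster_type):
    """Area-weighted mean of grid values within the buffer around (cx, cy).

    grid[row][col] is the value of the cell whose lower-left corner sits at
    (col * cell_size, row * cell_size). None marks a missing value.
    Returns None when no valid cell overlaps the buffer."""
    radius = BUFFER_RADIUS_KM[cluster_type]
    weighted_sum = 0.0
    weight_total = 0.0
    for row, values in enumerate(grid):
        for col, value in enumerate(values):
            if value is None:
                continue
            weight = cell_fraction_in_buffer(
                col * cell_size, row * cell_size, cell_size, cx, cy, radius
            )
            weighted_sum += weight * value
            weight_total += weight
    return weighted_sum / weight_total if weight_total > 0 else None


# Synthetic 30 km x 30 km layer at 1 km resolution with a west-to-east gradient.
grid = [[float(col) for col in range(30)] for _ in range(30)]
rural_value = buffer_mean(grid, 1.0, 15.0, 15.0, "rural")
urban_value = buffer_mean(grid, 1.0, 15.0, 15.0, "urban")
outside_value = buffer_mean(grid, 1.0, 100.0, 100.0, "rural")
```

Averaging over a buffer means that a cluster whose reported location is shifted by a few kilometres still draws most of its covariate value from the same cells, which is the intuition behind using buffers rather than point values. A real workflow would use projected raster data and a GIS library rather than nested Python lists.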

The GIS data are hosted on the Harvard University Dataverse website and include instructions for merging them with publicly accessible Demographic and Health Surveys data. We hope that this combined dataset will constitute a powerful tool for advancing research on a wide range of important issues in developing countries.
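As a rough illustration of what such a merge involves, the sketch below attaches cluster-level GIS covariates to child-level survey records in plain Python. Everything in it is an assumption made for illustration: the key names `survey_id` and `cluster_id`, the variable names, and all values are synthetic and do not reflect the actual file layout. The instructions on Dataverse remain the authoritative guide.

```python
def merge_gis(dhs_rows, gis_rows, keys=("survey_id", "cluster_id")):
    """Left-join cluster-level GIS covariates onto individual-level records.

    Every survey record is kept. Records without a matching cluster are
    flagged with gis_matched=False so that unmatched cases stay visible."""
    gis_by_key = {tuple(row[k] for k in keys): row for row in gis_rows}
    merged = []
    for row in dhs_rows:
        match = gis_by_key.get(tuple(row[k] for k in keys))
        extra = {}
        if match is not None:
            extra = {k: v for k, v in match.items() if k not in keys}
        merged.append({**row, **extra, "gis_matched": match is not None})
    return merged


# Synthetic child-level records (hypothetical names and values).
children = [
    {"survey_id": "XX2015", "cluster_id": 1, "child_id": "a"},
    {"survey_id": "XX2015", "cluster_id": 1, "child_id": "b"},
    {"survey_id": "XX2015", "cluster_id": 2, "child_id": "c"},
]

# Synthetic cluster-level GIS covariates; cluster 2 is deliberately missing.
clusters = [
    {"survey_id": "XX2015", "cluster_id": 1, "rainfall_mm": 850.0},
]

merged = merge_gis(children, clusters)
matched_count = sum(1 for row in merged if row["gis_matched"])
```

Because many children share one cluster, the join is many-to-one: each cluster's covariates are repeated for every child surveyed there. Keying on both a survey identifier and a cluster identifier matters because cluster numbers typically restart within each survey.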

Zhe Guo is a Senior GIS Coordinator with IFPRI's Environment and Production Technology Division (EPTD); Wahid Quabili is a Senior Research Analyst with IFPRI's Poverty, Health, and Nutrition Division (PHND); Liangzhi You is an EPTD Senior Research Fellow; Derek Headey is a PHND Senior Research Fellow.
