Classification and regression trees: an introduction

Technical Guide: Strengthening the household food security and nutritional aspects of IFAD poverty alleviation projects: developing operational methodologies for project design and monitoring

Household food security (HFS) represents the guiding principle underlying many rural development projects. It plays an important role in the targeting of projects, the selection of appropriate interventions, and the monitoring and evaluation of projects. HFS is a multifaceted concept that does not necessarily lend itself to measurement by single, discrete indicators.

Further, such indicators should reflect the behavior and livelihood conditions of target populations, namely those that are most often, and most severely, affected by acute food insecurity (Borton and Shoham 1991). These include the rural poor, women-headed households, asset-poor pastoralists, the landless, recently resettled households, and households constrained by a high dependency ratio.

The multifaceted nature of HFS implies that reliance on a single indicator is unlikely to capture all dimensions of food security. Consequently, Borton and Shoham (1991) suggest 20 core indicators; Frankenberger (1992) and Seaman, Holt, and Allen (1993) each take between 20 and 30 indicators as the starting point; Riely (1993) and Downing (1993) both suggest more than 50 variables; while Currey (1978), one of the earliest practitioners in the field, started with 60 variables for his analysis of vulnerability in Bangladesh. The large number of potential indicators presents development practitioners with several interlinked analytical problems. First, it is not always clear what criteria should be used to select a set of indicators out of those available. Second, all other things being equal, there is a strong argument for using as parsimonious a set of variables as possible, but the precise number is difficult to identify in advance. To do so, it is necessary to determine which variables influence each other and are therefore not “independent” (additive) indicators of vulnerability. Third, it is necessary to attach weights to the variables selected as indicators, and the existing literature does not provide adequate guidance on how this should be done. Finally, one would like to have a sense of the predictive value of these indicators.

This guide introduces development practitioners to a statistical software package, Classification and Regression Tree (CART), that addresses these problems. CART is a nonparametric technique that can select, from among a large number of variables, the variables and interactions among them that are most important in determining the outcome variable to be explained. (Two other sets of methods, working closely with local people who can help define indicators of local significance and parametric methods for choosing outcome indicators of food security, are described in Technical Guides #6 and #7, respectively.) In order to illustrate the basic principles of CART methodology, and to demonstrate the power of this methodology, the guide begins with an extended example. It then reviews a number of technical details, including the hardware and software requirements and how to program in CART. The concluding section outlines additional applications and describes the strengths and weaknesses of CART methodology. Appendix 1 discusses in more detail how CART constructs a classification tree, and Appendix 2 provides an annotated guide to a sample of CART output.
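
To make the idea of selecting variables concrete, the following is a minimal sketch in Python of the kind of step a classification tree repeats: among all candidate variables and thresholds, find the single binary split that most reduces class impurity. It is not the CART package itself and is not taken from the guide. The Gini impurity criterion, the function names, the variable names (land_ha, dependency_ratio, household_size), and the eight household records are all assumptions made up for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for a pure group)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels, names):
    """Return (variable, threshold, impurity_decrease) for the best binary split.

    Every variable is tried at the midpoints between its consecutive
    distinct values; the split with the largest drop in weighted Gini
    impurity is kept.
    """
    n = len(rows)
    parent = gini(labels)
    best = (None, None, 0.0)
    for j, name in enumerate(names):
        values = sorted({row[j] for row in rows})
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            left = [y for row, y in zip(rows, labels) if row[j] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[j] > threshold]
            weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
            decrease = parent - weighted
            if decrease > best[2]:
                best = (name, threshold, decrease)
    return best


# Hypothetical household records (invented for illustration only):
# (land holding in hectares, dependency ratio, household size)
names = ["land_ha", "dependency_ratio", "household_size"]
rows = [
    (0.0, 0.8, 6),
    (0.25, 0.5, 4),
    (0.5, 0.7, 7),
    (0.75, 0.4, 5),
    (1.25, 0.6, 6),
    (2.0, 0.5, 4),
    (2.5, 0.8, 7),
    (3.0, 0.3, 5),
]
# 1 = food insecure, 0 = food secure (again, invented labels)
labels = [1, 1, 1, 1, 0, 0, 0, 0]

variable, threshold, decrease = best_split(rows, labels, names)
print(variable, threshold, round(decrease, 3))  # prints: land_ha 1.0 0.5
```

In this invented data set, land holding separates the two groups perfectly, so it is chosen ahead of the other two variables. A full tree would apply the same search again within each resulting subgroup, which is how interactions among variables can surface.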

Yohannes, Yisehac
Hoddinott, John
International Food Policy Research Institute (IFPRI)