Linden McBride is a PhD student at Cornell’s Dyson School
Accurate targeting is crucial to the success of food security and social safety net interventions. To target accurately, project implementers seek to minimize rates of leakage (benefits reaching households that don't need them) and undercoverage (benefits failing to reach households that do). As already noted on this blog, minimizing these errors is not an easy task.
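To make the two error rates concrete, here is a minimal sketch, using made-up flags for six hypothetical households (not data from the post), of how leakage and undercoverage would be computed for a targeting exercise:

```python
# Hypothetical toy data: which households are truly poor, and which
# the program actually selected as beneficiaries.
poor     = [True, True, True, False, False, False]   # truly poor
targeted = [True, False, True, True, False, False]   # selected as beneficiaries

# Leakage: share of beneficiaries who are not actually poor.
leakage = sum(t and not p for p, t in zip(poor, targeted)) / sum(targeted)

# Undercoverage: share of poor households the program fails to reach.
undercoverage = sum(p and not t for p, t in zip(poor, targeted)) / sum(poor)

print(leakage, undercoverage)  # both 1/3 in this toy example
```

In this example one of the three beneficiaries is non-poor (leakage of 1/3) and one of the three poor households is missed (undercoverage of 1/3).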
One approach, the full means test, identifies project beneficiaries through long and detailed surveys of household consumption and expenditures, a time-consuming and expensive option. A short-cut, the proxy means test (PMT), developed for targeting social programs in Latin American countries during the 1980s, instead uses a small number of easily verifiable household characteristics.
A PMT tool assigns weights to a short list of household characteristics, using either regression or principal components analysis, by fitting the PMT to full means test results in a nationally representative data set. PMTs have become common tools for targeting and poverty assessment where full means tests are costly. Today they are used by USAID microenterprise project implementing partners, the World Food Program, and the World Bank, among many others, for poverty assessment, beneficiary targeting, and program monitoring and evaluation in developing countries.
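The regression route can be sketched in a few lines. The following is a purely illustrative simulation (the characteristics, coefficients, and sample are invented, not taken from any actual PMT): log per-capita expenditure, the full means test outcome, is regressed on a few easily verifiable characteristics, and the estimated coefficients become the PMT weights.

```python
import numpy as np

# Simulate a "nationally representative" sample with assumed,
# hypothetical household characteristics and weights.
rng = np.random.default_rng(0)
n = 500
X = np.column_stack([
    np.ones(n),                  # intercept
    rng.integers(1, 10, n),      # household size
    rng.integers(1, 6, n),       # rooms in dwelling
    rng.integers(0, 2, n),       # owns a radio (0/1)
])
true_w = np.array([8.0, -0.15, 0.25, 0.30])   # assumed data-generating weights
log_exp = X @ true_w + rng.normal(0, 0.5, n)  # log per-capita expenditure

# OLS fit of the full means test outcome on the proxies;
# the estimated coefficients serve as the PMT weights.
weights, *_ = np.linalg.lstsq(X, log_exp, rcond=None)
```

With enough data, the recovered weights track the data-generating ones: larger households predict lower welfare, more rooms and radio ownership predict higher welfare.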
Once weights have been generated for a set of observable household characteristics that can account for a substantial amount of the variation in household wealth or welfare, the development practitioner can apply the PMT tool to a sub-population selected for intervention to rank or classify households according to PMT score.
This process involves administering a brief household survey to the targeted sub-population to gather the household characteristics needed for the PMT tool. The observed household characteristics are then multiplied by the PMT tool weights to generate a PMT score for each household. In many applications, the calculated PMT scores are used to rank households from poorest to wealthiest, and the poorest households are selected as program beneficiaries.
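The scoring-and-ranking step above can be sketched as follows, again with invented weights and households rather than values from any deployed tool:

```python
# Hypothetical PMT weights, as if previously estimated on national data.
weights = {"intercept": 8.0, "hh_size": -0.15, "rooms": 0.25, "radio": 0.30}

# A small targeted sub-population surveyed with the brief PMT questionnaire.
households = [
    {"id": "A", "hh_size": 7, "rooms": 1, "radio": 0},
    {"id": "B", "hh_size": 3, "rooms": 4, "radio": 1},
    {"id": "C", "hh_size": 5, "rooms": 2, "radio": 0},
]

def pmt_score(hh):
    # Weighted sum of observed characteristics -> predicted welfare.
    return (weights["intercept"]
            + weights["hh_size"] * hh["hh_size"]
            + weights["rooms"] * hh["rooms"]
            + weights["radio"] * hh["radio"])

# Rank poorest (lowest predicted welfare) first and select the
# bottom of the ranking as program beneficiaries.
ranked = sorted(households, key=pmt_score)
beneficiaries = ranked[:1]
print([h["id"] for h in ranked])  # ['A', 'C', 'B']
```

Here the large household in a one-room dwelling scores lowest and is selected first.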
Overall, the objective of PMT tools is to quickly and affordably identify households meeting particular criteria in a new setting (but under the same data-generating process) using a model parameterized with previously available data. Therefore, for PMT tools to serve their purpose, it is important that they perform well not only within the data set or sample in which they were parameterized but also, and especially, within the new data set or sample.
In other words, high out-of-sample prediction accuracy must be prioritized in the development of PMT tools. In the fields of machine learning and predictive analytics, stochastic ensemble methods have been shown to perform very well out-of-sample due to the bias- and variance-reducing features of such methods.
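The out-of-sample logic can be illustrated with a small simulation. This sketch is not from the paper (which applies stochastic ensembles such as random forests); for brevity it uses plain OLS on simulated data to show the validation step itself: fit the tool on one sample, then measure targeting accuracy on a held-out sample, mimicking deployment in a new population drawn from the same data-generating process.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate(n):
    # Invented data-generating process: three proxies plus noise.
    X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
    w = np.array([8.0, -0.15, 0.25, 0.30])
    y = X @ w + rng.normal(0, 0.5, n)   # true (log) welfare
    return X, y

X_train, y_train = simulate(400)   # sample used to parameterize the tool
X_test, y_test = simulate(200)     # held-out "new" sample

# Fit the PMT weights on the training sample only.
w_hat, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

# Out of sample, classify the bottom 40% of PMT scores as poor and
# compare against the true bottom 40% of welfare.
scores = X_test @ w_hat
pred_poor = scores <= np.quantile(scores, 0.4)
true_poor = y_test <= np.quantile(y_test, 0.4)
accuracy = np.mean(pred_poor == true_poor)
```

Because the proxies capture only part of the variation in welfare, out-of-sample targeting accuracy falls well short of perfect even when the weights are estimated correctly, which is why methods with better out-of-sample behavior matter.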
In our paper, Improved poverty targeting through machine learning: An application to the USAID Poverty Assessment Tools**, Austin Nichols and I present evidence that applying machine learning methods to PMT development can substantially improve the out-of-sample performance of these targeting tools. We illustrate the potential of machine learning algorithms for PMT tool development by applying stochastic ensemble algorithms such as random forests to a set of PMT tools developed by the University of Maryland IRIS Center for the purpose of USAID poverty assessment. We find gains in poverty targeting accuracy of 2 to 18 percent across countries. Monday's post will detail the methods and findings of the paper.
**This paper has been revised and published under a new title, Re-tooling poverty targeting using out-of-sample validation and machine learning.