PLoS ONE paper using satellite images and machine learning to predict deprivation

Last week, PLoS ONE published our paper using satellite imagery and machine learning techniques to predict levels of Living Environment Deprivation. It is open access (the journal takes the P in PLoS, i.e. Public, very seriously), so you can read it here. In a companion resource, we also published the data, the computational environment, and the code (Python, of course!) used to generate the results in the paper. You can access the open repository here.

A bit about what we do before I jump into why I’m so excited about this paper. We consider Living Environment Deprivation (LED), one component of the popular Index of Multiple Deprivation (IMD), and try to predict it using only information extracted from satellite images, such as those you can access through Google Maps (in fact, that was our original source). Our experiment uses data for Liverpool (UK) as a case study. To do this, we combine and compare a series of computational techniques that allow us, first, to turn images into numbers that describe them and, second, to relate those measures to levels of LED. One of the main selling points of the paper is that we introduce several ingredients of modern data science into this literature and show that they significantly improve both predictive power and model evaluation compared with the more traditional approaches adopted in previous papers.
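The two-step idea (turn images into numbers, then relate those numbers to LED) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: each area is represented by a single RGB image tile stored as a NumPy array, the hypothetical `image_features` descriptors (per-channel colour means and standard deviations plus a “greenness” ratio) are deliberately crude, and the model is plain ridge regression evaluated with k-fold cross-validation. None of this is the feature set or the models used in the paper (the actual code lives in the open repository), and the tiles and deprivation scores below are synthetic, not real data.

```python
import numpy as np


def image_features(tile: np.ndarray) -> np.ndarray:
    """Step 1: turn an RGB tile (H x W x 3, values 0-255) into a feature vector.

    Hypothetical, deliberately crude descriptors: per-channel mean and standard
    deviation plus a "greenness" ratio. These are NOT the paper's features.
    """
    pixels = tile.reshape(-1, 3).astype(float) / 255.0
    means = pixels.mean(axis=0)
    stds = pixels.std(axis=0)
    greenness = means[1] / (means.sum() + 1e-9)  # share of green in the mean colour
    return np.concatenate([means, stds, [greenness]])


def fit_ridge(X: np.ndarray, y: np.ndarray, alpha: float = 1.0):
    """Step 2: closed-form ridge regression; centring leaves the intercept unpenalised."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b


def cross_validated_r2(X, y, k=5, alpha=1.0, seed=0):
    """Out-of-sample R^2: each area is predicted by a model fitted without it."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    preds = np.empty(len(y))
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        w, b = fit_ridge(X[train_idx], y[train_idx], alpha)
        preds[test_idx] = X[test_idx] @ w + b
    ss_res = ((y - preds) ** 2).sum()
    ss_tot = ((y - y.mean()) ** 2).sum()
    return 1.0 - ss_res / ss_tot


# Synthetic demo: 200 hypothetical "areas", each a 32x32 tile whose amount of
# green varies; the made-up LED score is higher where there is less green.
rng = np.random.default_rng(42)
tiles = [
    rng.normal(loc=[100, rng.uniform(50, 200), 100], scale=20, size=(32, 32, 3)).clip(0, 255)
    for _ in range(200)
]
X = np.vstack([image_features(t) for t in tiles])
y = -10 * X[:, 6] + rng.normal(scale=0.1, size=len(tiles))  # column 6 = greenness
r2 = cross_validated_r2(X, y, alpha=0.1)
print(f"cross-validated R^2: {r2:.2f}")
```

Cross-validation is the design choice worth noticing here: because ground truth is available, every area is scored by a model that never saw it during training, so the resulting R² is an out-of-sample benchmark rather than an in-sample fit. Swapping in richer image descriptors or more flexible models only changes steps 1 and 2; the evaluation loop stays the same.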

Now, why? Why do we go through the effort of collecting, processing and modelling data from satellites, if what we want to predict actually exists already (and it’s readily available for download!)? There are two main reasons why we think this is relevant:

  1. We show that machine learning is a big deal when it comes to predicting LED from the sky. Precisely because we have good “ground truth” data, our experiment provides an excellent benchmark to show the advantages of these techniques. Furthermore, since these methods are very flexible and not tied to our specific dataset or location, our results feed directly into the work of researchers who use satellite data to derive socio-economic indicators in parts of the world where good ground truth data do not exist.
  2. Our paper helps pave the way for a world of continuous updates of socio-economic data. The IMD is produced infrequently and at irregular intervals. Arguably, many interesting patterns and developments go completely unnoticed because they occur in between releases. We are increasingly moving towards a world in which satellite imagery is abundant, (very) frequent, and (relatively) cheap. If we can devise a way of producing more regular, cheap updates of the IMD, we are on to something really interesting, and not only for machine learning/data geeks (like me!) but also for a wide range of researchers, practitioners and policy makers interested in deprivation. Our paper by no means fully solves this problem, but it represents a significant step forward.

Finally, two broader reasons why I got interested in this project and why I think the paper is particularly timely. First, I see satellites, and other forms of automated remotely sensed data, as the new frontier for the Social Sciences. For several reasons, social scientists have been largely unaware of this data source, but I think this is about to change very rapidly. We are not going to get more frequent censuses, and we are probably not going to find ways to radically decrease the cost of running bespoke surveys. However, we are going to collect better and more frequent satellite data. If we want to take seriously many of the challenges and questions we, as social scientists, are concerned with (e.g. social inequalities, economic disparities), we cannot afford to ignore these sources any longer. Second, and related to my previous point, if we are going to engage with these “new” forms of data, we as a discipline need to “update” much of our methodological toolbox. There is a lot in domains like machine learning and computer science that can be useful in our research, and there is really very little excuse not to engage with them.

So, start by downloading and reading our paper, sure. But don’t stop there: there is a brand new world out there waiting to be discovered, and you don’t want to be the last one to the party, do you?