We produce data every second of the day. From Paris to Dakar, from Jakarta to New York, our daily activities (consuming, communicating, traveling…) generate data, the “digital crumbs” that we leave behind. This information is potentially useful for development. How? An interview with Thomas Roca, economist at AFD.

ESRI User Conference 2013 © kris krüg

What is big data?

The notion of big data covers a set of heterogeneous – if not disparate – data. They are usually described by the “3Vs”: “Velocity” (high frequency of updates), “Variety” (images, mobile phone data, sensor data, texts, etc.) and “Volume”, since the resulting mass of information is considerable. However, this description leaves out the person behind the data, and it says nothing about the impact of these data on the way our societies are organized.

 

What are the challenges posed by these new types of data?

The challenges posed by turning the world into data must not be seen as purely technical. They are also political and ethical. Who owns the data from our mobile phones or from our activity on social networks? How can citizens' privacy be protected? How should the uses of these data be regulated?

The fact that we cannot know in advance how our private data will be used poses an ethical problem. If they are sold to a foreign security service, are we informed? Can we refuse? Today, the answer is no: to use social networks, each user gives their “informed consent” to the reuse of their personal data, without knowing what use will be made of them in the future. Very often, the operator itself does not know! When Facebook was created, it was far from imagining the commercial value that the information it was going to collect would one day have. Without this being its initial objective, it has nevertheless created a new business model.

As with bioethics, rules need to be set at the international level in order to define “data ethics”. Establishing an “overall consensus for data” is complicated by the diversity of the data, by the private sector's quasi-monopoly over their collection, and by their strategic importance in a dematerialized economy.

Until now, national – or European – regulations have set certain rules that protect citizens more or less effectively. However, overly stringent rules can ultimately hinder innovation and the use of these data for public policies. Where should the line be drawn?

To date, there has been no institutional response that would allow a platform for private data to be created. Such a platform would be open but protected (anonymized, with restricted access, etc.) and could be used to define public policies. This type of project is, however, being discussed at the UN and the World Bank. On the corporate side, Orange is very active in these discussions. Following the success of the Data for Development challenges,[1] Orange wishes to continue its efforts and to encourage other private companies to join it in a “controlled provision” of certain data that are useful for development. The Data for Climate Action project could be among those that break new ground, if enough companies mobilize to make new data available in order to better understand the impact of climate change and environmental changes on populations.

 

What role can big data play in the social sciences?

These new types of data provide a different vision of the world, which complements the one provided by existing statistics. To date, the data used in the social sciences have been “built” through a process of collection, from observations or questionnaires. In the era of big data, the data are mainly “emitted” as a by-product of our activities.

Using these data in the social sciences is neither simple nor always appropriate. It raises a number of questions, the first being their validity. The data traditionally used are the result of a theoretical construction: What do we wish to measure? How do we capture the information? With big data, the problem is posed the other way round: What data do we have? What can we do with them?

In practice, the distinction is less clear-cut. Only a minority of social science researchers can afford to create a dedicated database. Most researchers ask broadly the same questions: What data do we have? What can we do with them? How should we process them properly?

Some of these massive datasets have specific problems: they may be partial, since not everything is quantifiable, and sometimes biased. They do not necessarily reflect the activities of the least connected share of the population, and the poorest are often under-represented. This can be the case with data from new information and communication technologies (mobile phones, social networks, etc.).

 

What future can the use of these data have for development?

We are going through a phase of research and experimentation, and it will be some time before big data are widely used for official statistics. Two cultures, two generations, are facing each other: the statisticians and the data scientists. Their language and tools are sometimes different. The statisticians have been trained in statistics and probability. The data scientists come from the world of computing and process data sets that are sometimes so large that inferential statistics[2] and the notion of sampling seem outdated to them. The statisticians live in the “long term” of national accounts, the data scientists in the immediacy of the Internet… In the debates over how to measure the Sustainable Development Goals, there is an opposition, but also a link, between these two visions of turning the world into data.

Big data are obviously not a miracle solution to the lack of financial and human capacities facing National Statistics Institutes in the poorest countries. However, some of these data may be useful for addressing specific problems. Take mobile phone data as an example. They are particularly relevant for understanding mobility and monitoring population movements. Analyzing them makes it possible to optimize public transport routes, regulate road traffic, facilitate urban planning, etc. We can also mention water and electricity consumption, with the famous “smart grids”, which use sensors on the grid to improve the management of electricity flows (power output, distribution, etc.).

If the private sector is already taking advantage of these data, it is because it controls their production process. For public policies, two issues arise: the protection of privacy, and the strategic interest of some of these data for the private sector, particularly when they concern the core business of these companies. Not all data are strategic, however. Orange recently mentioned the case of the weather sensors on its relay antennas. They are used to analyze the quality of the air and therefore its capacity to carry signals. It is certainly feasible to make this type of information available.

Today, we hope that new types of partnership, Public-Private-Individuals Partnerships, will be established to advance the use of these data in the service of the poorest populations. As Amina Mohammed, the UN Secretary-General’s Special Adviser, stated, every day lives are lost because they have not been counted.

 

 

[1] These challenges invite researchers to use mobile phone data to help formulate public policies in Côte d’Ivoire (2013) and Senegal (2014).
[2] Inferential statistics means applying characteristics observed within a representative sample to an entire population – with a certain margin of error.
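
As a rough illustration of note [2], the short Python sketch below estimates a share from a simple random sample and attaches a 95% margin of error, using the usual normal approximation. The population, the 30% ownership rate, the sample size and the function name estimate_proportion are all hypothetical and chosen only for the example; they do not come from the interview.

```python
import math
import random


def estimate_proportion(sample, z=1.96):
    """Return (estimate, margin_of_error) for a share in a simple random sample.

    Normal approximation: margin = z * sqrt(p * (1 - p) / n),
    where z = 1.96 corresponds to a 95% confidence level.
    """
    n = len(sample)
    p_hat = sum(sample) / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, margin


# Hypothetical population: 100,000 people, 30% of whom own a mobile phone
# (1 = owner, 0 = non-owner). These figures are assumptions for illustration.
rng = random.Random(0)
population = [1] * 30_000 + [0] * 70_000

# Survey 1,000 people drawn at random, without replacement.
sample = rng.sample(population, 1_000)

p_hat, margin = estimate_proportion(sample)
print(f"Estimated share of owners: {p_hat:.3f} +/- {margin:.3f}")
```

The margin shrinks only with the square root of the sample size, which is why statisticians care more about how representative a sample is than about how large it is. A very large but biased data set does not remove this concern.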
