The ethical side of big data

Author: Masja de Ree
How can I process the data I collect about citizens both safely and with integrity? How can I present my conclusions honestly? How can I best deal with the responsibility that comes with possessing large amounts of data? These are essential questions for data scientists and statisticians. This is the reason for the emphasis on ethics during the National Data Science Programme of the Ministry of the Interior and Kingdom Relations (BZK) and Statistics Netherlands (CBS).

Learning about data science

The National Data Science Programme is an initiative of the Ministry of the Interior and Kingdom Relations (BZK) in association with CBS. It gives 35 recent graduates the opportunity to work at different government ministries for a period of two years. They are also required to attend a CBS training course, which focuses on honing the graduates’ knowledge and skills in the field of data science. Some of these graduates spend their time working on CBS projects. Prof. Bart Bakker, who is responsible for the National Data Science Programme curriculum at CBS, says: ‘Ethical data use is a critical issue, both for CBS and for science in general. It greatly reduces the risk of scientific mishaps, which have the potential to damage the scientific community’s reputation. The same can be said about official statistics. The main objective of the ethics module in the National Data Science Programme and the CBS professional traineeship is to raise awareness about this issue. Ethics ranks high on our list of priorities and is important to everyone working at CBS.’


Ethics module

‘Data permeates every aspect of our lives,’ says Francien Dechesne, who teaches ethics in the National Data Science Programme and works for the Center for Law and Digital Technologies at Leiden University. ‘It crops up when trying to find a partner via a dating app, when searching for information on the internet and when communicating with the government or the commercial sector. It is evident that working with data gives individuals and organisations a certain amount of power.’ Data technology is undergoing rapid development, and this raises many important issues for CBS, as well as for other government agencies that process large quantities of data.

‘You have a certain professional responsibility as a scientist or statistician’


If you base decisions or conclusions on big data, then there is always a chance of unfairness seeping in. Dechesne illustrates this point with the following example: ‘Suppose you have a pile of CVs to sort through and you decide to use an algorithm to help you find the right candidate for a CEO position. You might use an algorithm that analyses big data on the basis of CEOs who were successful in the past. The consequences of doing this become apparent when searching for images of CEOs on Google: you predominantly see men in suits. This example demonstrates how inequality in the past leads to inequality in the present, which in turn leads to inequality in the future.’
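The feedback loop in the CV example can be made concrete with a small sketch. Everything in it is hypothetical and chosen purely for illustration: the historical record `past_ceos`, the two candidates, and the deliberately naive scoring rule. It is not a description of any real selection algorithm, only of how a rule learned from a skewed past keeps reproducing that skew.

```python
from collections import Counter

# Hypothetical historical record: the group of each past "successful" CEO.
past_ceos = ["m"] * 9 + ["f"] * 1


def learn_scores(history):
    """Score each group by its share among past CEOs (a deliberately naive rule)."""
    counts = Counter(history)
    total = len(history)
    return {group: n / total for group, n in counts.items()}


def pick(candidates, scores):
    """Pick the candidate whose group was most common among past CEOs."""
    return max(candidates, key=lambda c: scores.get(c["group"], 0.0))


# Two hypothetical candidates who differ only in group membership.
candidates = [{"name": "A", "group": "f"}, {"name": "B", "group": "m"}]

scores = learn_scores(past_ceos)
chosen = pick(candidates, scores)

# The choice is added to the record, so the next round learns from it.
updated_history = past_ceos + [chosen["group"]]
updated_scores = learn_scores(updated_history)

print(chosen["name"], scores["m"], updated_scores["m"])
```

In this toy setting the share of the majority group rises after every round, even though the two candidates are otherwise identical. That is the pattern Dechesne describes: the past shapes the present, which in turn shapes the future.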


All government agencies are required to follow certain guidelines in their line of work. CBS, for instance, is not permitted to use profiling or tracking methods, whereas the police have other guidelines they need to comply with. Staff members must comply with set guidelines when working for these types of organisations. In fact, many of these guidelines have been laid down by law. Dechesne continues: ‘You also have a certain professional responsibility as a scientist or statistician. This means being thorough when describing what your graph or model reveals or does not reveal, and being clear about what types of conclusions can be drawn from this. Be mindful about how you present your information and how this ties in with your research question. For instance, stretching a graph’s y-axis can distort the information presented, making a slight increase appear significantly larger than it is. In such a case, the research question determines whether the information is being conveyed correctly or not.’
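The y-axis effect can be put into numbers. The sketch below assumes one common form of this distortion, a bar chart whose y-axis starts above zero, and uses made-up values (an increase from 100 to 102). The function name `apparent_ratio` is hypothetical. It compares the true ratio of the two values with the ratio of the bar heights a reader would actually see.

```python
def apparent_ratio(old, new, axis_min=0.0):
    """Ratio of the drawn bar heights when the y-axis starts at axis_min."""
    if axis_min >= min(old, new):
        raise ValueError("axis_min must lie below both values")
    return (new - axis_min) / (old - axis_min)


# Hypothetical values: a 2% increase.
old, new = 100.0, 102.0

from_zero = apparent_ratio(old, new)        # y-axis starts at 0
truncated = apparent_ratio(old, new, 98.0)  # y-axis starts at 98

print(from_zero, truncated)
```

With the axis starting at zero, the second bar is 2% taller than the first. With the axis starting at 98, it is drawn twice as tall, although the underlying figures have not changed. Whether either presentation is appropriate depends, as Dechesne notes, on the research question.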


The power of data

Why do we need to highlight the importance of ethical data use? Dechesne explains: ‘I am a mathematician. My world is a world of numbers. My education did not put much emphasis on the power of data in our society, however. Numbers frequently have a veneer of objectivity, but any conclusions drawn on the basis of data are always contingent on the definitions maintained and the decisions made when designing a research project. These choices can have a huge impact on certain groups in our society. This is something we need to be aware of. Decisions have to be made. That is fine, of course, as long as everyone is mindful and transparent when making decisions.’