Translating data into high-quality statistics
Creating official, high-quality statistics based on big data and register data is not a simple matter. The data sources were not designed for statistical use, and safeguarding their quality and continuity is quite complex. The ultimate challenge for data scientists is therefore to develop methods that ‘translate’ huge amounts of data into high-quality statistics. CBS pursues this challenge with methods such as machine learning. As CBS data scientist Marc Ponsen explains, ‘Machine learning has received an enormous boost from faster computers and the huge amounts of data that have become available, although which method is best depends very much on the domain under investigation.’
Ponsen has worked at the Center for Big Data Statistics (CBDS) since May 2018. ‘Within this CBS division, over 40 national and international parties from the public and private sector and the fields of science and education are collaborating in the areas of big data technology and methods for official statistics production. The CBDS was established in response to important developments in society, notably the ever-increasing demand for real-time and regional statistics (urban data, ed.). This necessitates the use of new data sources.’
Various topics and statistics
Ponsen studied computer science at Delft University of Technology (TU Delft) and did research at Lehigh University in Pennsylvania and Maastricht University. He developed artificial intelligence for commercial computer games as part of his graduation research. After graduation, he worked for the Dutch Authority for the Financial Markets (AFM), primarily developing data visualisations. He combines his current work at CBS with his job as a data analyst at Eindhoven-based football club PSV, where he advises the technical staff. Why, as a data scientist, did he choose CBS? ‘CBS produces statistics on a wide variety of topics. It is this variety that makes it interesting,’ Ponsen says. ‘Machine learning techniques are applied depending on the topic and the type of statistic. It is the combination that makes this job so appealing.’
The social relevance is a key reason for these data scientists to do research at CBS
Tim de Jong studied Knowledge Engineering at Maastricht University and obtained his master’s degree in artificial intelligence. He started his career at CBS as a software engineer and switched to data science last year. Together with Ben Laevens, among others, he is involved in a project in which big data have to be translated into a statistic about all solar energy generated in the Netherlands. De Jong’s project was commissioned by Eurostat, while the project with Ben Laevens and his two colleagues is an assignment from CBDS. Laevens obtained his master’s degree in physics and astronomy at the University of Edinburgh, his doctoral degree at the Université de Strasbourg and the Max Planck Institut für Astronomie, and conducted his postdoctoral research in Santiago (Chile). He is currently working as a researcher at the Dutch Ministry of Economic Affairs and Climate Policy (EZK) and is training to be a data scientist at CBS. ‘This means two years of work on topics that are closely related to the policy areas of the Ministries of EZK and Agriculture, Nature and Food Quality, namely energy and economy. It is a mixture of research and training. Upon completion, I will move on to perform assignments in data science at the two ministries.’
Several large data sources are used in the project, including register data on solar panels, radiation data from the Royal Netherlands Meteorological Institute (KNMI) and data from aerial photographs. ‘The biggest challenge here is to use machine learning techniques to develop algorithms that are capable of generalising effectively,’ De Jong says. Laevens: ‘In this way, we can eventually come up with a model that is able to determine the generated solar capacity in the Netherlands (taking into account factors such as the weather, the season and the location of the panels). It’s knowledge of major importance, to society as well as to policymakers.’ This social relevance in particular is an important reason for both data scientists to do research at CBS. ‘Here, you get to work with one of the most extensive databases in the Netherlands, in a creative and open working environment. That is very inspiring,’ is De Jong’s conclusion.