New methods and sources for Big Data research

Author: Miriam van der Sangen
© Marcel van Hoorn
In October 2018, the seminar ‘Methods for Big Data in Official Statistics’ was held on the Brightlands Smart Services Campus in Heerlen. The objective of the seminar was to bring together researchers from statistical offices and academic scientists in order to exchange knowledge and present the latest methods and techniques in the field of Big Data. Participants also shared their experiences with new data sources and the associated methodological challenges. The seminar was organised by Statistics Netherlands (CBS).

Opportunities and challenges

In September 2017, CBS held the seminar ‘Big Data Matters’ to mark the first anniversary of its Center for Big Data Statistics (CBDS). The seminar included discussions on the opportunities, challenges and issues arising from working with Big Data. ‘This year’s seminar was different in that it focused primarily on new techniques and methods for Big Data research,’ says Sofie De Broe, who heads the methodology department of CBS in Heerlen and is also scientific director of the CBDS. De Broe was one of the organisers of the seminar and also acted as its chair. According to her, Big Data offer a wealth of opportunities but also pose major challenges. ‘There are big plans, for example in the area of linking and sharing of data. We are legally allowed to use personal data for scientific or statistical research. However, the sharing of personal data in such research is still an issue. We are currently examining how to share data in a secure environment by using methods such as “privacy-preserving data sharing” and blockchain.’

Research on health factors

During the seminar, speakers from Germany (Gordan Pipa, professor at the University of Osnabrück) and the United Kingdom (Sofia Olhede, professor at University College London) presented their research. The keynote speaker was Michel Dumontier, professor of Data Science at Maastricht University (UM) and a former researcher at Stanford University, who has extensive research experience in the field of data science. Together with a number of external parties including CBS, he endeavours to bring together privacy-sensitive medical data, now stored at different organisations such as hospitals and insurance companies, in a responsible way. Dumontier is in charge of the research, provides expertise in machine learning and manages the data, with the aim of making the sorted data sets accessible based on several guiding principles. For this project, funded by the Dutch National Research Agenda, CBS supplies the socio-economic data and contributes to the development of the required infrastructure. These data are used to unravel the factors that influence health. ‘We hope this will give us insights into the connection between diabetes, lifestyle, socio-economic factors and the use of health care,’ says Dumontier.

Visualisations for analysis purposes

Part of the seminar was devoted to visualisations. Presentations on this topic were given by CBS expert Martijn Tennekes, Prof. Jack van Wijk from Eindhoven University of Technology and Jan Aerts, professor of bioinformatics at the Catholic University of Leuven in Belgium. Many issues in this particular field, as in Big Data research in general, are quite complex. In contrast to traditional statistics, the data collection process is often unknown and complicated, and the possible solutions are not fixed. Data visualisation plays an essential role in tackling such issues. Complex data files often require more complex visualisation methods, which demand expertise in data visualisation as well as domain knowledge. Jack van Wijk spoke about visual analytics, a discipline within data visualisation that is concerned with supporting automated analysis methods (from traditional statistics or machine learning) through interactive visualisation. This combination of human and computer is very powerful but challenging.
The next CBDS seminar will be held on Thursday, 14 March 2019 and will carry the theme ‘data-driven governance’.

What is the Center for Big Data Statistics?
The amount of data collected automatically is increasing exponentially. With the Center for Big Data Statistics (CBDS), CBS is investigating the opportunities these new data offer for statistics while developing the necessary methodology in a unique, innovative environment. The CBDS does this in collaboration with national and international parties from the public and private sectors as well as from science and education.