Statistical data confidentiality high on the agenda

Author: Miriam van der Sangen
UNECE congress at Statistics Netherlands, The Hague
© Sjoerd van der Hucht Fotografie
From 29 to 31 October 2019, Statistics Netherlands (CBS) hosted a ‘Work Session on Statistical Data Confidentiality’ in The Hague. The work session had an intensive programme with many lectures and discussions in the field of statistical data protection. As the volume of data increases, so do privacy risks.

The protection of statistical data is an important and topical subject of concern on an international level. A workshop on this subject is organised once every two years at the request of UNECE - the United Nations Economic Commission for Europe. This year’s three-day event was hosted by CBS, with Eric Schulte Nordholt and Peter-Paul de Wolf as the driving forces. The agenda revolved around four main topics: microdata access facilities, output checks in remote access work, tabular data protection and software tools for statistical data confidentiality in the 2020/2021 population census. The event drew over 80 visitors from all corners of the world and featured around 40 different presentations.

Identifying trends

According to Taeke Gjaltema – who was closely involved on behalf of UNECE – the event has produced some great and concrete results. ‘During a Work Session, trends are identified and statistical organisations jointly implement measures in the field of data confidentiality. As we accumulate ever larger volumes of data and new data sources – e.g. big data and administrative registers – there are also elevated risks of disclosure of personal and company details as well as privacy-sensitive information. We receive more and more information on the geographic location of individuals, and this raises new risks.’ Reinoud Stoel, who heads the methodology team of CBS in The Hague, adds: ‘The changes taking place in the volume and the types of data for the production of statistics are precisely why methods in the field of data protection are in constant development. This recurring international work session is an ideal opportunity to exchange knowledge and experiences and to look ahead together at challenges we may be facing in the future.’

Microdata access

Microdata are data at the level of individual persons, companies and addresses, which may be interlinked. This makes it possible for external parties to conduct their own statistical research, provided they meet a set of stringent requirements. Originally, Denmark and the Netherlands were the only two countries whose national statistical institutes (NSIs) provided access to microdata for scientific research. Schulte Nordholt explains: ‘This is now also possible at other statistical agencies. We see they are facing the same issues as we were at the beginning. Who is allowed access to the microdata, under which conditions, what terms of contract, how do we monitor the results, etcetera? We discussed all these questions during the Work Session.’ This also involves monitoring the results of any scientific research carried out using microdata. ‘More and more researchers make use of this facility’, says De Wolf. ‘This means statistical agencies have to perform more checks on research results. This can be done on the basis of random samples, or you could automate some of the checks. Furthermore, those who perform the checks should be trained accordingly. During the work session, we observed major differences in practices between the various countries around Europe.’
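The two approaches De Wolf mentions, random samples and automated checks, can be illustrated with a minimal Python sketch. It is not the procedure of CBS or any other agency: the function names, the minimum cell count of 10 and the 20% sampling fraction are assumptions chosen purely for illustration.

```python
import random


def flag_small_cells(output_table, min_count=10):
    """Automated check: return labels of cells with a positive count below min_count.

    The threshold of 10 is an illustrative assumption, not an official rule.
    """
    return sorted(
        cell for cell, count in output_table.items() if 0 < count < min_count
    )


def sample_for_manual_review(output_ids, fraction=0.2, seed=None):
    """Random-sample check: pick a fraction of research outputs for manual review."""
    ids = list(output_ids)
    if not ids:
        return []
    rng = random.Random(seed)
    # Review at least one output, and never more than are available.
    k = min(len(ids), max(1, round(len(ids) * fraction)))
    return rng.sample(ids, k)


# Made-up research output: counts per category.
example_output = {"category A": 3, "category B": 0, "category C": 25, "category D": 9}
print(flag_small_cells(example_output))
print(sample_for_manual_review(range(10), fraction=0.2, seed=0))
```

In this sketch, small cells are flagged for a human checker rather than suppressed automatically, which matches the point that those who perform the checks still need appropriate training.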

Dataset protection

Another important topic at the work session was the protection of tabular data. ‘All sorts of innovations in the field of tabular data publications were reviewed here,’ says De Wolf. ‘There are many tools available to publish such data, but the issues are becoming more and more complex. There is an ever higher risk of disclosure of sensitive data.’

2021 Census

A great deal of focus was on the latest software developments in the field of data confidentiality in preparation for the 2021 EU-wide population census. Schulte Nordholt explains: ‘National censuses will be conducted all across Europe in 2021. Our main goal in reviewing this topic was to reach a consensus as to how we deal with the privacy issues. Not only Eurostat (the EU’s statistical body) but also national authorities want more and more details to be included in the census results. However, the traditional security methods applied to datasets result in too much loss of information for the users. By making slight adjustments to the cell values instead, the conclusions from such research remain basically the same, but it becomes much harder to retrieve details at the individual level. This is why several statistical agencies - including CBS - are conducting studies that will lead to the addition of so-called random noise.’
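The idea of making slight adjustments to cell values can be illustrated with a short Python sketch. It is a simplified illustration only, not the method studied by CBS or Eurostat: the function name, the noise range of plus or minus 2 and the example counts are all assumptions, and methods used in practice are considerably more refined.

```python
import random


def perturb_counts(table, max_noise=2, seed=None):
    """Return a copy of `table` with small random integer noise added to each cell.

    `table` maps a cell label to a non-negative count. `max_noise` and `seed`
    are illustrative parameters, not values used by any statistical agency.
    """
    rng = random.Random(seed)
    perturbed = {}
    for cell, count in table.items():
        if count == 0:
            # Simplifying assumption: empty cells stay empty.
            perturbed[cell] = 0
            continue
        noise = rng.randint(-max_noise, max_noise)
        # A published count can never be negative.
        perturbed[cell] = max(0, count + noise)
    return perturbed


# Made-up counts for a hypothetical region, broken down by age group.
census_table = {"age 0-17": 1520, "age 18-64": 4310, "age 65+": 3, "age 100+": 0}
print(perturb_counts(census_table, seed=42))
```

Large cells barely change in relative terms, so overall conclusions stay much the same, while a very small cell such as the count of 3 can no longer be read as an exact number of individuals.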

Collaboration

Gjaltema is enthusiastic about the fact that so many innovative approaches were shared during the Work Session. ‘Both by the statistical agencies and by academic and other research institutions. We have also reached a number of collaborative agreements. In addition, the participants have shown interest in, for example, creating a road map for microdata access facilities.’ According to Schulte Nordholt and De Wolf, it was a successful event, not only in content but also in terms of the organisation. ‘An event for 80 participants requires a great deal of planning – from entry visas to dinners – but everything went well. We received a lot of compliments from the participants and from Eurostat and UNECE, and we wish to thank everyone who contributed towards the success of this event.’

The High-Level Group for the Modernisation of Official Statistics (HLG-MOS) recently held a conference in Geneva which followed up on the above Work Session with the selection of a project for 2020. This project will focus on the input of privacy-preserving techniques using work packages on Secure Multiparty Computation and Homomorphic Encryption methods.
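To give an impression of what Secure Multiparty Computation involves, the sketch below uses additive secret sharing, one common building block, to let several parties compute a joint total without any of them revealing its own figure. It is an illustration under stated assumptions, not the design of the HLG-MOS project: the function names, the modulus and the example values are hypothetical, and a real protocol would also need secure channels between the parties.

```python
import secrets

# All arithmetic is done modulo a large prime (an illustrative choice).
MODULUS = 2**61 - 1


def share(value, n_parties):
    """Split `value` into n_parties random shares that sum to it modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def reconstruct(shares):
    """Recombine shares into the original value."""
    return sum(shares) % MODULUS


def secure_sum(private_values):
    """Compute the total of all inputs without exposing any single input.

    Each party splits its value into shares, one per party. Party j only
    ever sees the j-th share of each input, adds those shares locally and
    publishes the partial sum. Only the partial sums are combined.
    """
    n = len(private_values)
    all_shares = [share(value, n) for value in private_values]
    partial_sums = [
        sum(all_shares[i][j] for i in range(n)) % MODULUS for j in range(n)
    ]
    return reconstruct(partial_sums)


# Three hypothetical organisations, each holding a private count.
print(secure_sum([10, 20, 30]))
```

A single share is a uniformly random number and reveals nothing about the underlying value, which is why the parties can exchange shares safely. Homomorphic Encryption pursues a similar goal by computing directly on encrypted data and is not shown here.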