CBS explores possible privacy preserving techniques with universities
Author: Miriam van der Sangen
Statistics Netherlands (CBS) is constantly looking for new methods and techniques to meet the steadily growing demand for statistical information. The principle underlying this search is that the privacy of all parties involved should be guaranteed at all times. Working closely with its external partners, CBS is therefore exploring the potential of privacy preserving techniques. In 2020, CBS entered into a collaborative project with Maastricht University (UM) and the University of Groningen (RUG). The aim of this collaboration is to create a generic data infrastructure that can be used to carry out scientific or statistical research. In this approach, the data remain with the party that supplies them.
Data remain secure with the data source owner
Privacy preserving techniques are new cryptographic techniques that enable privacy-sensitive data to be analysed without parties having access to the data themselves and without the data leaving the organisation where they are stored. ‘The common thread in the application of this technology is that the data remain secure with the source owners and are analysed remotely, while privacy continues to be guaranteed,’ Paul Grooten of CBS explains. In his role as Enterprise Data Architect, Grooten is the main representative of CBS in its collaboration with the two leading Dutch universities.
Deploying privacy preserving techniques has a number of major advantages. ‘The advantage is that the data remain with the primary source – the owner of the data – and only the information specifically needed for statistical research is extracted. This is consistent with the General Data Protection Regulation’s aim of keeping data-sharing to a minimum. Furthermore, we have the obligation to handle the information that individuals and businesses provide to CBS safely and securely.’ Deploying privacy preserving techniques can provide a solution in such cases and, as Grooten observes, there are other important reasons to make use of them. ‘The number of sources and the quantity of data are increasing rapidly. Collecting and securing all this data in one place is therefore becoming an ever more challenging task. Another advantage of analysing data at the source is that you are always making use of the very latest data.’
Scientific research institutions such as the universities in Maastricht and Groningen also find themselves collecting increasingly detailed information when carrying out their projects. This is certainly true of longitudinal studies such as Lifelines in Groningen and Maastricht’s CARRIER project, which tracks the long-term progress of cardiac disease with reference to a wide spectrum of medical indicators. To find answers to some of their questions, the researchers need to combine the data they collect with other data sets, for example those of CBS. This is why CBS has teamed up with the universities of Groningen and Maastricht to explore a new way of sharing data responsibly using privacy preserving techniques.
‘The starting point for the collaboration between the universities and CBS is a white paper written by Dr Jaap-Henk Hoepman, a senior lecturer in IT Law at Groningen,’ Grooten explains. ‘This white paper details the purpose of the cooperation between CBS, Maastricht and Groningen. It also looks at what we want to achieve, why we want to achieve it and how we intend to go about it. The white paper also describes five different privacy preserving techniques. Examples include calculating with encrypted data and sharing pseudonymised data.’
The aim of the cooperation with Maastricht and Groningen is to develop innovative approaches – technical, organisational, legal and otherwise – to unlock, link and aggregate data from different sources to obtain information that cannot be accessed in any other way. Leen Roosendaal, Director of Policy and Management Support at CBS, says, ‘The guiding principle here is that the source data remain with the supplying party. In addition, the data used must answer the research question and comply with the privacy legislation, as included in the Statistics Netherlands Act and the General Data Protection Regulation.’
The CARRIER project is one example of a pilot with privacy preserving techniques (PPT) between CBS and Maastricht University. Researchers from the university itself and Maastricht University Medical Center have developed a self-learning eHealth application for the prevention of coronary heart disease. The digital application uses large quantities of data from hospitals, general practitioners and CBS, all of which are accessed using PPT. Roosendaal continues, ‘CBS data are mainly socio-economic in nature. For example, data from patients with a history of coronary heart disease can be combined with data on lifestyle trends among the population and a person’s physical environment. In doing so, the privacy of the individuals concerned is fully guaranteed.’
Based on the combined data outlined above, the CARRIER researchers can produce risk estimates for the development of coronary heart disease. This knowledge is integrated into a predictive model, which takes into account the transparency and ethical aspects that apply to predictive models. The treating doctor can then use this knowledge to assess which patients are at risk based on their individual characteristics. This assessment can then serve as a starting point for intervention. ‘It’s a great example of how the application of privacy preserving techniques can help society move forward,’ Grooten concludes.