KNAW: improve the reuse of public data for science

Author: Masja de Ree
© Wim van der Spiegel
Data from the administrations of central and local government, health insurers and other organisations performing public and semi-public tasks contain information that can be of great value to scientific research. The Royal Netherlands Academy of Arts and Sciences (KNAW) has studied ways to improve the accessibility of these so-called public data. Pearl Dykstra, chair of the Academy’s advisory committee and Professor of Empirical Sociology at Erasmus University Rotterdam, talks about the measures that can promote reuse and the part played by Statistics Netherlands (CBS).

Reuse of public data

Last June, the Academy published its advisory report ‘Reuse of public data. More academic research and better government policy’, which was compiled at the request of the Ministry of Education, Culture and Science. ‘The ministry wanted to know to what extent researchers encounter obstacles when they want to reuse public data,’ Professor Pearl Dykstra explains. One of the Academy’s findings is that government organisations are mainly focused on businesses and the general public when making the data they collect available. The academic world is not yet on their radar.

Source data

The government’s ambition is to make as much data as possible from government administrative bodies available as ‘open data’. For scientific researchers, however, access to what are essentially unprocessed source data is crucial. ‘Open data’ are of relatively little use to them, since such data are often published as group averages or as text that does not readily lend itself to numerical analysis. As Professor Dykstra notes, ‘Open data is a misleading term. It implies that the data are freely accessible at no cost. Yet privacy concerns mean that data cannot simply be made freely available, and making data available to others costs money.’ When source data are made accessible, strict privacy protection is obviously required, but there are other aspects to consider as well. ‘In practice there are a variety of obstacles,’ Professor Dykstra continues. ‘Technical difficulties, for example, because different municipalities use different software systems. But we also encounter a reluctance to share data.’

Data landscape

CBS makes all the data in its StatLine database available as open data. But the role played by CBS does not stop there. ‘In our report, we describe the special position that CBS occupies in the Netherlands’ data landscape,’ Professor Dykstra explains. ‘CBS manages data from over 200 government organisations and acts as a centre of expertise for analysing big data. The great added value of CBS is that data from national registers can be linked in a way that safeguards privacy, creating a very rich seam of information. CBS has the legal mandate, the knowledge and the technical infrastructure for that task. Researchers at authorised institutions can use CBS data for scientific research under strict conditions. But beyond primary government data, the data generated by private parties performing public sector tasks are often missing. CBS and the Ministry of Economic Affairs and Climate Policy are currently exploring how the legal framework can be adjusted to give private parties a sounder legal basis for supplying CBS with the data they collect. The Academy strongly advises CBS to make public data more accessible and reusable in order to facilitate research.’


Data hubs

Professor Dykstra believes that CBS is especially well suited to act as a data hub for all personally identifiable data. ‘CBS has a wealth of knowledge when it comes to securing these data in such a way that disclosure is not possible. Other types of data, for example on air quality, can also be housed in a separate data hub. Public data should be made available through multiple data hubs which together form a comprehensive infrastructure.’

Data authority

A key element in the Academy’s recommendations to the Ministry of Education, Culture and Science is the creation of a Chief Public Data Officer (CPDO): a single authority who works closely with government organisations to find ways of promoting the reuse of data. ‘The position of CPDO should be vested at senior level,’ Professor Dykstra advises, ‘for example with the Minister or State Secretary of the Interior and Kingdom Relations. Their ministry already bears responsibility for coordinating government organisations that supply data.’ Researchers who find themselves unable to proceed will then be able to turn to the CPDO. The CPDO will initiate and encourage improvements in the availability of data, and monitor whether government organisations are making progress.

Data hugging

While you can make solid agreements about data availability and reuse, it takes willpower and the right mindset to make it happen. How do you get government organisations to the point where they are ready to speed up the process of making the data they collect more accessible? ‘We call it “data hugging”,’ Professor Dykstra says. ‘Organisations prefer to keep data to themselves, thinking it can give them a competitive edge in terms of know-how or earning potential. So it’s also up to researchers to demonstrate the benefits of data accessibility to government organisations: for example, by providing insight into the results of pursuing a particular policy. In our advisory report, we encourage researchers to take the lead in this respect. We invite them not only to tell government organisations what they need, but also to report the results their research has produced and how important the provision of data has been.’

Result

Does Professor Dykstra expect the report to produce results? ‘Definitely! If you look at the recommendations the Academy put forward a few years ago, you can see how quickly they have taken effect. In five years’ time, I expect that we will have a CPDO, that more public data will be reused and that they will be made available according to the FAIR principles: findable, accessible, interoperable and reusable.’