Pitfalls and safety nets in publishing open data
More and more government data is being published online as open data. Sharing and re-using this information is one way for the government to increase transparency. However, public authorities are struggling with the complex privacy issues that arise when these data are published. This prompted Statistics Netherlands (CBS), an expert in protecting personal data, to host its first-ever practical course on the subject in September 2015.
An important question when publishing open data is: how much detail can be published without disclosing privacy-sensitive data on individuals or companies? CBS is an international leader in developing the methodology and software for answering this question. During the course, the two instructors, Eric Schulte Nordholt and Peter-Paul de Wolf, shared their knowledge of the field with participants from various levels of Dutch government. Tanya Gelsema, who organised the course: ‘The broad interest in this subject became clear fairly quickly from the number of registrations. The first course filled up immediately and we started a waiting list, so that we could schedule a second course.’
The instructors' main objective was to raise awareness. Gelsema: ‘Through many different examples, they tried to show the various pitfalls in publishing information, but also the safety nets that are available.’ The course also covered the relevant legal frameworks: the participants are bound by the Personal Data Protection Act, for example, while CBS is additionally bound by the Statistics Netherlands Act. The latter restricts what CBS may publish for privacy reasons, but also gives greater clarity about which data must be protected.
Case studies from past practice
During the evaluation at the end of the afternoon, it became evident that the course objective had been more than achieved. The participants considered it a valuable introduction to the topic. Gelsema: ‘They became more aware of the “classical” problems involved in publishing information, as well as the possible solutions. Some were surprised at first to learn that data published at an aggregated level can also reveal information on individuals. Often a simple solution does the trick: adjusting the table layout by merging specific categories.’ The instructors drew extensively on case studies from actual CBS practice. The evaluation also revealed a need for an advanced practical course on the open-source software that CBS has helped to develop. There are plans to organise such a course in the near future.
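To illustrate how an aggregated table can still point to individuals, and how merging categories helps, here is a minimal Python sketch. It assumes a simple minimum-frequency rule with a hypothetical threshold of 3: any non-empty cell counting fewer than three people is treated as risky. The example data (regions A and B, occupations) and all function names are invented for illustration and are not taken from any CBS software. Real disclosure control applies further rules, for instance to prevent a hidden value being recovered from row or column totals.

```python
from collections import Counter

# Hypothetical minimum cell frequency; real thresholds are a policy decision.
THRESHOLD = 3


def make_records(spec):
    """Expand {(region, category): count} into one record per person."""
    return [
        {"region": region, "category": category}
        for (region, category), n in spec.items()
        for _ in range(n)
    ]


def frequency_table(records):
    """Count records per (region, category) cell of the published table."""
    return Counter((r["region"], r["category"]) for r in records)


def unsafe_cells(table, threshold=THRESHOLD):
    """Return cells whose small, non-zero count could single out individuals."""
    return {cell for cell, n in table.items() if 0 < n < threshold}


def merge_categories(records, mapping):
    """Recode detailed categories into broader ones before tabulating."""
    return [
        {**r, "category": mapping.get(r["category"], r["category"])}
        for r in records
    ]


if __name__ == "__main__":
    records = make_records({
        ("A", "nurse"): 5, ("A", "doctor"): 2, ("A", "surgeon"): 1,
        ("B", "nurse"): 4, ("B", "doctor"): 3, ("B", "surgeon"): 1,
    })
    # Detailed table: three cells fall below the threshold.
    print(sorted(unsafe_cells(frequency_table(records))))
    # Merge the two small occupations into one broader category.
    merged = merge_categories(records, {"doctor": "physician", "surgeon": "physician"})
    print(unsafe_cells(frequency_table(merged)))
```

In this invented example the detailed table flags (A, doctor), (A, surgeon) and (B, surgeon), while the merged table has no cells below the threshold. The trade-off is a loss of detail: the published table no longer distinguishes doctors from surgeons.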