CBS’ innovation efforts are aimed at contributing to the strategic objectives of CBS as described in the Multi-annual programme.
To ensure the continued quality of statistics production, CBS invests in innovating processes used in its statistics production. For both economic and social statistics, large-scale innovation projects are under way with the aim of standardising the production processes and making them more efficient. Aside from updating these processes, the publication of statistics using the current StatLine tool is being renewed. There are also various initiatives aimed at improving and future-proofing data collection by means of surveys. Specifically in data collection among enterprises, CBS is targeting the minimisation of the administrative burden and adjusting data collection processes to simplify the data submission by enterprises.
Aside from the efforts in streamlining production processes, CBS is also improving its statistics in substantive terms. In addition, new statistical products are being developed to meet the societal challenges stated in the Multi-annual programme. CBS is thus continuously improving its statistical services.
In many cases, new statistical products can be realised using data and methods that are already available. In certain cases, however, innovative methods and techniques first need to be developed. CBS has a methodological research programme for the development of such methods. Innovation efforts are currently focused on the following topics:
Information from text (text mining)
Text mining is a technique for extracting valuable information from large amounts of text. CBS is developing algorithms to extract information from text by using machine learning to automatically recognise the correct patterns in text. Text mining can be applied in various ways. For instance, CBS is investigating the possibility of using this method to deduce the characteristics of enterprises from the information they publish on their websites. This technique is also applied to process information from corporate annual reports. One recent application is focused on the labour market, with CBS collaborating in a consortium to automatically identify and classify skills from various sources such as online vacancies.
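As an illustration of how machine learning can recognise patterns in text, the sketch below trains a very small naive Bayes classifier that assigns an activity label to a piece of website text. It is not CBS's method: the training sentences, the labels `bakery` and `it_services`, and the function names are all hypothetical and chosen only to show the principle.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def train(examples):
    """Collect the counts a multinomial naive Bayes model needs from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in examples:
        label_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocabulary.add(word)
    return label_counts, word_counts, vocabulary

def classify(text, model):
    """Return the label with the highest log-probability for the given text."""
    label_counts, word_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        score = math.log(n_docs / total_docs)  # prior probability of the label
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # words never seen in training carry no information
            # Laplace smoothing avoids zero probabilities for unseen word/label pairs
            score += math.log((word_counts[label][word] + 1)
                              / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training data: snippets of enterprise website text with an activity label
examples = [
    ("We bake fresh bread and pastries every morning", "bakery"),
    ("Artisan bread, cakes and croissants from our oven", "bakery"),
    ("Custom software development and cloud consulting", "it_services"),
    ("We build web applications and manage cloud servers", "it_services"),
]
model = train(examples)
label = classify("Fresh croissants and bread daily", model)
```

With real data, far more training text and a richer model would be needed; the sketch only shows how word patterns learned from labelled examples can be used to label new text.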
Use of apps and sensors (smart surveys)
CBS uses large amounts of data from registers, but in some cases still needs to conduct surveys. To minimise the response burden on individual people and enterprises, CBS conducts as few surveys as possible. That is why it is constantly exploring possible innovations in that area. In 2022, the focus was on so-called smart surveys, involving the use of smartphone apps and sensor measurements. This allows respondents, after having given their consent, to automatically provide CBS with data they would normally have provided by completing a survey.
Privacy enhancing / preserving techniques (PET / PPT)
Privacy enhancing (or preserving) techniques are a set of techniques that allow sensitive data to be analysed without the analyst having access to the underlying microdata. Each technique typically serves a specific function, so the choice of technique depends on the application. In cooperation with universities and market participants, CBS is investigating the added value of these techniques. In cooperation with Maastricht University, among others, the deployment of federated learning (or distributed learning) is being explored. This enables an algorithm to travel between different data sources without the need to share the data. In collaboration with market participants supplying these techniques, applications with multiparty computation (MPC) are being studied as well. With MPC, the various data sources are first encrypted and disaggregated before an analysis is conducted.
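The principle behind MPC can be illustrated with additive secret sharing, one common building block of such protocols. The sketch below is a simplified illustration, not the protocol used by CBS or its partners: each data holder splits its private value into random shares, and only the total can be reconstructed. The modulus, the function names and the turnover figures are assumptions made for this example.

```python
import secrets

# Hypothetical public modulus; all arithmetic is done modulo this value
MODULUS = 2**61 - 1

def make_shares(value, n_parties):
    """Split an integer into n_parties random shares that sum to value modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]

def secure_sum(private_values):
    """Compute the sum of the private values using only shares and partial sums."""
    n = len(private_values)
    # Each data holder splits its own value and hands one share to every party
    distributed = [make_shares(v, n) for v in private_values]
    # Each party adds up the shares it received; a single partial sum
    # looks random and reveals nothing about any individual input
    partial_sums = [
        sum(distributed[holder][party] for holder in range(n)) % MODULUS
        for party in range(n)
    ]
    # Only the combination of all partial sums yields the total
    return sum(partial_sums) % MODULUS

# Hypothetical private figures held by three separate data sources
turnover = [120, 340, 95]
total = secure_sum(turnover)
```

In this simulation all parties run in one process; in a real deployment the shares would be exchanged over a network and no single party would ever hold all shares of one input.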
Synthetic data
Synthetic data is seen as a possible solution for sharing privacy-sensitive data. In this process, the original data is replaced by synthetic data that has the same statistical properties for certain applications. The possibilities for using synthetic data are being studied in collaboration with various parties. Applications range from generating synthetic data for education purposes to system testing. At CBS, synthetic data is considered to be data generated from computer simulations or algorithms in which the analytical value reflecting the real world is maintained, but the risk of disclosure is as low as possible. Synthetic data differs from traditionally secured microdata files in the sense that characteristics of population units are mimicked, so that especially in fully synthesised data, the resulting units do not correspond to the real units, but the statistical information is preserved at the detailed level. With traditionally secured microdata, particularly at a detailed level, a lot of information might be lost. It should be noted that synthetic data need not necessarily be microdata: aggregated data may also be generated synthetically.
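One simple way to generate fully synthetic microdata is to estimate distributions from the original records and then draw new records from those distributions. The sketch below does this for two categorical variables. It is a minimal illustration under stated assumptions, not the approach used at CBS: the variables, the example records and the function names are hypothetical, and the method offers no formal guarantee on disclosure risk.

```python
import random
from collections import Counter, defaultdict

def fit(records):
    """Estimate counts for P(age_group) and P(status | age_group) from (age_group, status) pairs."""
    marginal = Counter(age for age, _ in records)
    conditional = defaultdict(Counter)
    for age, status in records:
        conditional[age][status] += 1
    return marginal, conditional

def draw(counter, rng):
    """Draw one category with probability proportional to its observed count."""
    values = list(counter)
    weights = [counter[v] for v in values]
    return rng.choices(values, weights=weights, k=1)[0]

def synthesise(records, n, seed=0):
    """Generate n synthetic records: first the age group, then the status given the age group."""
    rng = random.Random(seed)
    marginal, conditional = fit(records)
    synthetic = []
    for _ in range(n):
        age = draw(marginal, rng)
        status = draw(conditional[age], rng)
        synthetic.append((age, status))
    return synthetic

# Hypothetical original microdata: (age group, labour market status)
original = [
    ("15-24", "student"), ("15-24", "student"), ("15-24", "employed"),
    ("25-64", "employed"), ("25-64", "employed"), ("25-64", "unemployed"),
    ("65+", "retired"), ("65+", "retired"),
]
synthetic = synthesise(original, 1000)
```

The synthetic records follow the estimated distributions of the original data, but each one is a fresh draw rather than a copy of a specific original unit. With more variables, each additional variable would be drawn conditional on the ones already generated.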