CBS’ statutory task is to compile and publish statistics for which there is a need. It wants to remain at the forefront in this respect, now and in the future.
Research is necessary in order to continuously ensure high-quality statistics.
Research is conducted on seven topics:
New data collection techniques
Innovation in the area of primary data collection; for example, smart surveys. CBS uses large amounts of data from registers, but sometimes still needs to conduct surveys. To avoid overburdening people and enterprises, CBS conducts as few surveys as possible. That is why CBS is constantly exploring innovation in that area, in particular so-called smart surveys that use apps on smartphones and sensor measurements. This allows respondents, after having given their consent, to automatically provide CBS with data they would normally have provided by completing a survey.
Big data, data mining & Artificial Intelligence (AI)
New data sources are often unstructured and may consist of imagery (such as satellite images) or natural-language text instead of figures. Extracting and analysing the information in those data streams requires special techniques. Examples include text mining and natural language processing (intelligently and automatically extracting information from voluminous, unstructured textual data), as well as machine learning and (interpretable) artificial intelligence.
Innovation in data integration is designed to aggregate and manage all data flows received by CBS, with the aim of producing coherent, high-quality estimates. For each unit of the populations on which CBS publishes information, all available data should be instantly retrievable.
Data protection requires constant improvement, because more and more information is available as open data and more computing power is becoming widely accessible. Cooperation and synergy with external parties, such as universities and other social organisations, require the guarantee that data can be securely collected and exchanged, and that data are published with disclosure risks kept under control (privacy-preserving data sharing & analytics).
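One basic form of disclosure control can be sketched as follows: counts in a frequency table that fall below a minimum cell size are masked before publication. This is a minimal illustration only. The function name, the example records and the threshold of 5 are assumptions made for this sketch and do not describe the rules CBS actually applies.

```python
from collections import Counter


def suppress_small_cells(records, key, threshold=5):
    """Count records per category and mask counts below the threshold.

    The threshold is a hypothetical value chosen for illustration.
    Masked cells are returned as None.
    """
    counts = Counter(record[key] for record in records)
    return {
        category: (n if n >= threshold else None)
        for category, n in counts.items()
    }


# Invented example data: number of respondents per municipality.
records = (
    [{"municipality": "A"}] * 12
    + [{"municipality": "B"}] * 3
    + [{"municipality": "C"}] * 7
)

table = suppress_small_cells(records, "municipality", threshold=5)
print(table)  # {'A': 12, 'B': None, 'C': 7}
```

The sketch covers primary suppression only. In practice, masked cells can sometimes be recalculated from row or column totals, so additional protection is usually needed.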
To meet the demand for geographically or demographically finer-grained data (additional statistical services), it remains necessary to develop valid statistical models that provide unbiased outcomes with minimal uncertainty.
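A common idea behind such models is to combine a noisy direct survey estimate for a small group with a more stable model-based estimate. The sketch below shows the standard composite (shrinkage) form used in small area estimation. The text does not say which models CBS uses, so the function name and all numbers are assumptions for illustration, and the variances are treated as known.

```python
def composite_estimate(direct, direct_var, synthetic, model_var):
    """Shrink a direct survey estimate toward a model-based estimate.

    gamma is close to 1 when the direct estimate is precise
    (small direct_var) and close to 0 when it is noisy.
    """
    gamma = model_var / (model_var + direct_var)
    return gamma * direct + (1.0 - gamma) * synthetic


# Invented example: a small municipality with few respondents.
estimate = composite_estimate(
    direct=0.30,      # share observed in the small local sample
    direct_var=0.04,  # large sampling variance of that share
    synthetic=0.20,   # prediction from a model using register data
    model_var=0.01,   # variance of the area effects in the model
)
print(round(estimate, 2))  # 0.22
```

Because the direct estimate is four times as variable as the model component, it receives a weight of only 0.2 and the result lies close to the model-based prediction.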
Society consists of myriad people and very diverse actors, as well as the relationships and interactions between them. Policymakers increasingly ask CBS to map mechanisms and relationships rather than just populations of people or enterprises. To keep meeting this demand in the future, CBS needs to apply the theories and analysis techniques of complexity science to shed light on causal relationships in social and economic phenomena.
Data querying & processing
All source material from very diverse origins must be combined quickly, robustly and stably into high-quality statistics, and that quality should also be quantifiable. Growing data volumes, combined with demands on processing speed, also place ever-increasing demands on hardware and software.
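As a simple illustration of combining sources while keeping quality quantifiable, the sketch below links survey answers to register records on a shared identifier and reports the share of survey records that could be linked. The function name, the identifiers and the variables are hypothetical, and the match rate is only one of many possible quality indicators.

```python
def link_sources(register, survey):
    """Join survey records to register records on a shared identifier.

    Returns the linked records and the share of survey records
    that were found in the register (a simple quality indicator).
    """
    linked = {
        uid: {**register[uid], **answers}
        for uid, answers in survey.items()
        if uid in register
    }
    match_rate = len(linked) / len(survey) if survey else 0.0
    return linked, match_rate


# Invented example data keyed by a hypothetical unit identifier.
register = {1: {"age": 34}, 2: {"age": 51}, 3: {"age": 28}}
survey = {1: {"income": 40000}, 3: {"income": 31000}, 4: {"income": 52000}}

linked, match_rate = link_sources(register, survey)
print(sorted(linked), round(match_rate, 2))  # [1, 3] 0.67
```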
How does CBS do this?
Research can be seen as a process in which fundamental methods and algorithms are developed and validated first. In the next step, software packages and associated documentation are developed in collaboration with senior researchers from across CBS, so that state-of-the-art methods can be applied directly in primary processes.
The proper use of new methods and techniques requires not only software but also sound knowledge development among all CBS employees. As the many courses offered by the CBS Academy show, methodology plays a major role in sharing and disseminating expertise so that new methods are used in the best possible way.
What applications are used?
We will highlight a few of them:
- In both business and person surveys, sampling can be dynamically adjusted so that even a limited response is as representative as possible of the entire population. The influence of residual bias is minimised through the deployment of metadata and paradata and the hybrid use of register and survey data.
- The basis for measurement of inflation in the Netherlands, the CPI, is derived directly from websites using big data techniques such as web scraping and text mining, eliminating the need for direct data collection in retail shops. This is just one application of these techniques, which are now being used on a much wider scale.
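To give an impression of how scraped prices can feed into a price index, the sketch below computes a Jevons index: the geometric mean of price changes for products observed in both periods. The text does not state which index formula CBS uses, so the choice of Jevons is an assumption, and the product names and prices are invented.

```python
import math


def jevons_index(base_prices, current_prices):
    """Geometric mean of price relatives, scaled so the base period is 100.

    Only products observed in both periods are used.
    """
    matched = base_prices.keys() & current_prices.keys()
    if not matched:
        raise ValueError("no products observed in both periods")
    log_sum = sum(
        math.log(current_prices[p] / base_prices[p]) for p in matched
    )
    return 100.0 * math.exp(log_sum / len(matched))


# Invented scraped prices in euros, keyed by a hypothetical product name.
base = {"milk-1l": 1.00, "bread": 2.00, "coffee": 4.00}
current = {"milk-1l": 1.10, "bread": 2.00, "tea": 3.00}

index = jevons_index(base, current)
print(round(index, 1))  # 104.9
```

Products that disappear (coffee) or appear (tea) between the two periods are ignored here. Handling such changes in the product range is one of the harder problems when working with scraped data.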