Statistics Netherlands (CBS) publishes most of its figures in tables. These tables can be accessed through StatLine on the CBS website. To make these figures more readily available, the tables are also published as open data. CBS has been publishing open data since July 2014.
Since the beginning of 2018 CBS has been preparing a new release of open data. This manual is meant to offer a first explanation of the new release, which will at first be made available as a beta product.
The existing version of opendata.cbs.nl offers only one catalog (CBS), usually referred to as StatLine. In the new version of open data a second catalog will be available, called CBS-Maatwerk (CBS customized services). This catalog will offer datasets that are the result of CBS research commissioned by third parties.
This beta release of open data currently offers 12 datasets in the CBS catalog and one dataset in the CBS-Maatwerk catalog.
The new open data services are based on version 4 of the OData protocol. Information about the OData 4 protocol can be found here.
The new open data services make use of the DCAT-AP-NL profile, version 1.1. This means that the labels of the various fields follow the example of the DCAT model. More information about the DCAT-AP-NL profile can be found here.
Overview of services
This service offers an overview of the available catalogs, including some characteristics of each catalog. Based on the identifier, <CBS> or <CBS-Maatwerk>, you can build the link to the catalog.
This service offers an overview of available datasets in the CBS catalog. At the moment it is limited to 12 datasets in one catalog.
This service offers an overview of available datasets in the CBS-Maatwerk catalog. At the moment it is limited to one dataset.
In addition to the fields of the DCAT-AP-NL profile, CBS has added three extra fields. The field ObservationCount shows how many cells the dataset contains, which gives an indication of the size of the dataset. The field ObservationsModified indicates when the observations were last updated. The field DatasetType shows the data type of the cells in the dataset. The following values are allowed:
• Numeric: all cells are of type numeric. This makes it easy to handle the cells in Business Intelligence tools, such as Power BI or Excel.
• String: all cells are of type String. In general these are datasets which contain information to support further analysis.
• Hybrid: cells can contain either numeric or string values. To further process these data more manipulations will be needed. Hybrid cells will generally be found in old datasets.
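The three DatasetType values suggest a different handling of cell values per dataset. The following is a minimal sketch of that idea; the function name `coerce_value` is hypothetical, and how values are actually serialized in the JSON response is an assumption, not something this manual specifies.

```python
from typing import Union


def coerce_value(raw: Union[str, int, float, None],
                 dataset_type: str) -> Union[str, float, None]:
    """Convert one cell value according to the dataset's DatasetType.

    Hypothetical helper: the value names (Numeric, String, Hybrid) come
    from the manual, the conversion rules are an illustration only.
    """
    if raw is None:
        return None
    if dataset_type == "Numeric":
        # All cells are numeric, so a failed conversion is a real error.
        return float(raw)
    if dataset_type == "String":
        return str(raw)
    if dataset_type == "Hybrid":
        # Try the numeric interpretation first, fall back to text.
        try:
            return float(raw)
        except (TypeError, ValueError):
            return str(raw)
    raise ValueError(f"Unknown DatasetType: {dataset_type!r}")
```

For a Hybrid dataset the caller still has to decide what to do with the cells that stay strings, which is the extra manipulation mentioned above.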
Based on the identifier of the dataset it is possible to construct the OData link to the dataset.
This is the basic link to dataset 900001NED. “OData4” defines the version of open data services. “CBS-Maatwerk” refers to the catalog which contains the dataset.
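As an illustration, the link can be assembled from the catalog identifier and the dataset identifier. This is a sketch only: `BASE_URL` is a placeholder (the service root is not spelled out in this text), the helper names are hypothetical, and the segment order (version, catalog, dataset, subset) is assumed from the description above.

```python
from urllib.parse import quote

# Placeholder root, NOT the real service address; replace it with the
# actual host. The "OData4" segment is the version part described above.
BASE_URL = "https://example.invalid/OData4"


def catalog_url(catalog: str, base: str = BASE_URL) -> str:
    """Return the link to a catalog such as 'CBS' or 'CBS-Maatwerk'."""
    return f"{base.rstrip('/')}/{quote(catalog, safe='')}"


def dataset_url(catalog: str, dataset: str, subset: str = "",
                base: str = BASE_URL) -> str:
    """Return the link to a dataset, optionally to one of its subsets."""
    url = f"{catalog_url(catalog, base)}/{quote(dataset, safe='')}"
    return f"{url}/{quote(subset, safe='')}" if subset else url


print(dataset_url("CBS-Maatwerk", "900001NED"))
print(dataset_url("CBS-Maatwerk", "900001NED", "Observations"))
```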
For every dataset the following subsets can be found (not all subsets are mandatory):
- MeasureGroups (optional). This subset contains the hierarchical information of the measures. It will be empty if there is no hierarchy, that is, if the measures are just a flat list of topics without a folder structure. In the future empty subsets will be omitted from the service. This subset contains the field ParentID, which refers to the ID of the measure group of which the group is a part.
- MeasureCodes (mandatory). This subset contains information about the topics: identifier, title, description, measuregroupID, datatype, unit, format, number of decimals, and information about the presentation type for maps. The measuregroupID refers to the hierarchical information in the subset MeasureGroups. If there is no hierarchy this field will be empty.
- Dimensions (mandatory). This subset shows an overview of the dimensions of the dataset, with their identifier, title, description and type. For each dimension there exist two subsets, as follows:
- <name>Codes (optional). This shows, for the dimension <name>, an overview of identifiers, titles, descriptions and some other characteristics. For the 6-character postal code the code list may be empty, because this dimension is self-descriptive.
- <name>Groups (optional). This gives information about the hierarchy of the dimension. It is comparable to MeasureGroups. This list too can be empty, if there is no hierarchy.
- Properties (mandatory). This subset contains the metadata for the dataset. It is much more comprehensive than the metadata offered through the service Catalogs. The fields are based on the DCAT-AP-NL 1.1 profile. This subset contains only one record (singleton).
- Observations (mandatory). This subset contains the cells of the dataset. It shows the following fields:
• Measure (refers to the identifier in MeasureCodes)
• ValueAttribute (special attribute of a cell, none means there is no special attribute)
• Value, the numeric or string value of a cell
• <name(1).identifier> refers to the code list of dimension <name(1)> or is self-descriptive.
• <name(n).identifier> refers to the code list of dimension <name(n)> or is self-descriptive.
The number of dimensions is at least one. It can vary with each individual dataset, so the fields of the subset Observations are different for each dataset.
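The relationship between the subsets can be sketched as two lookups: each observation points to a measure and to one code per dimension, and each group points to its parent through ParentID. The sample rows, the dimension names `Period` and `PostalCode`, and the exact field names `Identifier`, `Title` and `ID` are assumptions made for illustration; only Measure, Value, ValueAttribute and ParentID are named in this manual.

```python
def label_observations(observations, measure_codes, dimension_codes):
    """Replace measure and dimension identifiers by their titles.

    dimension_codes maps a dimension name to its <name>Codes rows. An
    empty code list (self-descriptive dimension) leaves codes unchanged.
    """
    measure_titles = {m["Identifier"]: m["Title"] for m in measure_codes}
    dim_titles = {
        dim: {c["Identifier"]: c["Title"] for c in codes}
        for dim, codes in dimension_codes.items()
    }
    labelled = []
    for obs in observations:
        row = {
            "Measure": measure_titles.get(obs["Measure"], obs["Measure"]),
            "Value": obs["Value"],
            "ValueAttribute": obs["ValueAttribute"],
        }
        for dim, titles in dim_titles.items():
            code = obs[dim]
            row[dim] = titles.get(code, code)
        labelled.append(row)
    return labelled


def group_path(group_id, groups):
    """Follow ParentID upwards and return the titles from root to leaf."""
    by_id = {g["ID"]: g for g in groups}
    path, seen = [], set()
    while group_id in by_id and group_id not in seen:
        seen.add(group_id)  # guards against a cyclic ParentID chain
        path.append(by_id[group_id]["Title"])
        group_id = by_id[group_id].get("ParentID")
    return path[::-1]


# Hypothetical sample data, not taken from a real dataset.
measure_codes = [{"Identifier": "M001", "Title": "Population"}]
dimension_codes = {
    "Period": [{"Identifier": "2017JJ00", "Title": "2017"}],
    "PostalCode": [],  # self-descriptive, so no code list
}
observations = [{"Measure": "M001", "ValueAttribute": "None", "Value": 120,
                 "Period": "2017JJ00", "PostalCode": "1234AB"}]
groups = [{"ID": "G1", "Title": "Demography", "ParentID": None},
          {"ID": "G2", "Title": "Residents", "ParentID": "G1"}]

print(label_observations(observations, measure_codes, dimension_codes))
print(group_path("G2", groups))
```

Because the dimension fields differ per dataset, the sketch takes the dimension names from the supplied code lists rather than hard-coding them.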
Differences between OData 3 and OData 4
The main differences between the OData 3 services and the new OData 4 services are:
- The information in the DataProperties of OData 3 has been split up into three subsets:
This makes it possible to link this information directly to the values (Observations) in BI tools. No information is lost compared to OData 3; it is only organized differently. In OData 3 linking this information was difficult for BI tools.
- The information of the TypedDataSet and UntypedDataSet in OData 3 has been transformed to Observations (changing data transmission from record to cell). The attribute of the cell has been included as ValueAttribute. For the 6-character postal code there are only standard values. In the next version of the manual this difference will be further explained.
- OData version 3 offers three formats: JSON, XML and Atom feed. OData version 4 offers only the JSON format. Because CBS follows the official standard, the other formats have not yet been implemented.
- The information in CategoryGroups of oData3 contained the hierarchy information for all dimensions. In oData4 the hierarchy is provided for each dimension separately.
- For dimensions of type TimeDimension, GeoDimension and GeoDetail the hierarchy information has been added.
- The OData 4 release follows the naming conventions of the DCAT-AP-NL 1.1 profile. This applies to the subsets <Catalogs>, <DataSets> and <Properties>.
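The change from record to cell mentioned above can be pictured as reshaping one wide row per dimension combination into one row per measure value. The sketch below shows only the general idea, not the actual CBS transformation; the function name, the sample record and its field names are hypothetical, and ValueAttribute is left out for brevity.

```python
def records_to_cells(records, dimension_fields):
    """Turn record-style rows into cell-style rows.

    Every non-dimension field of a record becomes its own cell with a
    Measure and a Value, and the dimension codes are repeated per cell.
    """
    cells = []
    for record in records:
        dims = {d: record[d] for d in dimension_fields}
        for field, value in record.items():
            if field in dimension_fields:
                continue
            cells.append({"Measure": field, "Value": value, **dims})
    return cells


# Hypothetical record in the style of a typed dataset row.
records = [{"Period": "2017JJ00", "Population": 100, "Households": 40}]

for cell in records_to_cells(records, ["Period"]):
    print(cell)
```

One record with two measures yields two cells, which is why the number of rows in Observations relates to ObservationCount rather than to the number of records.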