Quick start guide

The OData API of Statistics Netherlands gives users access to StatLine data in a machine-readable format. This Quick start guide demonstrates the API by downloading key figures about Dutch neighbourhoods from table 83765NED. The tutorial is aimed at R and Python users and has a few prerequisites: users should know how to install packages and be able to work with functions and variables.

The Dutch versions of these examples have been combined into a GitHub repository.

The package cbsodataR is available for users of R. This package can be installed from CRAN by running install.packages("cbsodataR") once. The function cbs_get_toc() lists all available tables and cbs_get_toc(Language = "en") lists all English tables.

After choosing a table, pass its identifier to cbs_get_data() to retrieve the table from the API. The metadata, which describes the content and structure of the data, can be downloaded with cbs_get_meta().

# Run once:
# install.packages("cbsodataR")

library(cbsodataR)

# Downloading table list
toc <- cbs_get_toc()
head(toc)

# Downloading entire dataset (can take up to 30s)
data <- cbs_get_data("83765NED")
head(data)

# Downloading metadata
metadata <- cbs_get_meta("83765NED")
head(metadata)

When working with large datasets it is desirable to retrieve a subset of the data instead of the entire dataset. The optional parameter select of cbs_get_data() specifies one or more columns to be retrieved. Rows can be filtered using one or more optional arguments of the form <dimension> = <value>. The dimension names and their values are found in the metadata. These optional filter arguments have to be supplied directly after the table identifier. In the following example the number of inhabitants of the municipality of Amsterdam is retrieved.

# Downloading a subset
data <- cbs_get_data("83765NED", 
             WijkenEnBuurten = "GM0363    ",
             select = c("WijkenEnBuurten", "AantalInwoners_5"))
head(data)

The data frame data consists of a single row and two columns.

The package cbsodata is available for Python users. This package can be installed with pip by running pip install --user cbsodata in a command prompt. The package is compatible with Pandas, a library used for data analysis. The function get_table_list() lists all available tables.

After choosing a table, pass its identifier to get_data() to retrieve the table from the API. The metadata, which describes the content and structure of the data, can be downloaded with get_meta().

import pandas as pd
import cbsodata

# Downloading table list
toc = pd.DataFrame(cbsodata.get_table_list())
print(toc.head())

# Downloading entire dataset (can take up to 30s)
data = pd.DataFrame(cbsodata.get_data('83765NED'))
print(data.head())

# Downloading metadata
metadata = pd.DataFrame(cbsodata.get_meta('83765NED', 'DataProperties'))
print(metadata[['Key','Title']])
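
The Key column of the metadata holds the identifiers used in code, while Title holds the readable description. Searching the titles is a quick way to find the key of a column. The following is a minimal sketch, not part of cbsodata: find_columns() is a hypothetical helper, and sample_properties is made-up stand-in data that only mimics the Key and Title fields shown above, so that the block runs without network access. It assumes the metadata is available as a list of dictionaries.

```python
def find_columns(data_properties, keyword):
    """Return (Key, Title) pairs whose title contains keyword (case-insensitive)."""
    keyword = keyword.lower()
    return [
        (record.get("Key"), record.get("Title"))
        for record in data_properties
        # Some records may lack a title; treat a missing title as empty text.
        if keyword in (record.get("Title") or "").lower()
    ]


# Illustrative stand-in records (assumption, not downloaded data).
# In practice: cbsodata.get_meta('83765NED', 'DataProperties')
sample_properties = [
    {"Key": "WijkenEnBuurten", "Title": "Wijken en buurten"},
    {"Key": "AantalInwoners_5", "Title": "Aantal inwoners"},
    {"Key": "Mannen_6", "Title": "Mannen"},
]

matches = find_columns(sample_properties, "inwoners")
print(matches)
```

The keys found this way are the values that go into the select parameter and the row filter.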

When working with large datasets it is desirable to retrieve a subset of the data instead of the entire dataset. The optional parameter select of get_data() specifies one or more columns to be retrieved. Rows can be filtered with the optional parameter filters, which takes a string of the form "<dimension> eq <value>". The dimension names and their values are found in the metadata of the table. In the following example the number of inhabitants of the municipality of Amsterdam is retrieved.

# Downloading a subset
data = pd.DataFrame(
        cbsodata.get_data('83765NED', 
                          filters="WijkenEnBuurten eq 'GM0363    '",
                          select=['WijkenEnBuurten','AantalInwoners_5']))
print(data.head())

The data frame data consists of a single row and two columns.
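
In the example above the region code is written with trailing spaces, and typing that padding by hand is easy to get wrong. The following is a minimal sketch of a helper that builds the filter string. The function odata_eq_filter() and its width parameter are hypothetical and not part of cbsodata. The default width of 10 characters is an assumption derived from the padded value 'GM0363    ' in the example; check the metadata of the table in use before relying on it.

```python
def odata_eq_filter(dimension, values, width=10):
    """Build a filter string that matches any of the given keys for one dimension."""
    if isinstance(values, str):
        values = [values]
    clauses = []
    for value in values:
        # Assumption: keys are right-padded with spaces to a fixed width.
        padded = value.ljust(width)
        # OData string literals escape a single quote by doubling it.
        escaped = padded.replace("'", "''")
        clauses.append(f"{dimension} eq '{escaped}'")
    # Several keys for the same dimension are combined with "or".
    return " or ".join(clauses)


amsterdam = odata_eq_filter("WijkenEnBuurten", "GM0363")
print(amsterdam)
```

The resulting string can be passed as the filters argument of cbsodata.get_data(), in place of the hand-written filter in the example.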
