Cartography

Icon outline of the Netherlands

Many datasets contain regional statistics. This tutorial shows how to create a thematic map in R or Python by linking spatial data to data about birth rates from Statistics Netherlands. The prerequisites for this tutorial are covered in the Quick start guide.

Statistics Netherlands publishes spatial data via PDOK (Publieke Dienstverlening Op de Kaart). The dataset "CBS Gebiedsindelingen" contains the most-used regional boundaries. This geospatial data can be downloaded in several formats including Shapefile and GeoJSON, but it is also possible to retrieve geodata automatically via the API. In this tutorial the API is used to ensure all maps are up to date. More information about this API can be found in its documentation. Please be aware that regional boundaries change regularly and make sure to choose the right geodata.

The code examples can be copied easily to the clipboard by clicking the "Copy" button in the code block. The Dutch versions of these examples have been combined into a GitHub repository.

Keuzemenu programmeertaal:

This example uses sf for working with geographical data. The tidyverse library is a collection of R packages designed for data science and is used for convenience.

The names of the columns with data about births can be found in the metadata of the table with key demographic figures.

library(cbsodataR)
library(tidyverse)
library(sf)

# Find out which columns are available
metadata <- cbs_get_meta("83765NED")
print(metadata$DataProperties$Key)

The table consists of 147 columns and birth data can be found in GeboorteTotaal_24 and GeboorteRelatief_25. It's common in thematic maps to use ratios instead of absolute numbers so that regions of different sizes can be compared. The selected data can be retrieved with cbs_get_data(), where the desired columns can be picked with select.

# Download birth rates and delete spaces from regional identifiers
data <- cbs_get_data("83765NED", 
                     select=c("WijkenEnBuurten","GeboorteRelatief_25")) %>%
  mutate(WijkenEnBuurten = str_trim(WijkenEnBuurten),
         births = GeboorteRelatief_25)

The geodata is retrieved via the API of the Dutch National Georegistry of PDOK and read with st_read() from sf.

# Retrieve data with municipal boundaries from PDOK
municipalBoundaries <- st_read("https://geodata.nationaalgeoregister.nl/cbsgebiedsindelingen/wfs?request=GetFeature&service=WFS&version=2.0.0&typeName=cbs_gemeente_2017_gegeneraliseerd&outputFormat=json")

The birth rates can now be combined with the boundaries with left_join().

# Link data from Statistics Netherlands to geodata
data <- 
  municipalBoundaries %>%
  left_join(data, by=c(statcode="WijkenEnBuurten"))

The thematic map can be drawn with ggplot2 from tidyverse.

# Create a thematic map
data %>%
  ggplot() +
  geom_sf(aes(fill = births)) +
  scale_fill_viridis_c() +
  labs(title = "Levend geborenen per 1000 inwoners, 2017", fill = "") +
  theme_void()

Thematic map of birth rates in the Netherlands

This example uses geopandas for working with geographical data. The pandas library contains standard tools for data science and is used for convenience.

The names of the columns with data about births can be found in the metadata of the table with key demographic figures.

import pandas as pd
import geopandas as gpd
import cbsodata

# Find out which columns are available
metadata = pd.DataFrame(cbsodata.get_meta('83765NED', 'DataProperties'))

The table consists of 147 columns and birth data can be found in GeboorteTotaal_24 and GeboorteRelatief_25. It's common in thematic maps to use ratios instead of absolute numbers so that regions of different sizes can be compared. The selected data can be retrieved with cbs_get_data(), where the desired columns can be picked with select.

# Download birth rates and delete spaces from regional identifiers
data = pd.DataFrame(cbsodata.get_data('83765NED', select = ['WijkenEnBuurten', 'Codering_3', 'GeboorteRelatief_25']))
data['Codering_3'] = data['Codering_3'].str.strip()

The geodata is retrieved via the API of the Dutch National Georegistry of PDOK and read with read_file() from geopandas.

# Retrieve data with municipal boundaries from PDOK
geodata_url = 'https://geodata.nationaalgeoregister.nl/cbsgebiedsindelingen/wfs?request=GetFeature&service=WFS&version=2.0.0&typeName=cbs_gemeente_2017_gegeneraliseerd&outputFormat=json'
municipal_boundaries = gpd.read_file(geodata_url)

The birth rates can be combined with the boundaries with merge().

# Link data from Statistics Netherlands to geodata
municipal_boundaries = pd.merge(municipal_boundaries, data,
                               left_on = "statcode", 
                               right_on = "Codering_3")

The thematic map can now be drawn with plot().

# Create a thematic map
p = municipal_boundaries.plot(column='GeboorteRelatief_25', 
                             figsize = (10,8))
p.axis('off')
p.set_title('Birth rate per 1,000 population, 2017')

Thematic map of birth rates in the Netherlands