Time series

A time series consists of measurements of the same variable at different time points. Most of the tables from Statistics Netherlands are revised monthly, quarterly or yearly. This tutorial explains how to process and visualise time series using data about tourist accommodation tax ("toeristenbelasting") as an example. The prerequisites for this tutorial are covered in the Quick start guide.

The Dutch versions of these examples have been combined into a GitHub repository.

The examples are given first in R and then in Python.

Table 84120NED contains data about the government tax revenue in the Netherlands. The dataset contains various taxes; the tax code corresponding to tourist tax can be found in the metadata.

library(tidyverse)
library(cbsodataR)

# Download metadata and search for "Toerist" in the list with tax codes
metadata <- cbs_get_meta("84120NED")
print(metadata$BelastingenEnWettelijkePremies %>%
        filter(str_detect(Title, "Toerist")))

The Key corresponding to tourist tax is A045081. To retrieve the subset of the table with information about this tax we add the optional parameter BelastingenEnWettelijkePremies = "A045081" to cbs_get_data().

# Download data about tourist tax
data <- cbs_get_data("84120NED", 
            BelastingenEnWettelijkePremies = "A045081") %>%
        select(-BelastingenEnWettelijkePremies)

The response from the API has a column Perioden, which contains time periods in the format yyyyXXww: yyyy is the year, XX indicates whether the figure covers a year (JJ), a quarter (KW) or a month (MM), and ww is the index number. cbs_add_date_column() transforms this period, adding a column Perioden_freq with the type of period (month/quarter/year) and a column Perioden_Date with the start date of the period. Annual figures are assigned the letter Y in Perioden_freq, so they can be selected with filter(Perioden_freq == "Y").

# Add date column and filter for annual figures
data <- data %>%
  cbs_add_date_column() %>%
  filter(Perioden_freq == "Y") %>%
  select(-Perioden_freq)
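
To make the yyyyXXww format concrete, the following is a minimal Python sketch of how such a code could be decoded by hand. It is not the cbsodataR implementation: the function name parse_period, the frequency letters Q and M, and the assumption that annual codes carry index 00 while quarters and months count from 01 are illustrative choices, not taken from the package.

```python
from datetime import date

# Assumed mapping from the two-letter code to a frequency letter and the
# number of months covered by one index step (illustrative, not from cbsodataR).
FREQUENCIES = {"JJ": ("Y", 12), "KW": ("Q", 3), "MM": ("M", 1)}


def parse_period(code):
    """Split a period code such as '2019KW03' into (frequency, start date)."""
    year, kind, index = int(code[:4]), code[4:6], int(code[6:8])
    freq, months = FREQUENCIES[kind]
    # Assumption: annual codes use index 00; quarters and months start at 01.
    start_month = 1 if kind == "JJ" else (index - 1) * months + 1
    return freq, date(year, start_month, 1)


print(parse_period("2019JJ00"))  # ('Y', datetime.date(2019, 1, 1))
print(parse_period("2019KW03"))  # ('Q', datetime.date(2019, 7, 1))
print(parse_period("2019MM11"))  # ('M', datetime.date(2019, 11, 1))
```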

The time series can now be plotted with ggplot2.

# Plot the time series
ggplot(data, aes(x = Perioden_Date,
                 y = OntvangenBelastingenEnWettPremies_1)) +
  geom_line() +
  ylim(0, 250) +
  labs(title = "Tourist tax revenue", x = "",
       y = "million euro")

Table 84120NED contains data about the government tax revenue in the Netherlands. The dataset contains various taxes; the tax code corresponding to tourist tax can be found in the metadata.

import pandas as pd
import cbsodata

# Download metadata and search for "Toerist" in the list with tax codes
metadata = pd.DataFrame(cbsodata.get_meta('84120NED', 'BelastingenEnWettelijkePremies'))
print(metadata[metadata['Title'].str.contains('Toerist')][['Key','Title']])

The Key corresponding to tourist tax is A045081. To retrieve the subset of the table with information about this tax, we pass the filter "BelastingenEnWettelijkePremies eq 'A045081'" to get_data() through its filters argument.

# Download data about tourist tax
data = pd.DataFrame(cbsodata.get_data('84120NED', filters = "BelastingenEnWettelijkePremies eq 'A045081'"))
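
The filter string follows a simple "column eq 'key'" pattern. As a sketch, a small helper could assemble such strings from column/key pairs. The helper build_filter is hypothetical and not part of cbsodata, and joining several conditions with "and" is an assumption about the filter syntax rather than something shown in this tutorial.

```python
def build_filter(**conditions):
    """Join column/key pairs into a filter string of the form "col eq 'key'".

    Hypothetical helper for illustration; multiple conditions are joined
    with 'and', which is an assumption about the filter syntax.
    """
    return " and ".join(
        f"{column} eq '{key}'" for column, key in conditions.items()
    )


tax_filter = build_filter(BelastingenEnWettelijkePremies="A045081")
print(tax_filter)  # BelastingenEnWettelijkePremies eq 'A045081'
```

The resulting string could then be passed as the filters argument in the call above.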

Annual figures can be selected with the regular expression ^\d{4}$, which checks whether a string consists of precisely four digits.

# Filter for annual figures
data = data[data['Perioden'].str.match(r"^\d{4}$")]
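
To see what the pattern accepts and rejects, here is a small sketch that applies the same regular expression with the standard-library re module instead of pandas. The sample labels are made up for illustration and are not taken from the table.

```python
import re

# Same pattern as above: start of string, exactly four digits, end of string.
ANNUAL = re.compile(r"^\d{4}$")

# Hypothetical period labels, made up for illustration.
labels = ["2018", "2019", "2019 1e kwartaal", "2019 januari", "20190"]

annual = [label for label in labels if ANNUAL.match(label)]
print(annual)  # ['2018', '2019']
```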

The time series can now be plotted with matplotlib, which pandas uses for its plot() method.

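# Plot the time series (pandas draws the figure with matplotlib)
import matplotlib.pyplot as plt

p = data.plot(x='Perioden',
              y='OntvangenBelastingenEnWettPremies_1', legend=False)
p.set_title('Tourist tax revenue')
p.set_ylim([0, 250])
p.set_xlabel("")
p.set_ylabel("million euro")
plt.show()  # needed when running as a script rather than in a notebook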
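
The R version plots against a date column, whereas the Python plot above uses the period strings directly. If a real date axis is preferred, the year labels could be converted first. The sketch below assumes the annual periods are plain four-digit strings, as implied by the filter above; the function name year_to_date and the sample values are illustrative.

```python
from datetime import datetime


def year_to_date(label):
    """Convert a four-digit year label such as '2019' to 1 January of that year."""
    return datetime.strptime(label, "%Y").date()


# Made-up sample labels for illustration.
years = ["2017", "2018", "2019"]
dates = [year_to_date(y) for y in years]
print(dates[0].isoformat())  # 2017-01-01
```

Applied to the DataFrame from this tutorial, data['Perioden'].map(year_to_date) could be stored in a new column and used as the x variable in the plot.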