Big data sources for official statistics

A dynamic factor model to incorporate Big data sources in state-space models for official statistics.
In this paper, a time series model is developed to estimate monthly unemployment figures using two kinds of auxiliary series: weekly series on related Internet search terms, obtained via Google Trends, and monthly figures on claimant counts. In a first step, a limited number of common factors are estimated by principal component analysis (PCA) from a large number of auxiliary series from Google Trends. Subsequently, monthly figures on the unemployed labour force observed with the Labour Force Survey (LFS) are combined with the Google Trends series and the time series of claimant counts in a dynamic factor model. The model is fitted with the Kalman filter. It is investigated to what extent these auxiliary series improve the accuracy of the estimates of the total unemployed labour force based on the LFS. It is also examined to what extent accurate initial estimates can be made when the Google Trends series become available but estimates based on the LFS are still missing (nowcasting).
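The two steps and the nowcasting idea can be illustrated with a minimal sketch. It is not the model of the paper: the data are synthetic, only one common factor is used, claimant counts are left out, and the variances and the correlation `rho` are fixed by assumption rather than estimated by maximum likelihood. The function names (`first_common_factor`, `kalman_filter`, `filtered_level`) are hypothetical. The sketch models the unemployment level and the factor as two random walks with correlated disturbances, so that a new factor value can update the level while the latest LFS figure is still missing.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data, purely illustrative (not real LFS or Google Trends figures).
T, n_series = 120, 30
common = np.cumsum(rng.normal(size=T))                    # latent common trend
level = 50.0 + 2.0 * common + np.cumsum(rng.normal(scale=0.3, size=T))
loadings = rng.uniform(0.5, 1.5, size=n_series)
trends = np.outer(common, loadings) + rng.normal(scale=0.5, size=(T, n_series))
lfs = level + rng.normal(scale=2.0, size=T)               # noisy survey estimates
lfs[-1] = np.nan                                          # latest LFS figure still missing


def first_common_factor(X, y):
    """First principal component of the standardized columns of X,
    scaled to unit variance and signed to correlate positively with y."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    f = U[:, 0] * s[0]
    f = f / f.std()
    seen = ~np.isnan(y)
    return f if np.corrcoef(f[seen], y[seen])[0, 1] >= 0 else -f


def kalman_filter(y, Z, H, Tm, Q, a0, P0):
    """Filter y_t = Z a_t + eps_t, a_{t+1} = Tm a_t + eta_t.
    NaN entries of y_t are treated as missing and skipped in the update."""
    a, P = a0.copy(), P0.copy()
    filtered = np.empty((len(y), len(a0)))
    for t, y_t in enumerate(y):
        obs = ~np.isnan(y_t)
        if obs.any():
            Zt, Ht = Z[obs], H[np.ix_(obs, obs)]
            v = y_t[obs] - Zt @ a                         # innovation
            F = Zt @ P @ Zt.T + Ht                        # innovation variance
            K = P @ Zt.T @ np.linalg.inv(F)               # Kalman gain
            a = a + K @ v
            P = P - K @ Zt @ P
        filtered[t] = a
        a = Tm @ a                                        # one-step-ahead prediction
        P = Tm @ P @ Tm.T + Q
    return filtered


def filtered_level(lfs, factor, rho, q_level=1.0, h_lfs=4.0, h_factor=0.01):
    """Filtered unemployment level; all variances and rho are assumed values."""
    q_factor = np.var(np.diff(factor))
    cov = rho * np.sqrt(q_level * q_factor)
    Q = np.array([[q_level, cov], [cov, q_factor]])       # correlated disturbances
    H = np.diag([h_lfs, h_factor])
    y = np.column_stack([lfs, factor])
    a0, P0 = np.zeros(2), 1e6 * np.eye(2)                 # approximately diffuse start
    return kalman_filter(y, np.eye(2), H, np.eye(2), Q, a0, P0)[:, 0]


factor = first_common_factor(trends, lfs)
with_trends = filtered_level(lfs, factor, rho=0.9)
without_trends = filtered_level(lfs, factor, rho=0.0)
print("nowcast using the factor:", with_trends[-1])
print("nowcast without it:      ", without_trends[-1])
```

With `rho=0.0` the two states are independent, so the estimate for the last month simply repeats the previous filtered level. With a non-zero `rho` the latest factor value revises the level, which is the mechanism that makes nowcasting possible in this simplified setting.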