Papers

Scientific publications, dissertations and reports of quality studies in the field of statistics.

Papers in this series cover methods, quality, processes, and information-technology and conceptual topics that are relevant to the field of work of CBS. They describe the results of (applied) scientific research carried out by CBS researchers (sometimes together with others). The series is not intended for publishing CBS figures. The figures in the papers should therefore not be regarded as official CBS results.

This article is about the method CBS uses to measure loneliness. Two different measurement instruments are compared.

Account and first results of the pension entitlements statistics, renewed as of 2015.

The role of different types of enterprises (SMEs, large enterprises) in the Dutch economy, 2012: method and results.

Methodological departments of national statistical institutes should adopt formal quality certification for ISO 9001.

We present and test an approach to automatic error localisation based on general adjustment operations.

National statistical institutes should adopt formal quality certification, e.g. ISO or EFQM.

Methodological description of the Price Index for Newly Built Owner-Occupied Dwellings (Prijsindex Nieuwbouw Koopwoningen)

A report with a statistical analysis of the relationship between gas pressure and gradients in gas pressure within the Groningen reservoir prior to earthquakes, commissioned by State Supervision of Mines (Staatstoezicht op de Mijnen).

Methodological description and application of thirteen ecosystem service models in physical supply and use tables.

A report analysing the time intervals between earthquakes in Groningen, in relation to gas extraction.

Transitivity of price indices is advocated and various ways to construct these are shown.

This position paper indicates how CBS can apply the insights of complexity theory in official statistics.

A small number of very big enterprises dominates R&D in the Netherlands.

This publication contains indicators on inbound tourism on the islands of Bonaire, Saba and Sint Eustatius in the period 2015-2017.

Survey among companies in the Netherlands that process cocoa (products) on the sales of sustainably certified cocoa.

Connecting correction methods for linkage error in capture-recapture.

Development of the interest expenses of WLZ care institutions between 1992 and 2016

Nowcasting by means of linear time series filters

Investigating co-integration between inflation indicators within the price dashboard using structural time series models.

Investigating correlation and temporal relationships between the inflation indicators within the price dashboard.

Financial key figures of care institutions for the year 2016

An analysis of the use of mobile devices to complete CBS CAWI questionnaires.

Exploration of alternatives to the CUMI scheme in special education

The study addresses differences in definitions and differences in data collection.

This paper discusses several topics concerning the sampling of networks with limited access.

Re-analysis of the frequency of earthquakes in Groningen in view of the improved daily time resolution of the reservoir gas pressure

Applications of complexity theory in official statistics.

Paper: adjustment of heating values and CO2 of petrol and diesel

Pilot study of natural capital accounts with supply and use of ecosystem services and valuation

Components of Unemployment from a statistical perspective

Description of a method for estimating uncertainty intervals around the outcomes of the PBL/CBS regional forecast.

Towards a sensitivity analysis for the interior of I/O tables

Households with an income below the low-income threshold in the border municipality of Vaals

How to stratify a dynamic population of articles for the purpose of price index calculations.

Descriptions of pastries contain all kinds of characteristics that are useful for classifying pastry products.

Research into the suitability of web-based text analysis methods for characterising businesses.

Key figures on museums in the Netherlands in 2016

Innovation and Productivity of Dutch Firms: A Panel Data Analysis

Switching to internet observation in the CPI: time series of price indices

Capture-recapture methods and violation of the assumptions of perfect linkage and no erroneous captures

Calculation of the contribution of public export credit insurance to GDP and employment

A Bayesian framework is proposed for incorporating prior knowledge and expert opinion about survey design parameters.

Methods to quantify the effects of changes in a survey.

Financial key figures of care institutions for the year 2015

Multi-source Statistics: basic situations of combining sources and estimation methods

The GEKS method and several generalizations are discussed, including the cycle method

Transitivizing elementary price indices for internet data using the cycle method

Characteristics for the dynamic populations of articles

Elementary price indexes for internet data

Filtering in the Fourier domain: a new set of filters for seasonal adjustment of time series and its evaluation

Consistent multivariate seasonal adjustment, based on a combination of a multivariate structural time series model.

Price index of the production of buildings

A report of continued research into the frequency of earthquakes in Groningen, in relation to gas extraction.

Research into response behaviour and selectivity when using internet and paper in the Safety Monitor (Veiligheidsmonitor)

Fraud in online trade: answers from the Safety Monitor (Veiligheidsmonitor) compared with the police register

Estimating the highest attained level of education for the Dutch virtual census.

In this paper we propose a new imputation method that takes edit rules into account.

Feasibility study on physical stocks (urban mine) in our economy

This discussion paper describes a method based on latent class modeling

Discussion Paper about model-based methods for the estimation of monthly unemployment by province

Measuring the uncertainty in accounting equations (estimated figures connected by known constraints) by scalar measures.

Big data and methodological challenges in official statistics

Financial key figures of care institutions for the year 2014

This article contains an appraisal of two recently proposed methods of inference for sequential mixed-mode surveys.

Discussion paper on the potential and implications of profiling big data sources for official statistics.

This is a methodological report on the extension of the Materials Monitor.

This study illustrates how time-series model-based techniques can improve the production of official statistical figures for repeatedly conducted surveys. Apart from reducing the variance of design-based estimates, time series models provide estimates of the magnitude of discontinuities caused by survey redesigns. In surveys with a subdivision into several domains, such models can be effectively applied in the context of small area estimation problems.

This study applies a range of predictive techniques to forecasting the sign of the next month’s change in five key economic indicators: consumer confidence, household consumption, producer confidence, manufacturing production and exports, all for the Netherlands. Techniques tested range from standard regression models to advanced machine learning techniques.

Data correction benefits from indicators that show, simply and clearly, the influence of the correction process on the data. This report describes several indicators that graphically highlight aspects of change in data resulting from a correction process. The indicators fall into two types: indicators relating to values and indicators relating to rule violations.

Conflicting information may arise in statistical micro data due to partial imputation, where one part of the imputed record consists of the observed values of the original record and the other of the imputed values. Edit rules that involve variables from both parts of the record will often be violated. One strategy to remedy this problem is to make adjustments to the imputations such that all constraints are simultaneously satisfied and the adjustments are, in some sense, as small as possible. The minimal adjustments are obtained by minimizing a chosen distance metric subject to the constraints and we show how different choices of the distance metric result in different adjustments to the imputed data. As an extension we also consider an approach that does not aim to minimize the adjustments but to make the adjustments as uniform as possible between variables. Under this approach, even the values that are not explicitly involved in any constraints can be adjusted. The properties and interpretations of the proposed methods are illustrated using empirical business-economic data.
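
The minimal-adjustment idea in the abstract above can be illustrated with a small sketch. It assumes a single linear edit rule and an unweighted least-squares (Euclidean) distance, which is only one of the distance choices the paper refers to. The function name `adjust_imputations`, the variable names and the example record are hypothetical and not taken from the paper.

```python
def adjust_imputations(values, coeffs, rhs, imputed):
    """Adjust imputed values so that sum(coeffs[k] * values[k]) == rhs.

    Observed values stay fixed; only the variables listed in `imputed` move.
    The adjustment minimises the sum of squared changes (an assumed metric).
    """
    residual = sum(coeffs[k] * values[k] for k in coeffs) - rhs
    norm = sum(coeffs.get(k, 0.0) ** 2 for k in imputed)
    if norm == 0:
        raise ValueError("no imputed variable appears in the edit rule")
    adjusted = dict(values)
    for k in imputed:
        # Closed-form solution of the constrained least-squares problem.
        adjusted[k] = values[k] - coeffs.get(k, 0.0) * residual / norm
    return adjusted


# Hypothetical record: turnover was observed, costs and profit were imputed.
record = {"turnover": 100.0, "costs": 70.0, "profit": 40.0}
# Edit rule: turnover - costs - profit = 0
rule = {"turnover": 1.0, "costs": -1.0, "profit": -1.0}
adjusted = adjust_imputations(record, rule, 0.0, imputed=["costs", "profit"])
print(adjusted)
```

In this example the rule is violated by 10, and the least-squares solution spreads the correction evenly over the two imputed variables. A weighted distance would spread it differently, which is the kind of sensitivity to the metric that the abstract mentions.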

This paper concentrates on methods for handling incompleteness caused by differences in units, variables and periods of the observed data set compared to the target one. Especially in economic statistics different unit types are used in different data sets.

Inventories play a crucial role in explaining business cycle turning points. Inventories contributed 0.7 percentage point to a 4 percent contraction of economic activity in 2009. In light of this, demand for inventory data has been growing since the financial crisis hit in late 2008. This paper analyses Dutch wholesale and manufacturing inventories and relates them to the business cycle.

Inventories are a useful statistic for tracking and analysing short-term economic developments. This paper by Floris van Ruth and Marcel van Velzen describes how the index of inventories of finished goods in the manufacturing industry can be used in business cycle analysis. Inventories themselves lag business cycle developments, and are therefore of limited use. Using the turnover index of the manufacturing industry to compute a ratio of inventory to sales (ISR) produces a new and leading business cycle indicator. The ISR is shown to consistently lead Dutch business cycle developments by one to two quarters. It is therefore one of the few real, i.e. non-financial and non-sentiment, leading indicators. Inventories are shown to exhibit clear co-movement with sales and the business cycle. The countercyclical development of the ISR is therefore explained by the fact that turnover reacts more strongly to business cycle developments than inventories.
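
As a rough illustration of the inventory-to-sales ratio (ISR) described above, the sketch below divides an inventory index by a turnover index, period by period. The index values are invented for illustration and the function name is hypothetical; the paper's actual series and any smoothing or seasonal adjustment are not reproduced here.

```python
def inventory_sales_ratio(inventory_index, turnover_index):
    """Return the period-by-period ratio of inventories to sales (turnover)."""
    if len(inventory_index) != len(turnover_index):
        raise ValueError("both series must cover the same periods")
    return [inv / sales for inv, sales in zip(inventory_index, turnover_index)]


# Invented quarterly index values: turnover reacts more strongly than inventories.
inventories = [100.0, 101.0, 102.0, 102.0]
turnover = [100.0, 104.0, 108.0, 100.0]
isr = inventory_sales_ratio(inventories, turnover)
print(isr)
```

Because turnover rises faster than inventories in the upswing and falls faster in the downturn, the ratio moves against the cycle in this toy series, in line with the countercyclical behaviour the abstract describes.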

Two univariate outlier detection methods are introduced. In both methods, the distribution of the bulk of observed data is approximated by regression of the observed values on their estimated QQ plot positions using a model cumulative distribution function.

Statistical agencies have to ensure that respondents’ private information cannot be revealed from the tables they release. A well-known protection method is cell suppression, where values that provide too much information are left out of the table to be published. In a first step, sensitive cell values are suppressed. This is called primary suppression. In a second step, other values are suppressed as well, to prevent primarily suppressed values from being recalculated from the values published in the table. This second step is called secondary cell suppression.
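
The need for the second step can be seen in a toy example. The sketch below assumes a minimum-frequency rule for primary suppression (the abstract does not say which sensitivity rule is used) and looks at a single table row with a published total. All names and numbers are hypothetical.

```python
def primary_suppress(cells, min_count):
    """Replace cell counts below min_count with None (assumed sensitivity rule)."""
    return [c if c >= min_count else None for c in cells]


def recover_single_gap(published, row_total):
    """Return the value of the only suppressed cell in a row, or None.

    With exactly one gap and a published total, subtraction reveals the cell.
    """
    gaps = [c for c in published if c is None]
    if len(gaps) != 1:
        return None
    return row_total - sum(c for c in published if c is not None)


row = [40, 2, 25]
total = sum(row)  # the row total is published as well

after_primary = primary_suppress(row, min_count=3)
print(after_primary, recover_single_gap(after_primary, total))

# Secondary suppression: hide one more cell so subtraction no longer works.
after_secondary = [40, None, None]
print(after_secondary, recover_single_gap(after_secondary, total))
```

In a real two-dimensional table the column totals matter too, so choosing secondary suppressions is a much harder optimisation problem than this one-row sketch suggests.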

This contribution discusses possible scenarios and methodologies for the national statistical agencies for backcasting the new classification scheme (NACE Rev. 2.0) in existing time series of business statistics. We provide a discussion of the basic principles of reconstructing time series in general, after which the application of these methods in the area of short term business statistics is handled and illustrated with an example. We conclude that it is possible to obtain reasonable approximations of historic time series using rather simple methodology, but the quality of the backcasted time series is hampered by heterogeneity of classes.

This paper by Ralph Foorthuis and Sjaak Brinkkemper describes several architectures at the project level when conforming to Enterprise Architecture. The architectures described include Project Architecture, Project Start Architecture and Software Architecture. They are placed in the context of Enterprise Architecture and Domain Architecture.

The method of repeated weighting aims at obtaining numerical consistency among tables estimated from different surveys. However, in its common form, it does not take into account the existing edit rules. Consequently, the repeated weighting estimates will generally not be in agreement with existing edit rules. This report describes how to deal with linear categorical and numerical edit rules within the framework of repeated weighting estimation. A step-by-step plan is proposed for an estimation procedure yielding numerically consistent tables in agreement with edit rules.

This paper identifies a broad concept of unused labour force. This concept can be related to the transitional labour market theory. It is next reviewed in the light of economic studies using a similar construct. Pros and cons of our approach are discussed and the literature is used to assess characteristics of the groups making up this concept and to identify transitions from these groups to (more) employment. A first assessment of the size and some of the main characteristics of such groups is made using the Dutch Labour Force Survey of Statistics Netherlands. As far as transitions are concerned, international studies imply that flows of persons from such different groups into labour will almost unavoidably have to be identified using linked longitudinal surveys among individuals. These are currently under construction at Statistics Netherlands.

PRAM (Post Randomization Method) is a disclosure control method for microdata, introduced in 1997. However, it has not yet been applied extensively by statistical agencies. This is partly due to the fact that, even though some theoretical results exist, little practical knowledge is available on its effect on disclosure control as well as on the loss of information it induces. We will try to make up for this lack of knowledge, by supplying some empirical information on the behaviour of PRAM, with respect to disclosure control and loss of information.
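
For readers unfamiliar with PRAM, the basic mechanism is that each categorical value in the microdata is replaced by a random draw governed by a matrix of transition probabilities. The sketch below is a minimal illustration under that general description. The categories, the transition probabilities and the function name `pram` are invented and are not taken from the paper.

```python
import random


def pram(values, transition, rng):
    """Perturb categorical values: each value v is replaced by a category
    drawn with the probabilities in transition[v]."""
    perturbed = []
    for v in values:
        row = transition[v]
        categories = list(row)
        weights = [row[c] for c in categories]
        perturbed.append(rng.choices(categories, weights=weights, k=1)[0])
    return perturbed


# Hypothetical transition matrix: each value is kept with probability 0.9.
transition = {
    "employed":   {"employed": 0.9, "unemployed": 0.05, "inactive": 0.05},
    "unemployed": {"employed": 0.05, "unemployed": 0.9, "inactive": 0.05},
    "inactive":   {"employed": 0.05, "unemployed": 0.05, "inactive": 0.9},
}
# Identity matrix: no perturbation at all, useful as a sanity check.
identity = {c: {d: 1.0 if c == d else 0.0 for d in transition} for c in transition}

original = ["employed", "inactive", "unemployed", "employed", "employed"]
rng = random.Random(2024)  # fixed seed so the sketch is reproducible
released = pram(original, transition, rng)
unchanged = pram(original, identity, rng)
print(released)
```

Higher off-diagonal probabilities give more protection but distort the released frequencies more, which is the trade-off between disclosure control and information loss that the paper examines empirically.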

This paper first summarizes the discussion around the construction of a household satellite account and then continues to present the extension of the Dutch National Accounts with a time-use module. The module is developed to investigate the potential of data on paid and unpaid labour measured in time-units to serve the purpose of analyses, while keeping the link with the SNA framework and bypassing the issue of the valuation of unpaid labour.

Contrary to mineral exploration, computer software development and literary or artistic work, Research and Development is in the present SNA-1993 not considered as an activity leading to the creation of intangible assets. It is expected that this will change in the course of the coming SNA update. This paper discusses a number of conceptual and practical issues concerning the representation of R&D expenditure in the national accounts, including its capitalisation.

Hedonic methods are a promising tool when calculating price indexes for products experiencing rapid technological change. In this paper several hedonic time dummy price indexes are calculated for televisions, refrigerators, washing machines and personal computers, based on scanner data of the population of sales over the period 1999-2001. It appears that for televisions, refrigerators and washing machines a population-based matched-model index is a good approximation of the so-called generalised Törnqvist index, which is used as the benchmark index among the hedonic time dummy indexes. The paper analyses which conditions have to be fulfilled for a matched-model index to be a good approximation. The paper also shows the dynamic structure in consumer sales of some of these durables.

Nonresponse is a recurring problem in household surveys in many countries. Response rates of Statistics Netherlands surveys often vary between 50% and 60%. Research shows that nonresponse is usually selective: respondents and nonrespondents differ on various demographic characteristics. To avoid a substantial negative impact on the quality of survey results, weighting adjustment techniques are often applied. Statistics Netherlands has a large amount of background information available for this purpose. This information originates from registers and other administrative sources. The paper describes research aimed at finding the auxiliary variables that are most important to include in weighting models. A technique is also proposed to select the best weighting model. The theory is applied to data from a major survey.

This paper investigates the role of economic determinants in the dissolution of recently formed marital and non-marital unions. The economic determinants studied are women’s and men’s personal income and socio-economic positions, and the contribution of women’s income in the total household income. The discrete-time event history analyses cover the dissolution of cohabiting and married unions, using longitudinal data from Statistics Netherlands’ Income Panel Study.

The aim of this study was to improve the timeliness of the monthly statistics on sales development in the Dutch manufacturing industry. The focus was on producing timely indicators in the presence of missing data. Currently, missing data are imputed using, besides already available data for the month in question, known data from only one previous month, whereas we propose to use whole time series for that purpose. It was concluded that timeliness can be improved from 37 days to 27 days after the end of the month, without sacrificing accuracy.

In this paper we propose a new approach to impute data under linear restrictions. If the data are normally distributed we will use simulations from the standard normal distribution. If the data are not normally distributed we study the use of the Dirichlet distribution.

Estimates for population statistics can be seriously biased in case response rates are low and the response to a survey is selective. Methods like poststratification or propensity score weighting are often employed in order to adjust for bias due to nonresponse.
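
A minimal sketch of poststratification, one of the adjustment methods named above: respondents are grouped into strata with known population sizes, and each stratum mean is weighted by its population share. The strata, population counts and responses below are invented for illustration, and the function name is hypothetical.

```python
from collections import defaultdict


def poststratified_mean(sample, population_counts):
    """Estimate a population mean from (stratum, value) pairs.

    Each stratum mean is weighted by the stratum's known population share.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for stratum, value in sample:
        sums[stratum] += value
        counts[stratum] += 1
    total_population = sum(population_counts.values())
    estimate = 0.0
    for stratum, n_population in population_counts.items():
        if counts[stratum] == 0:
            raise ValueError(f"no respondents in stratum {stratum!r}")
        stratum_mean = sums[stratum] / counts[stratum]
        estimate += n_population / total_population * stratum_mean
    return estimate


# Invented example: young people are half the population but respond less often.
population = {"young": 500, "old": 500}
responses = [("young", 0.0), ("old", 1.0), ("old", 1.0), ("old", 1.0)]

unweighted = sum(value for _, value in responses) / len(responses)
weighted = poststratified_mean(responses, population)
print(unweighted, weighted)
```

The unweighted mean is pulled towards the over-represented group, while the poststratified estimate restores the population shares. The correction only removes bias to the extent that respondents and nonrespondents within a stratum are alike.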

This paper analyses the nonresponse of the Integrated Survey on Living Conditions, a large continuous survey of Statistics Netherlands. For this survey, more auxiliary variables were available than in regular situations. These variables could be obtained by linking registers and databases to survey data files. A number of fieldwork variables were also included in the analysis. By analysing the enriched survey data file, more information could be obtained about a possible under- or over-representation of certain groups in the survey.

A simulation study is carried out to investigate the performance of repeated weighting estimators and corresponding variance estimators. The study concerns two three-way frequency tables for a population of persons to be estimated from a sample matched to a register. Repeated weighting estimates are general regression estimates adjusted to certain marginal totals in order to achieve numerical consistency, in this case with register counts. For the particular frequency tables studied, repeated weighting and corresponding variance estimation perform well. Even for moderately small samples, mean squared errors are comparable to those for the general regression estimator that serves as starting point for the repeated weighting estimator. For the second frequency table simulated, a simplified variance estimator for the repeated weighting estimator is computed, which performs nearly as well as the original repeated weighting variance estimator.

Statistics Netherlands participated in the EUREDIT project, a large international research and development project on statistical data editing and imputation that lasted from March 2000 till February 2003. The main goals of this project were the development and evaluation of new and currently used methods for data editing and imputation. In this paper we describe the general approach applied by Statistics Netherlands on the two business surveys used in the EUREDIT project. We also describe the development of our edit and imputation strategy and give results supporting the choices we have made. Finally, we provide results of our approach on the two evaluation data sets, and compare these results to the results of the other institutes participating in EUREDIT.

In this paper we present a new algorithm for solving the error localisation problem for a mix of continuous and categorical data. This algorithm is based on constructing a binary search tree.

This paper deals with non-response bias, discussing a few approaches in this field. It is demonstrated that nonresponse bias as to voter turnout is lower in a survey on living conditions than in a purely political survey. In addition, auxiliary information from registrations is used to investigate non-response and its bias among ethnic groups. Response rates among ethnic minority groups are rather low, but there is no evidence that response rates are lower in lower social class areas. Unsurprisingly, correcting for limited socio-economic deviations.

Repeated weighting provides a method to obtain sets of table estimates with numerically consistent margins from combinations of registers and surveys. It is based on repeated application of the regression estimator and generates a new set of weights for each table that is estimated. Repeated weighting is implemented in the prototype software package VRD. This report describes the results of five simulations in which various aspects of repeated weighting were tested. The differences in accuracy between the repeated weighting estimator and the standard regression estimator were found to be small. When correctly implemented, repeated weighting consistently yielded a smaller standard deviation. In certain cases, a very limited increase in bias compared to standard weighting was found. The VRD estimator for the variance was found to be reliable only for cells of sufficient size and with a low enough variance of the weights.

Experiments embedded in ongoing sample surveys are particularly appropriate to test effects of alternative survey methodologies on estimates of finite population parameters.

Statistics Netherlands is presently making a considerable effort in combining data from administrative sources with, mainly household, survey data. By making efficient use of register data, Statistics Netherlands intends to improve the accuracy of its statistical information, and, at the same time, to decrease the response burden on households. The resulting large micro dataset with combined data is called the 'Social Statistical Database' (SSD); estimates related to social statistics are obtained from this SSD. Preferably, these estimates should be numerically consistent, although they might be obtained from different sources. At Statistics Netherlands, a new estimation method has been developed which, under certain conditions, ensures numerically consistent table sets. This method is called 'repeated weighting', and is based on a repeated use of the regression estimator. In the present paper we describe this new estimation method.

A simulation study is performed to compare linearization and balanced repeated replication (BRR) variance estimates in various situations.

Over the last few years several algorithms for solving the so-called error localisation problem have been developed at Statistics Netherlands. For six data sets involving numerical data we present computational results for four of those algorithms in this paper. The algorithms are based on a standard mixed integer programming formulation, on the generation of the vertices of a certain polyhedron by means of an adapted version of Chernikova’s algorithm, on a branch-and-bound algorithm using Fourier-Motzkin elimination, and on a cutting plane approach.

Discussion paper: A remote access system is a facility where users can submit queries for statistical information from their own computer. These queries are handled by the statistical agency and the generated, possibly confidentialised, output is returned to the user. This way the agency keeps control over its own data while the user does not need to make frequent visits to the agency. For some years, the Luxembourg Income Study (LIS) and Luxembourg Employment Study (LES) have made use of an advanced remote access system. Statistics Netherlands and other statistical institutes have recently expressed the need for a similar system. In this paper, we discuss the characteristics, limitations and desired properties of a remote access system. We illustrate the discussion with the system used at LIS/LES.

Statistics Netherlands is planning to set up a system of productivity statistics. An innovative approach is to directly build up productivity indices from data at the level of the individual firm. To study the feasibility of this approach, several exercises have been carried out, using micro-data on trade services, transport services and business services. It will be shown how sensitive productivity indices are with respect to the method. Some recommendations about the method will be given.

This paper is about the relation between productivity change at the micro level and the meso level. The paper considers the interplay between productivity change at the firm level and changes in the industrial structure which are caused by factors such as growth or decline and entry or exit of firms. The availability of firm-level data underlying officially publicized aggregate figures makes it possible to explore this area.