Connecting correction methods for linkage error in capture-recapture
Car ownership and driving licence ownership of young adult students and workers.
Investigating co-integration between inflation indicators within the price dashboard using structural time series models.
Investigating correlation and temporal relationships between the inflation indicators within the price dashboard.
The use of mobile devices, i.e. smartphones and tablets, in individual surveys in 2017.
Collecting information about the structure of random access networks
Re-analysis of Groningen earthquake frequency because of improved daily time resolution data for reservoir gas pressure
Paper Adjustment of heating values and CO2 petrol and diesel
Exploring the possible setup and uses of natural capital accounts for the Dutch North Sea area.
A resample method to compute standard errors of estimates based on a structural time series model.
How to stratify dynamic populations of articles for price index computations.
Exploring the suitability of web-based text mining to classify businesses by economic activity
An assessment of the contribution of the Dutch public export credit insurance facility to Dutch GDP and employment
A Bayesian analysis is applied in the framework of adaptive survey
This paper presents statistical methods to measure the impact due to a survey process redesign to avoid interruption of time series.
Multi-source Statistics: basic situations of combining sources and estimation methods
Characteristics for the dynamic populations of articles
Filtering in the Fourier domain: a new set of filters for seasonal adjustment of time series and its evaluation
Seasonally adjusted series of Gross Domestic Product (GDP)
Linkage of data sets with different unit types
A report on continued research into the frequency of earthquakes in Groningen in relation to gas extraction
Estimating educational attainment levels for the Dutch Virtual Census
Correspondence between survey and value added tax data on quarterly turnover
This paper discusses a new imputation method for estimating missing data.
R-indicators based on population totals
Surveys differ in their topics, language, style and design, and, consequently, in their sensitivity to measurement error.
Consistent estimation of a set of coherent frequency tables
PPS Sampling with Panel Rotation for Estimating Price Indices on Services
The fields of randomized experiments and probability sampling are traditionally two separate domains.
Automatically classifying reporting patterns in hours paid in administrative data
Model selection and MSE estimation in the state space model for the Dutch Labour Force Survey.
The document describes the Big Data training developed and the results obtained at Statistics Netherlands.
Statistical methods for reconciling data that are published at different frequencies (e.g. monthly and quarterly).
Establishing the accuracy of online panels for survey research
Multilevel Hierarchical Bayesian vs. State Space Approach in Time Series Small Area Estimation: the Dutch Travel Survey
This study compares two different techniques in a time series small area application: state space models estimated with the Kalman filter with a frequentist approach to hyperparameter estimation, and multilevel time series models estimated within the hierarchical Bayesian framework. The application chosen is the Dutch Travel Survey featuring level breaks caused by the survey redesigns, as well as small sample sizes for the main publication domains. Both models require variances of the design-based domain estimates as prior information. In practice, however, only unstable estimates of the design-based variances are available. In this paper, excessive volatility and a possible bias in design-based variance estimates are removed with the help of a state space model. The multilevel and state space modelling approaches deliver similar results. Slight differences in model-based variance estimates appear mostly in small-scale domains and are due to neglecting uncertainty around the hyperparameter estimates in state space models, and to a lesser extent due to skewness in the posterior distributions of the parameters of interest. The results suggest that the reduction in design-based standard errors with the hierarchical Bayesian approach is over 50% at the provincial level, and over 30% at the national level, averaged over the domains and time.
Non-probability samples provide a challenging source of information for official statistics, because the data generating mechanism is unknown. Making inference from such samples therefore requires a novel approach compared with the classic approach of survey sampling. We propose a framework based on predictive inference and discuss three classes of methods. We conduct a simulation study with a real-world data set.
This study investigates if and how internet search indices can contribute to a quantitative picture of health (related) topics in the Netherlands.
The research presented in this paper was carried out in the context of the further development of the methodology for calculating the Hospital Standardised Mortality Ratio (HSMR) in the Netherlands. It was funded by Dutch Hospital Data (DHD). The goal of this research is to investigate the effect of adding post-discharge mortality to the HSMR. The advantage of adding post-discharge mortality is that the mortality indicator becomes less dependent on the discharge policies of hospitals. Pouw et al. (2013) investigated the possibilities and effects of adding post-discharge mortality to the mortality indicator in the Netherlands. They concluded that this would improve the indicator. However, they did not investigate what the optimal period should be in which post-discharge mortality is taken into account, whether the length of the period should differ between diagnosis groups, or whether this period should be counted from admission or from discharge. These questions have been investigated by Statistics Netherlands and the results are presented in this paper (in Dutch).
In recent years, the importance of the households sector in measuring economic welfare has increasingly been recognised, and the development of additional indicators to measure inequalities is suggested. This article reports the work done by Statistics Netherlands concerning the development of such indicators. National accounts data has been combined with distributional information to divide income, consumption and wealth over household groups. This paper presents the preliminary results for the standard of living.
Assessment of selectivity of Big data sets is generally not straightforward, if at all possible. Some approaches are proposed in this paper. It is argued that the degree to which selectivity – or its assessment – is an issue depends on the way the data are used for the production of statistics. The role Big data can play in that process ranges from minor through supplementary to vital. Methods for inference that are in part or wholly based on Big data need to be developed, with particular attention to their capabilities of dealing with or correcting for selectivity of Big data. This paper elaborates on the current view on these matters at Statistics Netherlands, and concludes with some discussion points for further consideration or research.
The Quality Guidelines 2014 sets out the guidelines applicable to the statistical processes of Statistics Netherlands. The guidelines form the basis for audits and self-assessments of statistical processes. The report may also serve as input for redesign processes of statistics. The Quality Guidelines integrates international and national frameworks as well as Statistics Netherlands’ guidelines and Board resolutions.
We carried out an analysis at a sectoral level, and tested a score of different, potentially relevant, weather effects. The influence on GDP was then computed by aggregation. We found that, on a quarterly basis, several industries exhibit a significant weather effect. As a result of sectoral effects having opposite signs, the net effect of unusual weather on GDP as a whole is rather modest for some periods, while for other periods, the effect is more substantial.
The estimation of measurement effects (MEs) of survey modes in the presence of selection bias poses a great problem to methodologists. We present a new method to estimate MEs by means of “within-subject designs”, in which the same sample is approached by two different modes at two subsequent points in time. The decomposition of mode effects into MEs and selection biases is illustrated for key statistics from the Dutch Crime Victimization Survey using data from a large-scale within-subject experiment conducted within Statistics Netherlands’ project Mode Effects in Social Surveys (abbreviated to MEPS in Dutch).
Linear structural equation models (SEMs) are widely used to assess the validity and reliability of survey variables. When population means or totals are of interest, it is also important to assess whether the observed variables contain an intercept bias. Unfortunately, standard identification procedures for SEMs define an arbitrary metric for the latent variables, which prevents the estimation of valid latent means and intercepts in a single population.
This paper explores the use of matched samples as an alternative for estimators based on surveys suffering from a substantial amount of nonresponse.
The study aims to improve the quality of the Integrated Household Survey (IHS) of GeoStat.
We describe the role of information management in redesign programs: why it is needed, how it is at present being developed through international cooperation in the Generic Statistical Information Model (GSIM) project, and we give some examples how information management has been used in practice at Statistics Netherlands. We conclude that GSIM is necessary for more extensive usage of big data sources, for interoperability of statistical tools and for sharing tools between NSIs.
In 2011, a large-scale mixed-mode experiment was linked to the Crime Victimisation Survey (CVS). This experiment consisted of a randomized allocation of sample persons to the four survey modes Web, mail, telephone and face-to-face, and a follow-up using only interviewer modes face-to-face and telephone. The aim of the experiment was to disentangle mode-specific selection- and measurement effects. The analyses show that contact effort has little impact on the size of measurement bias and a modest impact on the size of selection bias. Also, interviewer performance plays just a small role in the size of both biases. From these results, we conclude that contact effort and interviewer performance do not have a simultaneous impact on nonresponse and measurement error.
Recently, representativeness indicators, or R-indicators, have been proposed as indirect measures of nonresponse error in surveys.
Adaptive survey designs to minimize survey mode effects. A case study on the Dutch Labour Force Survey
Assessing the impact of mode effects on survey estimates has become a crucial question due to the increasing appeal of mixed-mode designs. Despite the advantages of a mixed-mode design such as lower costs and increased coverage, there is sufficient evidence that mode effects may sometimes be large relative to the precision. They may lead to incomparable statistics in time or over population subgroups and they may increase bias. Adaptive survey designs offer a flexible mathematical framework to obtain the optimal balance between survey quality and costs. In this paper we employ adaptive designs in order to minimize mode effects. We illustrate our optimization model by means of a case study on the Dutch Labour Force Survey.
Early school-leaving in the Netherlands. A multidisciplinary study of risk and protective factors explaining early school-leaving
Signals of an increased risk of early school-leaving are visible as early as the first year of secondary school. This is one of the findings in the Ph.D. thesis by Ms. T. Traag, researcher at Statistics Netherlands.
This paper presents and discusses some new results on the second-order inclusion probabilities of a systematic probability proportional to size sample drawn from a randomly ordered list, also called randomized PPS sampling. It is shown that some standard approximations of these second-order inclusion probabilities meant for relatively small sample sizes, need not be valid when the sample size n is of the same order as the population size N. In addition, it is shown that under a number of assumptions the variance formulas for rejective Poisson sampling can be applied to randomized PPS sampling designs when both n and N-n are large.
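As a rough illustration of the sampling design discussed in this abstract (not code from the paper), randomized systematic PPS sampling can be sketched as below. The function name randomized_pps_systematic and its parameters are hypothetical, and the sketch assumes that no unit is so large that n times its size measure exceeds the population total, so that no unit can be selected twice.

```python
import random


def randomized_pps_systematic(sizes, n, seed=None):
    """Systematic PPS sample of n units drawn from a randomly ordered list.

    sizes: positive size measures, one per population unit.
    Returns the sorted indices of the selected units.
    Assumption (not from the paper): n * size <= total for every unit.
    """
    rng = random.Random(seed)
    total = sum(sizes)
    if n < 1 or any(s <= 0 or n * s > total for s in sizes):
        raise ValueError("need n >= 1 and 0 < n * size <= total for every unit")

    # Step 1: put the population list in random order.
    order = list(range(len(sizes)))
    rng.shuffle(order)

    # Step 2: lay n equally spaced selection points on the cumulated sizes,
    # starting from a uniform random start within the first interval.
    step = total / n
    u = rng.random()  # in [0, 1)
    points = [(u + k) * step for k in range(n)]

    # Step 3: select the unit whose cumulated-size interval contains each point.
    selected, cum, k = [], 0.0, 0
    for idx in order:
        cum += sizes[idx]
        while k < n and points[k] < cum:
            selected.append(idx)
            k += 1
    return sorted(selected)
```

With this design the first-order inclusion probability of unit i is n times its size divided by the total; the second-order inclusion probabilities, which the paper studies, have no such simple closed form.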
On course, but not there yet: Enterprise architecture conformance and benefits in systems development
Various claims have been made regarding the benefits that Enterprise Architecture (EA) delivers for both individual systems development projects and the organization as a whole. This paper presents the statistical findings of a survey study (n=293) carried out to empirically test these claims. First, we investigated which techniques are used in practice to stimulate conformance to EA. Secondly, we studied which benefits are actually gained. Thirdly, we verified whether EA creators (e.g. enterprise architects) and EA users (e.g. project members) differ in their perceptions regarding EA. Finally, we investigated which of the applied techniques most effectively increase project conformance to and effectiveness of EA. A multivariate regression analysis demonstrates that three techniques have a major impact on conformance: carrying out compliance assessments, management propagation of EA and providing assistance to projects. Although project conformance plays a central role in reaping various benefits at both the organizational and the project level, it is shown that a number of important benefits have not yet been fully achieved.
This paper gives alternative derivations for the standard variance formulas in two-stage sampling. The derivations are based on a direct use of the statistical properties of the sampling errors in the second stage. For the ease of exposition we examine the specific case that simple random sampling is used in both stages. These derivations might be useful for readers looking for more elementary approaches to two-stage sampling.
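For reference, the standard result that such derivations arrive at can be written as follows. The notation is an assumption made here and is not taken from the paper: N primary units of which n are sampled, M_i elements in primary unit i of which m_i are sampled, Y_i the total of primary unit i, and simple random sampling without replacement in both stages.

```latex
\hat{Y} = \frac{N}{n} \sum_{i \in s} \frac{M_i}{m_i} \sum_{j \in s_i} y_{ij},
\qquad
\operatorname{Var}(\hat{Y})
  = N^2 \left(1 - \frac{n}{N}\right) \frac{S_1^2}{n}
  + \frac{N}{n} \sum_{i=1}^{N} M_i^2 \left(1 - \frac{m_i}{M_i}\right) \frac{S_{2i}^2}{m_i},
\]
\[
S_1^2 = \frac{1}{N-1} \sum_{i=1}^{N} \left(Y_i - \frac{1}{N}\sum_{k=1}^{N} Y_k\right)^2,
\qquad
S_{2i}^2 = \frac{1}{M_i-1} \sum_{j=1}^{M_i} \left(y_{ij} - \frac{Y_i}{M_i}\right)^2 .
```

The first term is the variance that would remain if every sampled primary unit were fully enumerated; the second term is the contribution of the second-stage sampling errors.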
Numerical and categorical data used for statistical analyses is often plagued with missing values and inconsistencies. In many cases, a number of missing values may be derived, based on the consistency rules imposed on the data and the observed values in a record. The methods used for such derivations are called deductive imputation. In this paper, we describe the newly developed deductive imputation functionality of R package deducorrect.
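To make the idea of deductive imputation concrete, here is a minimal sketch in Python for a single balance rule of the form "the components add up to the total". It is not the deducorrect API (which is an R package); the function name impute_balance and the record layout are hypothetical.

```python
def impute_balance(record, components, total):
    """Fill in the one missing value implied by sum(components) == total.

    record: dict mapping field names to numbers, with None for missing values.
    components, total: field names in the (hypothetical) balance rule.
    Returns a new dict; the input record is left unchanged.
    """
    fields = list(components) + [total]
    missing = [f for f in fields if record.get(f) is None]
    out = dict(record)
    if len(missing) != 1:
        # Zero or several missing values: this single rule cannot
        # determine a unique value, so nothing is imputed.
        return out

    target = missing[0]
    if target == total:
        out[total] = sum(record[c] for c in components)
    else:
        observed = sum(record[c] for c in components if c != target)
        out[target] = record[total] - observed
    return out


# Example: costs + profit = turnover, with costs missing.
example = {"turnover": 100, "costs": None, "profit": 30}
print(impute_balance(example, ["costs", "profit"], "turnover"))
```

The value is imputed only when the rule and the observed values leave exactly one possibility, which is what distinguishes deductive imputation from model-based imputation.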
Analyses of categorical data are often hindered by the occurrence of inconsistent or incomplete raw data. Although R has many features for analyzing categorical data, the functionality for error localization and error correction is currently limited. The editrules package is designed to offer a user-friendly toolbox for edit definition, manipulation, and error localization based on the generalized paradigm of Fellegi and Holt.
This report describes the results of the analyses of the Personal Wellbeing Index (PWI) for the Netherlands.
Competition can be good or bad for innovation. In this paper a model is tested in which an increase in competition stimulates innovation when competition is low, but discourages innovation once the level of competition goes beyond a certain threshold. We use industry- as well as firm-level data, and find evidence for such a threshold using two different competition measures.
CBS is looking for a suitable model for quality management. One reason is that CBS wants to manage quality in a systematic way to meet the European Statistics Code of Practice and the Quality Declaration of the European Statistical System. Existing quality systems do not fully meet CBS requirements, so a model was developed that would comply. One of the requirements is that the new model should combine with the EFQM Excellence Model. The OQM model is composed of components of well-known quality management models. This model is applicable to all areas of quality assurance and all types of organizations. It is called an object-oriented model, because objects play a central role in it.
This report describes nineteen characteristics of statistical output. Each characteristic, also called a dimension, is elaborated according to a fixed structure that starts with the definition of the characteristic. For each characteristic, possible indicators and measures are formulated and summarized as a checklist in an annex. The report identifies seven purposes, such as serving as a knowledge base when making agreements with customers about the quality of statistical output. The report does not contain guidelines for the CBS organization and has no mandatory character, although it can serve as a starting point for developing guidelines.
DMK participates in a project that aims to implement a new, efficient method for benchmarking the National Accounts. Benchmarking is the process of achieving mathematical consistency between low-frequency (e.g. annual) and high-frequency (e.g. quarterly) accounts. In 2008 an extended multivariate Denton method for benchmarking was developed and implemented in prototype software. Furthermore, an experiment was carried out on the Dutch supply and use tables. In this paper we briefly describe the method and the experiment, and we review potential research topics for 2009. The aim of this paper is to ask the methodology advisory council’s advice and opinion on these topics.
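The consistency requirement behind benchmarking can be illustrated with the simplest possible approach, pro-rata adjustment, sketched below in Python. This is deliberately not the extended multivariate Denton method described in the paper; the function name prorata_benchmark and its parameters are hypothetical.

```python
def prorata_benchmark(quarterly, annual, periods_per_year=4):
    """Scale each year's quarterly values so that they sum to the annual total.

    quarterly: preliminary high-frequency values, in time order.
    annual: low-frequency benchmark totals, one per year.
    Returns the adjusted quarterly series as a list of floats.
    """
    if len(quarterly) != periods_per_year * len(annual):
        raise ValueError("quarterly series must cover each benchmark year exactly")

    adjusted = []
    for year, target in enumerate(annual):
        start = year * periods_per_year
        block = quarterly[start:start + periods_per_year]
        block_sum = sum(block)
        if block_sum == 0:
            raise ValueError("cannot scale a year whose quarters sum to zero")
        factor = target / block_sum  # one correction factor per year
        adjusted.extend(value * factor for value in block)
    return adjusted


# Two years of quarterly figures, benchmarked to annual totals 44 and 72.
print(prorata_benchmark([10, 10, 10, 10, 20, 20, 20, 20], [44, 72]))
```

Because the correction factor changes abruptly from one year to the next, pro-rata adjustment can create artificial jumps between the last quarter of one year and the first quarter of the next. Denton-type methods avoid this by choosing adjustments that preserve the period-to-period movement of the preliminary series as closely as possible while still meeting the annual totals.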
This paper by Ralph Foorthuis and Sjaak Brinkkemper describes the various architectures at the project level, when conforming to Enterprise Architecture. Amongst the architectures described are Project Architecture, Project Start Architecture and Software Architecture. They are placed in the context of Enterprise Architecture and Domain Architecture.