Papers

Scientific publications, PhD theses and reports of quality studies in the field of statistics.

Papers in this series cover methods, quality, processes, and information-technology and conceptual topics that are relevant to the field of work of Statistics Netherlands (CBS). They describe the results of (applied) scientific research carried out by CBS researchers, sometimes in cooperation with others. The series is not intended for publishing CBS figures. The figures in these papers should therefore not be regarded as official CBS statistics.

Feasibility study of physical stocks (the urban mine) in our economy

A small number of very big enterprises dominates R&D in the Netherlands. Some of these enterprises have their headquarters in the country, some have moved out, some have always been foreign-based. We have already seen many mergers and take-overs, and we expect to see more. In the last two decades or so, the organisation of R&D within the big enterprises in the manufacturing industry has changed. Instead of large centralized laboratories, we now see a larger number of smaller, decentralized R&D labs. Apart from the big enterprises engaged in R&D, we also see a large number of small and medium enterprises focusing on R&D as their main activity.

This discussion paper describes a method based on latent class modeling

Statistical methods for reconciling data that are published at different frequencies (e.g. monthly and quarterly).

Discussion Paper about model-based methods for the estimation of monthly unemployment by province

Measuring the uncertainty in accounting equations (estimated figures connected by known constraints) by scalar measures.

Big data and methodological challenges in official statistics

Financial ratios of healthcare institutions for the year 2014

Points of attention in the collection and processing of internet data

This article contains an appraisal of two recently proposed methods of inference for sequential mixed-mode surveys.

Discussion paper on the potential and implications of profiling big data sources for official statistics.

This is a methodological report on the extension of the Materials Monitor.

Establishing the accuracy of online panels for survey research

This study compares two different techniques in a time series small area application: state space models estimated with the Kalman filter, with a frequentist approach to hyperparameter estimation, and multilevel time series models estimated within the hierarchical Bayesian framework. The application chosen is the Dutch Travel Survey, which features level breaks caused by survey redesigns as well as small sample sizes for the main publication domains. Both models require variances of the design-based domain estimates as prior information. In practice, however, only unstable estimates of the design-based variances are available. In this paper, excessive volatility and a possible bias in the design-based variance estimates are removed with the help of a state space model. The multilevel and state space modelling approaches deliver similar results. Slight differences in model-based variance estimates appear mostly in small-scale domains and are due to neglecting uncertainty around the hyperparameter estimates in state space models, and to a lesser extent to skewness in the posterior distributions of the parameters of interest. The results suggest that the reduction in design-based standard errors with the hierarchical Bayesian approach is over 50% at the provincial level and over 30% at the national level, averaged over the domains and time.

In this manuscript we consider an important topic in official statistics, namely estimating the number of usual residents. For the Netherlands we investigate the undercoverage of the Population Register. First, the Population Register is linked to an Employment Register and a Crime Suspects Register. Then, we use three-list capture-recapture methodology to estimate the number of usual residents. Two problems arise: 1) none of the three registers contains a variable on usual residence; 2) capture-recapture methodology relies on a number of assumptions that cannot be verified directly. The paper shows how both problems are solved.
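
As a rough illustration of the technique (our notation, not quoted from the paper), three-list capture-recapture is usually formalised with a log-linear model on the 2x2x2 table of inclusion patterns over the three registers, with the triple interaction set to zero so that the unobserved cell can be predicted:

    \log m_{abc} = \lambda + \lambda^A_a + \lambda^B_b + \lambda^C_c + \lambda^{AB}_{ab} + \lambda^{AC}_{ac} + \lambda^{BC}_{bc},
    \qquad \hat{N} = n_{\mathrm{observed}} + \hat{m}_{000},

where a, b and c indicate presence (1) or absence (0) in the Population Register, the Employment Register and the Crime Suspects Register.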

In this paper the question is addressed how alternative data sources, such as administrative and social media data, can be used in the production of official statistics. Since most surveys at national statistical institutes are conducted repeatedly over time, a multivariate structural time series modelling approach is proposed to model the series observed in a repeated survey together with related series obtained from such alternative data sources. Generally, this improves the precision of the direct survey estimates by using sample information observed in preceding periods and information from related auxiliary series. The concept of cointegration is applied to address the question of the extent to which the alternative series represent the same phenomena as the series observed with the repeated survey. The methodology is applied to the Dutch Consumer Confidence Survey and a sentiment index derived from social media.
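
A minimal sketch of the kind of bivariate structural time series model meant here (notation ours, not the paper's exact specification): the survey series y_t and the auxiliary series x_t each receive their own stochastic trend, and the trend disturbances are allowed to be correlated; full correlation corresponds to a common (cointegrated) trend.

    y_t = L^y_t + \varepsilon^y_t, \qquad x_t = L^x_t + \varepsilon^x_t,
    L^y_{t+1} = L^y_t + \eta^y_t, \qquad L^x_{t+1} = L^x_t + \eta^x_t, \qquad \operatorname{corr}(\eta^y_t, \eta^x_t) = \rho.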

In mixed-mode surveys, mode-differences in measurement bias, also called measurement effects or mode effects, continue to pose a problem to survey practitioners. In this paper, we discuss statistical adjustment of measurement bias to the level of a measurement benchmark mode during inference from mixed-mode data. In doing so, statistical methodology requires auxiliary information, which we suggest to collect in a re-interview administered to a subset of respondents to the first stage of a sequential mixed-mode survey. In the re-interview, relevant questions from the main survey are repeated. After introducing the design and presenting relevant statistical theory, this paper evaluates the performance of a set of six candidate estimators that exploit re-interview information in a Monte Carlo simulation. In the simulation, a large number of parameters is systematically varied, which define the size and type of measurement and selection effects between modes in the mixed-mode design. Our results indicate that the performance of the estimators strongly depends on the true measurement error model. However, one estimator, called the inverse regression estimator, performs particularly well under all considered scenarios. Our results suggest that the re-interview method is a useful approach to adjust measurement effects in the presence of non-ignorable selectivity between modes in mixed-mode data.

In multi-mode questionnaire design, usually some consideration is given to mode-specific measurement error. Despite this consideration, however, these measurement effects can be unexpectedly large. For this reason, there is a strong incentive to better predict measurement effects. This may be done by constructing profiles of a questionnaire, in which relevant item characteristics are summarized. For all items of a survey, these item characteristics need to be coded and combined. In this paper, we evaluated a list of item characteristics that literature has reported as relevant to mode-specific measurement error. Most importantly, we evaluated the reliability of the coding of such characteristics. Our results showed that intercoder reliability can be low for the most relevant characteristics. This may be explained by the difficulty of defining the item characteristics and the inherent subjectivity with which these item characteristics are coded. Finally, some suggestions are made for coping with low intercoder reliability.

Recent survey literature shows an increasing interest in survey designs that adapt data collection to characteristics of the survey target population. Given a specified quality objective function, such designs attempt to find an optimal balance between quality and costs. Finding the optimal balance may not be straightforward, as the corresponding optimization problems are often highly non-linear and non-convex. In this paper, we discuss how to choose strata in such designs and how to allocate these strata in a sequential design with two phases. We use partial R-indicators to build profiles of the data units for which more or less attention is required in the data collection. In allocating cases, we look at two extremes: surveys that are run only once or infrequently, and surveys that are run continuously. We demonstrate the impact of the sample size in a simulation study and provide an application to a real survey, the Dutch Crime Victimisation Survey.

A common problem faced by statistical institutes is that some data of otherwise responding units may be missing. This is referred to as item nonresponse. Item nonresponse is usually treated by imputing the missing data. The problem of imputing missing data is complicated by the fact that statistical data often have to satisfy so-called edit rules, which for numerical data usually take the form of linear restrictions. A further complication is that numerical data sometimes have to sum up to known totals. Standard imputation methods for numerical data as described in the literature generally do not take such linear edit restrictions on the data or known totals into account. In this paper we develop simple imputation methods that satisfy edits and preserve known totals. These methods are based on well-known hot deck approaches. Extension of our methods to other types of imputation, such as regression imputation or predictive mean matching, is straightforward.
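
As a hedged illustration of one way to preserve a known total (not necessarily the paper's exact procedure): after hot-deck donor values have been imputed, the imputations within a group can be rescaled pro rata so that the completed records add up to the known total T,

    x_i^{\mathrm{adj}} = x_i^{\mathrm{imp}} \cdot \frac{T - \sum_{j \in \mathrm{obs}} x_j}{\sum_{k \in \mathrm{imp}} x_k^{\mathrm{imp}}}, \qquad i \in \mathrm{imp},

provided the amount still to be distributed and the sum of the imputed values have the same sign.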

National statistical institutes try to construct data sets that are rich in information content by combining available data as much as possible. Unfortunately, units, e.g. persons or enterprises, in different data sources cannot always be matched directly with each other, for example because different data sources often contain different units. In such a case one can sometimes resort to statistical matching rather than exact matching. Statistical matching can be used when different data sources contain (different) units with a set of common (background) variables. These common variables may then be used to match similar units in the data sources to each other. From March 2015 until the end of June 2015, two master's students at Tilburg University, Sofie Linskens and Janneke van Roij, evaluated several methods for statistical matching, namely random hot deck, distance hot deck and statistical matching using a parametric model, on categorical data from the Dutch Population Census 2001. In this paper we describe the methods that they examined and the results they obtained.

Data editing is the process of checking and correcting data. In practice, these processes are often automated. In many applications a large number of constraints needs to be handled. This paper shows that data editing can benefit from constraint simplification techniques that are often used in Operations Research and Artificial Intelligence: performance can be improved and a better quality of automatically corrected data can be obtained. First, a new procedure for constraint simplification is proposed that has been developed especially for data editing; a procedure that combines several known algorithms from Operations Research and Artificial Intelligence. Thereafter, it is demonstrated that real-life edit sets can actually be simplified.

In this paper, we describe a method for assessing the measurement validity as well as the bias of administrative and survey variables by means of a structural equation model. The method is applied to value-added tax turnover for the Dutch short-term statistics.

In most real-life studies, auxiliary variables are available and are employed to explain and understand missing data patterns and to evaluate and control causal relations with variables of interest. Usually their availability is assumed to be a fact, even if the variables are measured without the objectives of the study in mind. As a result, inference with missing data and causal inference require a number of assumptions that cannot easily be validated or checked. In this paper, a framework is constructed in which auxiliary variables are treated as a selection, possibly random, from the universe of variables on a population. This framework provides conditions to make statistical inference beyond the traces of bias or effects found by the auxiliary variables themselves. The utility of the framework is demonstrated for the analysis and reduction of nonresponse in surveys. However, the framework may be more generally used to understand the strength of associations between variables. Important roles are played by the diversity and diffusion of the population of interest, features that are defined in the paper and the estimation of which is discussed.

This study evaluated three types of bias – total, measurement, and selection bias – in three sequential mixed-mode designs of the Dutch Crime Victimization Survey: telephone, mail, and web, where nonrespondents were followed up face-to-face. In the absence of true scores, all biases were estimated as mode effects against two different types of benchmarks. In the single-mode benchmark (SMB), effects were evaluated against a face-to-face reference survey. In an alternative analysis, a ‘hybrid-mode benchmark’ (HMB) was used, where effects were evaluated against a mix of the measurements of a web survey and the selection bias of a face-to-face survey. A special re-interview design made available additional auxiliary data exploited in estimation for a range of survey variables. Depending on the SMB and HMB perspectives, a telephone, mail, or web design with a face-to-face follow-up (SMB) or a design involving only mail and/or web but not a face-to-face follow-up (HMB) is recommended based on the empirical findings.

A feasibility study on the concepts and data needed to create an integrated measurement system for the circular economy, bio-based economy, eco-taxation and other resource issues.

Non-probability samples provide a challenging source of information for official statistics, because the data generating mechanism is unknown. Making inference from such samples therefore requires a novel approach compared with the classic approach of survey sampling. We propose a framework based on predictive inference and discuss three classes of methods. We conduct a simulation study with a real-world data set.

Record linkage is becoming increasingly common in statistical and academic research. By combining information from different sources, questions can be answered that are difficult or impossible to answer with data from a single source. It is important to understand the technical, methodological and legal limitations of record linkage. This report describes three examples in which medical and general data sources were linked. It was produced as part of the Biolink NL project, one of the so-called rainbow projects funded by the Biobanking and Biomolecular Research Infrastructure (BBMRI-NL). The project is a collaboration between a number of academic research institutes and Statistics Netherlands (CBS). This is the second and final report examining the methodological aspects of record linkage within the Biolink NL project.

National statistical institutes (NSIs) fulfill an important role as providers of objective and undisputed statistical information on many different aspects of society. To this end NSIs try to construct data sets that are rich in information content and that can be used to estimate a large variety of population figures. At the same time NSIs aim to construct these rich data sets as efficiently and cost effectively as possible. This can be achieved by utilizing already available administrative data as much as possible, and supplementing these administrative data with survey data collected by the NSI. In this paper we will focus on obtaining consistent population estimates from such a combination of administrative data sets and surveys. We will sketch general approaches based on weighting, imputation and macro-integration, and discuss their advantages and drawbacks.

Estimating the effect of non-sampling errors on the accuracy of mixed-source statistics is not straightforward. Here we simulate the bias and variance of the turnover estimates in car trade due to classification errors, using a bootstrap approach. In addition, we study the extent to which manual selective editing at micro level can improve the accuracy. We discuss how to develop a practical method that can be implemented in production to estimate the accuracy of register-based estimates.

Previous international national accounting standards provided a somewhat scattered picture of pension arrangements. This made international comparability of statistics difficult, because of the diversity in pension schemes among countries. The new European System of Accounts (ESA 2010) allows for a better analysis and international comparability of the pension systems within and between countries, by introducing a supplementary table on pension schemes. This paper focuses on the construction of the supplementary pension table, including estimates for the size of the unfunded pension schemes. It describes the methodology used, and the choices made by Statistics Netherlands in completing this table for 2012.

This study investigates if and how internet search indices can contribute to a quantitative picture of health (related) topics in the Netherlands.

This study was commissioned by Dutch Hospital Data (DHD). Its aim is the possible further development of the Hospital Standardised Mortality Ratio (HSMR) by including deaths shortly after discharge. The advantage of including post-discharge mortality in the standardised mortality figure is that the figure becomes less dependent on hospitals' discharge policies. In 2013, a study by Pouw et al. (2013) examined the feasibility and effects of such an indicator in the Netherlands. That study concluded that it would indeed be desirable to include deaths shortly after discharge in the mortality figure. It did not, however, investigate the optimal period during which deaths should be counted, whether this period differs per diagnosis group, and whether it should be counted from admission or from discharge. Commissioned by DHD, Statistics Netherlands (CBS) has investigated these questions. The results are presented in this report.

Repeated weighting is a method for the consistent estimation of multiple frequency tables from registers and sample surveys. Statistics Netherlands uses repeated weighting for the compilation of the Dutch census. The application of the method is not without its problems. Estimation problems may occur, especially for detailed tables in which some of the cells are covered by only a few (or even no) observations. This paper reviews existing solutions for methodological problems and proposes new solutions when necessary. The problems and solutions are illustrated with experiences from the Dutch 2011 Census compilation. The general message of this paper is that repeated weighting can be applied to very complex estimation problems, although it still has its limitations. (Jacco Daalmans)

Many target variables in official statistics follow a semicontinuous distribution with a mixture of zeros and continuously distributed positive values, also called zero-inflated variables. When reliable estimates for subpopulations with small samples are required, a model-based small area estimation method can be used, which improves the accuracy of the estimates by borrowing information from other domains. In this paper, two small area estimators are compared in a simulation study with zero-inflated target variables. The first estimator, the EBLUP, can be considered the standard small area estimator and is based on a linear model that assumes normal distributions; it is therefore model-misspecified in our situation. The second estimator is based on a model that takes the zero-inflation into account and is therefore less misspecified. Both estimators are found to improve the accuracy compared to a design-based approach. The gain in accuracy is generally larger for the model that takes the zero-inflation into account. The amount of improvement depends on properties of the population. Furthermore, there are large differences in improvement between the domains.

Many surveys are repeated at regular intervals to monitor temporal change in quantities of social or economic importance. When small area estimation methods are applied in such settings, the question arises of which models to use. Model selection techniques are used to identify an optimal set of covariates from a larger set of candidates. In the context of repeated surveys the additional question arises of how to select a model for small area predictions that are comparable over time. To this end, this paper presents and compares four approaches to model selection for repeated surveys. Consecutive editions of the Dutch Crime Victimisation Survey are used as a case study.

This study illustrates how time-series model-based techniques can improve the production of official statistical figures for repeatedly conducted surveys. Apart from reducing the variance of design-based estimates, time series models provide estimates on the magnitude of discontinuities caused by survey redesigns. In surveys with a subdivision into several domains, such models can be effectively applied in the context of small area estimation problems.

Insight into the nature of inflation dynamics is crucial for prediction and for understanding inflationary pressure. This study applies a whole range of linear and non-linear time series models to the rate of inflation in the Netherlands between 1970 and 2009 to test which one best describes inflation dynamics.

In this study it is shown that it is possible to construct indicators of regional economic activity based on local traffic intensity data. Sensor readings on traffic flow from the NDW database were used to construct a monthly indicator of traffic intensity in the Eindhoven region.

This study applies a range of predictive techniques to forecasting the sign of the next month’s change in five key economic indicators: consumer confidence, household consumption, producer confidence, manufacturing production and exports, all for the Netherlands. Techniques tested range from standard regression models to advanced machine learning techniques.

A time-series multi-level model is used to estimate municipal unemployment based on the Dutch Labour Force Survey (LFS) at a quarterly frequency. The model includes random municipality effects as well as random municipality-by-quarter effects. The latter are modelled as independent effects or random walks over time, or as a sum of both. The model is fit using quarterly LFS data from 2003 to 2008.
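
In symbols, a model of the type described could read as follows (a sketch in our own notation, not the exact specification used in the paper):

    \hat{\theta}_{it} = x_{it}'\beta + v_i + u_{it} + e_{it}, \qquad v_i \sim N(0, \sigma_v^2),

with \hat{\theta}_{it} the direct LFS estimate for municipality i in quarter t, e_{it} its sampling error, and u_{it} either independent effects, a random walk u_{it} = u_{i,t-1} + w_{it}, or the sum of both.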

In this discussion paper, the interaction between survey item characteristics and survey mode is investigated in terms of measurement bias. For this purpose a typology of survey items was constructed and is presented in the paper. Mode-specific measurement bias was estimated using a large-scale mixed-mode experiment linked to the Dutch Crime Victimisation Survey of 2011. This experiment consisted of a randomized allocation of sample persons to the four survey modes web, paper, telephone and face-to-face. A multi-level model is used to explain mode-specific measurement bias for a set of approximately 125 items from the Crime Victimisation Survey.

Recently, various indicators have been proposed as indirect measures of nonresponse error in surveys. They employ auxiliary variables to detect nonrepresentative or unbalanced response. A class of survey designs known as adaptive survey designs maximizes these indicators by applying different treatments to different subgroups. The natural question is whether the decrease in nonresponse bias caused by adaptive survey designs could also be achieved by nonresponse adjustment methods. We discuss this question and provide theoretical and empirical considerations, supported by a range of household and business surveys. We find evidence that balancing response reduces bias more than adjustment does.

In this paper, a generalisation of the Fellegi-Holt paradigm is proposed that can incorporate various complex edit operations in a natural way. In addition, an algorithm is outlined that may be used to solve the resulting generalised error localisation problem, at least in theory. It is hoped that this generalisation may be used to increase the suitability of automatic editing in practice, and hence to improve the efficiency of data editing processes.
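
For reference, the classical Fellegi-Holt error localisation problem that is being generalised here can be stated as follows (a standard formulation, not quoted from the paper): find the smallest weighted set of fields that, when allowed to change, makes the record satisfy all edits,

    \min_{\delta \in \{0,1\}^p} \sum_{j=1}^{p} w_j \delta_j \quad \text{such that a record satisfying all edits exists that differs from the observed record only in fields with } \delta_j = 1,

where w_j is the reliability weight of field j.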

Macro-economic indicators about the labour force, published by national statistical institutes, are predominantly based on rotating panels. The sample sizes of most Labour Force Surveys, in combination with the design-based or model-assisted mode of inference usually applied in survey sampling, obstruct the publication of such indicators at a monthly frequency. It is shown how multivariate structural time series models can be used to obtain more precise model-based estimates.

In recent years, the importance of the households sector in measuring economic welfare has increasingly been recognised, and the development of additional indicators to measure inequalities is suggested. This article reports the work done by Statistics Netherlands concerning the development of such indicators. National accounts data has been combined with distributional information to divide income, consumption and wealth over household groups. This paper presents the preliminary results for the standard of living.

The Quality Guidelines describe the standards that statistical processes of Statistics Netherlands (CBS) must meet. The guidelines form the basis for audits and self-assessments of statistical processes. This report can also serve as input for the (re)design of statistical processes. The Quality Guidelines integrate existing international and national frameworks as well as CBS guidelines and board decisions.

Record linkage is becoming more and more common in statistical and academic research. Linking records makes it possible to combine data from different sources to answer research questions that are very difficult to answer using data from just one source. In the present paper, we demonstrate the influence that the choice of variables and linkage algorithms has on linkage results, but also the importance of the properties of the data sources.

Error localization is the problem of finding out which fields in raw data records contain erroneous values. The editrules extension package for the R environment for statistical computing was recently extended with a module that allows for error localization based on a mixed integer programming (MIP) formulation. In this paper we describe the MIP formulation of the error localization problem for the case of numerical, categorical, or mixed numerical and categorical datasets.
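
A minimal sketch of such a MIP for the purely numerical case with linear edits Ax <= b (our simplification; the paper also treats categorical and mixed data):

    \min_{x,\,\delta} \sum_j w_j \delta_j \quad \text{s.t.} \quad Ax \le b, \qquad |x_j - x_j^0| \le M \delta_j, \qquad \delta_j \in \{0,1\},

with x^0 the observed record, w_j the reliability weights and M a sufficiently large (big-M) constant; fields with \delta_j = 1 are flagged as erroneous.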

Assessment of the selectivity of Big data sets is generally not straightforward, if possible at all. Some approaches are proposed in this paper. It is argued that the degree to which selectivity – or its assessment – is an issue depends on the way the data are used for the production of statistics. The role Big data can play in that process ranges from minor, via supplementary, to vital. Methods for inference that are partly or wholly based on Big data need to be developed, with particular attention to their capabilities of dealing with or correcting for the selectivity of Big data. This paper elaborates on the current view on these matters at Statistics Netherlands, and concludes with some discussion points for further consideration or research.

We carried out an analysis at a sectoral level, and tested a score of different, potentially relevant, weather effects. The influence on GDP was then computed by aggregation. We found that, on a quarterly basis, several industries exhibit a significant weather effect. As a result of sectoral effects having opposite signs, the net effect of unusual weather on GDP as a whole is rather modest for some periods, while for other periods, the effect is more substantial.

To keep the statistics up to date, improvements are implemented regularly. The series breaks that may be the downside of such improvements are absorbed as well as possible, so that users experience the benefits of the improvements as much as possible and the drawbacks of a break as little as possible. This report deals with improvements and series breaks in economic statistics.

The estimation of measurement effects (MEs) of survey modes in the presence of selection bias poses a great problem to methodologists. We present a new method to estimate MEs by means of “within-subject designs”, in which the same sample is approached by two different modes at two subsequent points in time. The decomposition of mode effects into MEs and selection biases is illustrated for key statistics from the Dutch Crime Victimization Survey using data from a large-scale within-subject experiment conducted within Statistics Netherlands’ project Mode Effects in Social Surveys (abbreviated to MEPS in Dutch).

Linear structural equation models (SEMs) are widely used to assess the validity and reliability of survey variables. When population means or totals are of interest, it is also important to assess whether the observed variables contain an intercept bias. Unfortunately, standard identification procedures for SEMs define an arbitrary metric for the latent variables, which prevents the estimation of valid latent means and intercepts in a single population. In this paper, it is shown how an audit sample may be used to estimate a non-arbitrary set of identification restrictions.
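
A minimal sketch of the measurement part of such a model (notation ours): each observed variable y_j measures the latent true value \eta with its own intercept and slope,

    y_j = \tau_j + \lambda_j \eta + \epsilon_j,

and fixing \tau_j = 0 and \lambda_j = 1 for an indicator that the audit sample shows to be (approximately) error-free anchors the latent scale, so that the remaining intercepts can be interpreted as bias rather than as artefacts of an arbitrary identification choice.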

Data correction benefits from indicators that show, in a simple and clear way, the influence of the correction process on the data. This report describes a number of indicators that graphically highlight aspects of changes in data caused by a correction process. The indicators fall into two types: indicators relating to values and indicators relating to rule violations.

This paper explores the use of matched samples as an alternative for estimators based on surveys suffering from a substantial amount of nonresponse.

We describe the role of information management in redesign programs: why it is needed, how it is currently being developed through international cooperation in the Generic Statistical Information Model (GSIM) project, and we give some examples of how information management has been used in practice at Statistics Netherlands. We conclude that GSIM is necessary for more extensive usage of big data sources, for interoperability of statistical tools and for sharing tools between NSIs.

Within the collaboration between Statistics Netherlands and the National Statistics Office of Georgia (GeoStat), a pilot study was carried out. The study aims to improve the quality of the Integrated Household Survey (IHS) of GeoStat. One of the issues is that the household income obtained from the IHS is not reliable and not representative for the population. GeoStat does not have any administrative or other source with information on household income. In order to obtain information related to household income, we developed an asset ownership questionnaire, which was administered to a small group of households. Using a linear regression model based on this questionnaire, we estimated household income for the IHS data frame in the Tbilisi (Georgia) area. In this paper we recommend that GeoStat use these estimates for stratification of the IHS according to household income. The stratification can also be used to improve the weighting of the IHS and to make the income distribution of households more representative. The pilot study was applied only to households in the Tbilisi area.

In 2011, a large-scale mixed-mode experiment was linked to the Crime Victimisation Survey (CVS). This experiment consisted of a randomized allocation of sample persons to the four survey modes Web, mail, telephone and face-to-face, and a follow-up using only the interviewer modes face-to-face and telephone. The aim of the experiment was to disentangle mode-specific selection and measurement effects. The analyses show that contact effort has little impact on the size of measurement bias and a modest impact on the size of selection bias. Also, interviewer performance plays just a small role in the size of both biases. From these results, we conclude that contact effort and interviewer performance do not have a simultaneous impact on nonresponse and measurement error.

Recently, representativeness indicators, or R-indicators, have been proposed as indirect measures of nonresponse error in surveys. The indicators employ available auxiliary variables in order to detect nonrepresentative response. They may be used as quality objective functions in the design of survey data collection. Such designs are called adaptive survey designs, as different subgroups receive different treatments. The obvious question is whether the decrease in nonresponse bias caused by adaptive survey designs could also be achieved by nonresponse adjustment methods that employ the same auxiliary variables. In this paper, we discuss this important question. We provide theoretical and empirical considerations on the role of both the survey design and nonresponse adjustment methods in making response representative. The empirical considerations are supported by a wide range of household and business surveys from Statistics Netherlands.
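
For context, the R-indicator referred to here is commonly defined from the standard deviation of the response propensities \rho_i (the standard definition, not specific to this paper):

    R(\rho) = 1 - 2\,S(\rho), \qquad S(\rho)^2 = \frac{1}{N-1}\sum_{i=1}^{N} (\rho_i - \bar{\rho})^2,

so that R = 1 corresponds to fully representative response and smaller values indicate unbalanced response.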

A problem with using households as sampling units in the sample design of panels is the instability of these sampling units over time. Changes in the household composition affect the inclusion probabilities required for design-based and model-assisted inference procedures. The required information to derive correct inclusion probabilities is often not available. This problem can be circumvented by sampling persons which are followed over time. At each period the household members of these sampled persons are included in the sample. This comes down to sampling with probabilities proportional to household size where households can be selected more than once but with a maximum equal to the number of household members. In this paper properties of this sample design are described and applied to the Dutch Regional Income Survey.
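
As a small worked illustration of the mechanism (ours, not the paper's): if persons are drawn with equal probability n/N and each sampled person brings in his or her entire household, then a household h with m_h members is selected, in expectation,

    E(\text{number of selections of household } h) = \frac{n\, m_h}{N}

times, i.e. proportionally to household size, with at most m_h selections possible.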

Data cleaning, or data preparation, is an essential part of statistical analysis. In fact, in practice it is often more time-consuming than the statistical analysis itself. These lecture notes describe a range of techniques, implemented in the R statistical environment, that allow the reader to build data cleaning scripts for data in textual format suffering from a wide range of errors and inconsistencies. The notes cover technical as well as subject-matter related aspects of data cleaning. Technical aspects include data reading, type conversion, and string matching and manipulation. Subject-matter related aspects include topics like data checking, error localization and an introduction to imputation methods in R. References to relevant literature and R packages are provided throughout.
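
To give a flavour of the kind of techniques covered, here is a short, self-contained R fragment (an illustrative sketch, not taken from the lecture notes; the file name and column names are hypothetical):

    # read raw textual data, keeping everything as character for the moment
    raw <- read.csv("survey_raw.csv", colClasses = "character",
                    stringsAsFactors = FALSE)

    # type conversion: decimal comma to decimal point, then to numeric
    raw$income <- as.numeric(gsub(",", ".", raw$income, fixed = TRUE))

    # string normalisation: trim whitespace and harmonise case
    raw$gender <- tolower(trimws(raw$gender))

    # a simple check: flag records violating the edit rule "age >= 0"
    raw$age <- as.numeric(raw$age)
    violates_age_edit <- !is.na(raw$age) & raw$age < 0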

The report presents experimental estimations of the level and growth of human capital in the period 1999-2009, and discusses the plausibility of these estimates based on the lifetime income approach. The figures presented in this report are not official statistics of Statistics Netherlands, and not published as such. The estimates are still under investigation and have a highly explorative character, and should therefore be treated with caution when drawing conclusions based on the current research results. The results just give an indication of the sources of growth of human capital, investment in human capital, and how the estimate of human capital in the Netherlands compares to other types of capital and GDP, and to other countries.

This PhD thesis investigates how theory and practice in the field of environmental accounting can be better connected. To this end, statistical research was carried out in a number of areas, including the valuation of resource depletion, the determination of national wealth, and the new field of ecosystem accounting.

Assessing the impact of mode effects on survey estimates has become a crucial question due to the increasing appeal of mixed-mode designs. Despite the advantages of a mixed-mode design, such as lower costs and increased coverage, there is sufficient evidence that mode effects may sometimes be large relative to the precision. They may lead to incomparable statistics over time or over population subgroups and they may increase bias. Adaptive survey designs offer a flexible mathematical framework to obtain an optimal balance between survey quality and costs. In this paper we employ adaptive designs in order to minimize mode effects. We illustrate our optimization model by means of a case study on the Dutch Labour Force Survey.

During redesigns of repeated surveys, the old and the new approach are often conducted in parallel to quantify discontinuities which are initiated by the modifications in the survey process. Due to budget limitations, the sample size allocated to the alternative approach is often considerably smaller, compared to the regular survey that is also used for official publication purposes. In this paper, small area estimation techniques are considered to improve the accuracy of domain estimates obtained under the alternative approach. Besides auxiliary information available from registrations, direct domain estimates available from the regular survey are useful auxiliary variables to construct model-based small area estimators. These methods are applied to a redesign of the Dutch Crime Victimization Survey.

This PhD thesis shows how paradata (process data) and other auxiliary variables can be used to improve survey fieldwork. Examples are: response, and how to accomplish a more representative response by means of adaptive survey design; advance letters, and the different impact they can have on sample persons; and call scheduling, the study of the timing and spacing of visits. The thesis pays ample attention to the role of interviewers, understanding their behaviour, and the impact of interviewer behaviour on the quality of survey results.

This report describes in simple terms how a sound survey should be designed and carried out.

New developments in computer technology, but also new challenges in society like increasing nonresponse rates, decreasing budgets, or demands for reducing the response burden, may lead to changes in survey methodology for official statistics. In this paper, the use of a Web panel for high quality statistics about the general population is explored. Such a panel needs to be probability based. A Web panel can be used for either longitudinal or cross-sectional studies. Depending on the type of panel, a number of choices for the implementation need to be made. These involve decisions on survey topics and questionnaires, the recruitment strategy, maintenance of the panel and how to deal with the various types of nonresponse. In this paper, these methodological issues are discussed in more detail.

Data editing is arguably one of the most resource-intensive processes at NSIs. Forced by ever increasing budget pressure, NSIs keep searching for more efficient forms of data editing. Efficiency gains can be obtained by selective editing, that is, limiting the manual editing to influential errors, and by automating the editing process as much as possible. In this paper we present a decomposition of the overall editing process into a number of different tasks and give an up-to-date overview of the possibilities for automatic editing in terms of these tasks.

Modernization of official statistics leads not only to different production processes and different products, but also to a different approach to quality.

From 2006 to 2010 Statistics Netherlands conducted surveys on the size and structure of the hidden labour market. Five to ten percent of the respondents admitted that they did not report all of their income to the tax or social security authorities.

Mixed-mode surveys are known to be susceptible to mode-dependent selection and measurement effects, collectively referred to as mode effects. The use of different data collection modes within the same survey may reduce selectivity of the overall response but is characterized by measurement errors differing across modes. Inference in sample surveys generally proceeds by correcting for selectivity – for example by applying calibration estimators – and ignoring measurement error. When a survey is conducted repeatedly, such inferences are valid only if the measurement error remains constant between surveys. In sequential mixed-mode surveys it is likely that the mode composition of the overall response differs between subsequent editions of the survey, leading to variations in the total measurement error and invalidating classical inferences. An approach to inference in these circumstances, which is based on calibrating the mode composition of the respondents towards fixed levels, is proposed. Assumptions and risks are discussed and explored in a simulation and applied to the Dutch crime victimization survey.

The paper describes how Statistics Netherlands has developed a Knowledge and Innovation programme. The Knowledge part of the programme has three goals: (i) to preserve knowledge in view of the expected retirement wave, (ii) to develop and share knowledge in order to be prepared for the future, and (iii) to provide adequate tooling for knowledge sharing. Mobility of employees is an important vehicle for knowledge sharing. Besides job rotation there are other instruments that can be used to create more flexibility in the organisation and stimulate knowledge sharing. The paper presents the first experiences of Statistics Netherlands with instruments such as working in flexible, multidisciplinary teams and internal network communities in order to stimulate knowledge sharing and development and to make use of best practices.

It is the challenge of statistical offices to use their knowledge and innovation power to the optimal extent in order to remain able to respond pro-actively and creatively to these developments. This paper describes how Statistics Netherlands has developed an Innovation programme. A three-stage funnel approach plays a key role in this programme. This approach gives maximum room for bottom-up development of ideas, while at the same time focusing on maximum contribution to the goals of the organization. The Innovation Lab is an important instrument for the Innovation programme. It offers a suitable environment to support the generation of ideas and test their feasibility.

Over the past years, the use of administrative data in both official statistics and academic research has grown. This development has made the problem of assessing the quality of administrative sources for statistical use increasingly important. Two of the main aspects of this are validity and reliability of measurement. Although this problem is often mentioned in qualitative terms, so far, not much research has been done on methods that assess the validity or reliability of administrative variables in a quantitative way. The objective of this paper is to describe a quantitative method for estimating validity and to present results obtained with this method in a simulation study.

Survey researchers currently act on three beliefs when combining survey modes in mixed-mode designs. First, modes elicit distinct patterns of survey response behaviour and nonresponse bias. Second, these selection differences are caused by differences in the response process of differential coverage, contact and cooperation. Third, mode-dependent response patterns might be exploitable by sequential mixed-mode designs ideally yielding samples less biased by selection and hence more ‘representative’. These assumptions are assessed using a factorial design, in which the Dutch Crime Victimisation Survey was administered either in CAPI, CATI, mail, or web.

Using a comprehensive unbalanced panel of firm-level data constructed from four surveys, we test within a structural model what is known in the literature as the “weak” and the “strong” version of the Porter hypothesis (PH). Our “Green Innovation” model includes three types of eco investments and non-eco R&D to explain differences in the incidence of innovation. We aim to estimate the relative importance of energy price incentives as a market-based type of environmental regulation (ER) and the direct effect of environmental regulation on eco investment and firms’ decisions regarding the introduction of several types of innovations. We explicitly model the potential synergies of introducing the three types of innovations simultaneously and their synergy in affecting total factor productivity (TFP) performance. The results of our analysis show a strong corroboration of the weak version of the Porter hypothesis, but not of the strong version, in this case on TFP performance.

Soft edit rules, i.e. constraints which identify (combinations of) values that are suspicious but not necessarily incorrect, are an important element of many traditional, manual editing processes. It is desirable to use the information contained in these edit rules also in automatic editing. However, current algorithms for automatic editing are not well suited to use soft edits because they treat all edit rules as hard constraints: each edit failure is attributed to an error. Recently at Statistics Netherlands, a new automatic editing method has been developed that can distinguish between hard and soft edits. A prototype implementation of the new algorithm has been written in the R programming language. This paper reports some results of an application of this prototype to data from the Dutch Structural Business Statistics. The paper also introduces and tests several size measures of soft edit failures that can be used with the new automatic editing method.

Improvement of waterflows in the National Water Balance; Water Stocks; feasibility of Water Balances per River Basin. Final report on Grant Agreement No. 50303.2010.001-2010.564

A common problem faced by statistical institutes is that data may be missing from collected data sets. The typical way to overcome this problem is to impute the missing data. The problem of imputing missing data is complicated by the fact that statistical data collected by statistical institutes often have to satisfy certain edit rules, which for numerical data usually take the form of linear restrictions. Standard imputation methods for numerical data as described in the literature generally do not take such linear edit restrictions on the data into account. Hot-deck imputation techniques form a well-known and relatively simple to apply class of imputation methods. In this paper we extend this class of imputation methods so that linear edit restrictions are satisfied.

Signals of an increased risk of early school-leaving are already visible in the first year of secondary school. That is one of the conclusions of the PhD thesis with which T. Traag, researcher at Statistics Netherlands (CBS), obtained her doctorate at Maastricht University.

Recently, survey literature has put forward responsive and adaptive survey designs as means to make efficient trade-offs between survey quality and survey costs. These designs, however, restrict quality-cost assessments to nonresponse error, while in mixed-mode surveys the measurement or response error plays a dominant role. Furthermore, there is both theoretical and empirical evidence that the two types of error are correlated. In this paper, we investigate adaptive survey designs that minimize both errors simultaneously in the Labour Force Survey. The design features that are selected are self-reporting versus proxy reporting, and the number of contact attempts.

Data quality is important for statistical institutes, not only in relation to the data they produce, but also because they use more and more secondary data sources to produce statistics. Examples of secondary data sources, data produced by others, are administrative data, transactional data and data from the Internet.

An increasing number of people are active on various social media platforms, where they voluntarily share information, discuss topics of interest, and contact family and friends. Because the social media platform Twitter is used by a large number of people in the Netherlands and its public messages can be collected relatively easily, we investigated the content and usability of Twitter messages for statistics. This revealed that a considerable share of the messages collected, around 50%, could potentially be used to provide information on work, politics, spare time activities and events.

We discuss the relation between trust and statistical dissemination, drawing on examples from the Netherlands and comparing with other European countries. Dutch citizens have a fair amount of confidence in official statistics, even in the recent period of political and economic upheaval. The most important reason for this seems to be the political culture in the Netherlands, which puts a strong emphasis on rational policy making based on evaluations from scientific councils, committees and official research bureaus. We discuss how this came to be, how it influences the trust in the national statistical institute, and the consequences this has for the dissemination of statistical data.

Inference in official statistics is traditionally motivated from a design-based perspective, with the model-based approach being gradually adopted in specific circumstances. We take this shifting paradigm one step further, from model-based to algorithmic inference methods. Surveying a sample of the population of interest – typically enterprises or households – is fundamental to the design-based approach, where the design is the basis for inference. Model-based estimation methods may provide a viable alternative in situations where design information is not available. Estimation of the model parameters is pivotal, although in official statistics it is only an intermediate goal, as the model is ultimately used for prediction. Therefore, adopting a data-centred, algorithmic view rather than a model-centred view is possible. The algorithmic view encompasses methods generally attributed to the fields of data mining, machine learning, or statistical learning. Algorithmic methods may be useful in situations where data are not obtained through a sample survey, and where the typical models used in model-based estimation are not tenable.

Conflicting information may arise in statistical micro data due to partial imputation, where one part of the imputed record consists of the observed values of the original record and the other of the imputed values. Edit rules that involve variables from both parts of the record will often be violated. One strategy to remedy this problem is to make adjustments to the imputations such that all constraints are simultaneously satisfied and the adjustments are, in some sense, as small as possible. The minimal adjustments are obtained by minimizing a chosen distance metric subject to the constraints and we show how different choices of the distance metric result in different adjustments to the imputed data. As an extension we also consider an approach that does not aim to minimize the adjustments but to make the adjustments as uniform as possible between variables. Under this approach, even the values that are not explicitly involved in any constraints can be adjusted. The properties and interpretations of the proposed methods are illustrated using empirical business-economic data.
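
For the squared Euclidean distance and equality constraints the minimal adjustment has a well-known closed form, stated here for illustration: with partially imputed record \tilde{x} and constraints A x = b (A of full row rank),

    x^{*} = \tilde{x} + A^{\top} (A A^{\top})^{-1} (b - A \tilde{x}),

which is the smallest change, in Euclidean norm, that makes the record satisfy all equality edits; other distance metrics lead to different adjustments, as discussed in the paper.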

Polls increasingly ask for the opinion of 'the Dutch' on all kinds of topics. The question, however, is whether all these polls give an accurate picture of reality. There are good and bad surveys among them, and it is not easy to separate the wheat from the chaff. By going through the questions in this checklist one by one, you can get a first impression of the quality of a survey. Why the answer to each question matters is explained at the end of the checklist.

When monthly business surveys are not completely overlapping, there are two different estimators for the monthly growth rate of the turnover: (i) one that is based on the monthly estimated population totals and (ii) one that is purely based on enterprises observed on both occasions in the overlap of the corresponding surveys. The resulting estimates and variances might be quite different. This paper proposes an optimal composite estimator for the growth rate as well as the population totals.
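
A hedged sketch of the two estimators being combined (our notation): with \hat{Y}_t the estimated population total of turnover in month t and O the overlap of the two monthly samples,

    \hat{g}^{(1)}_t = \frac{\hat{Y}_t}{\hat{Y}_{t-1}}, \qquad \hat{g}^{(2)}_t = \frac{\sum_{k \in O} y_{k,t}}{\sum_{k \in O} y_{k,t-1}}, \qquad \hat{g}_t = \alpha\, \hat{g}^{(1)}_t + (1 - \alpha)\, \hat{g}^{(2)}_t,

where the weight \alpha can be chosen so as to minimise the variance of the composite estimator.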

The quality of statistical statements strongly depends on the quality of the underlying data. Since raw data is often inconsistent or incomplete, data editing may consume a substantial amount of the resources available for statistical analyses. Although R has many features for analyzing data, the functionality for data checking and error localization based on logical restrictions (edit rules, or edits) is currently limited. The editrules package is designed to offer a user-friendly toolbox for edit definition, edit manipulation, data checking, and error localization.

In our research we aim to gain insight into the geospatial activity of mobile phone users. Points of interest are the correlation between calling activity and economic activity, population density based on the number of active mobile phones in an area, and movement statistics. A derived research question is devising a method to obtain a tessellation of cell serving areas from a cell plan and combining different tessellations. For our research we obtained a dataset from a telecommunication company containing records of all call events on their network in the Netherlands over a period of two weeks. Each record contains information about the time and serving antenna and an identification key of the phone. The dataset is large (containing over 600 million records) and the cell plan has over 20,000 geo-locations of antennas. We devised a method to transform this cell plan, with use of the Voronoi algorithm, into an appropriate tessellation needed for geospatial analysis. Results of our research are a geospatial animation from which it is clearly visible that high call intensity coincides with high population density. Also, with the use of k-means clustering, we found useful patterns in the time series of the call activity, providing insight into economic activity over time and space. Using the unique phone identification we obtained information on the movement of Dutch inhabitants.

In 2011, Statistics Netherlands conducted a large-scale mixed-mode experiment linked to the Crime Victimization Survey. The experiment consisted of two waves; one wave with random assignment to one of the modes web, paper, telephone and face-to-face, and one follow-up wave to the full sample with interviewer modes only. The objective of the experiment is to estimate total mode effects and the corresponding mode effect components arising from undercoverage, nonresponse and measurement. The estimated mode effects are used to improve methodology for mixed-mode surveys. In this paper, we define mode-specific selection and measurement bias, and we introduce and discuss estimators for these bias terms based on the experimental design. Furthermore, we investigate whether mode effect estimators based on the first wave only, reproduce the estimates from the full experimental design. The proposed estimators are applied to a number of key survey variables from the Labour Force Survey and the Crime Victimization Survey.

The objective of this document is to describe a Standard for objects that are relevant for statistical processes and products. It is a Standard that can be applied to individual statistical processes.

Nonresponse in surveys may affect representativity and therefore lead to biased estimates. A first step in exploring a possible lack of representativity is to estimate response probabilities. This paper proposes using the coefficient of variation of the response probabilities as an indicator for the lack of representativity. The usual approach for estimating response probabilities is to fit a logit model. A drawback of this model is that it requires the values of the explanatory variables of the model to be known for all nonrespondents. This paper shows that this condition can be relaxed by computing response probabilities from weights that have been obtained from a weighting adjustment technique.
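
In formula form (a standard argument, not specific to this paper): with response probabilities \rho_i, mean \bar{\rho} and standard deviation S(\rho), the indicator is

    CV(\rho) = \frac{S(\rho)}{\bar{\rho}},

and since the nonresponse bias of the respondent mean of a variable y is approximately \operatorname{cov}(\rho, y)/\bar{\rho}, the Cauchy-Schwarz inequality bounds its absolute value by CV(\rho) \cdot S(y), which motivates the coefficient of variation as an indicator of the potential lack of representativity.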

In this paper a fairly general methodology for automatic coding is suggested. Characteristic of the approach described in this paper is that each code (say, for a business activity) is characterized by one or more combinations of code words, or C-words. These combinations of C-words can be seen as definitions of the various codes used. It is assumed that the order of the C-words is irrelevant for describing a code. Because it is unlikely that people will use exactly those C-words when describing a business activity, synonyms, hyponyms and hyperonyms for the C-words are needed as well. They form the bridge between the definitions of the codes in the classification used and the descriptions provided by respondents, and are called D-words. A semantic network is used to provide a bridge between the descriptions and the codes. Some inherent difficulties with automatic coding are presented, as well as possible solutions to overcome them, either by solving or by sidestepping them.

This report describes the validity testing of a multi-item scale of global life satisfaction, namely the Satisfaction With Life Scale (SWLS). This scale has been proposed as an alternative to single-item life satisfaction measures.

This paper concentrates on methods for handling incompleteness caused by differences in units, variables and periods of the observed data set compared to the target one. Especially in economic statistics different unit types are used in different data sets.

In this paper we concentrate on methodological developments to improve the accuracy of a data set after linking economic survey and register data to a population frame. Because in economic data different unit types are used, errors may occur in relations between data unit types and statistical unit types. A population frame contains all units and their relations for a specific period. There may also be errors in the linkage of data sets to the population frame. When variables are added to a statistical unit by linking it to a data source, the effect of an incorrect linkage or relation is that the additional variables are combined with the wrong statistical unit. In the present paper we formulate a strategy for detecting and correcting errors in the linkage and relations between units of integrated data. For a Dutch case study the detection and correction of potential errors is illustrated.

This paper by Daan Zult and Floris van Ruth describes a new type of analytical tool, developed to detect early signals of changes in the development of exports. It is novel in several ways: it focuses on exports using a demand-pull approach, is based on a structural analysis of the demand for Dutch exports, and takes the form of a disaggregated visual tool. Sixteen selected foreign demand sectors are monitored using leading signalling indicators. The aggregate leads the growth rate of Dutch exports. The disaggregated set-up results in increased early-warning capabilities, as changes in the development of individual industries are immediately visible.

In this paper by Floris van Ruth, a graphical tool is presented for analysing the labour market. The state of the labour market, defined as tightness, is characterised via the interaction of supply of and demand for labour, enabling a more comprehensive analysis. Supply is defined as the proportion of the labour force holding a job, whilst demand is represented by the average of several indicators of labour demand. This approach results in a new and more general characterization of the state of the labour market, presented in an easy to interpret visual manner.

Recently Statistics Netherlands has started research into the share of illegal activities in the national income. The total contribution of illegal activities to the national income of the Netherlands increased from 1800 million euro in 1995 to almost 3500 million euro in 2008, equalling 0.6 percent of gross national income. The main illegal sector is drugs, which accounted for over 50 percent of the total income from illegal activities in 2001. In 2008 that share was down to less than 40 percent, whereas the share of illegal employment rose from about 10 percent in 1995 to 33 percent in 2008.

This is a report on choosing a language and tool for the business and information architecture (BI-architecture) of Statistics Netherlands, as a first step to improve the current way of developing and maintaining the BI-architecture. In the first phase of the project the business architects indicated the need to replace the current language and tool. In the second phase of the project the business architects, together with the IT enterprise architects, advised that Archimate and BizzDesign Architect are the best language and tool combination to support the new way of working.

Since design-based methods like the generalized regression estimator have large design variances in the case of small sample sizes, model-based techniques can be considered as an alternative. In this paper a simulation study is carried out where small area estimation based on a linear mixed model is applied to the variable turnover of the Structural Business Survey of Statistics Netherlands. By applying the EBLUP, the accuracy of the estimates can be substantially improved compared to the generalized regression estimator. The EBLUP estimates, however, are biased, which is partly caused by the skewed distribution of the variable turnover. It is found that by transforming the target variable both skewness and bias can be substantially reduced, whereas the variance increases. As a result, the accuracy is slightly improved compared to the EBLUP.

Macro-integration is widely used for the reconciliation of macro figures, usually in the form of large multi-dimensional tabulations, obtained from different sources. Traditionally these techniques have been extensively applied in the area of macro-economics, especially in the compilation of the National Accounts. Methods for macro-integration have developed over the years and have become very versatile techniques for solving integration of data from different sources at a macro level. In this paper we propose applications of macro-integration techniques in other domains than the traditional macro-economic applications. In particular, we present two possible applications for macro-integration methods: reconciliation of tables of a virtual census and combining estimates of labour market variables.

In order to produce official statistics of sufficient quality, statistical institutes carry out an extensive process of checking and correcting the data that they collect. This process is called statistical data editing. In this article, we give a brief overview of current data editing methodology. In particular, we discuss the application of selective and automatic editing procedures to improve the efficiency and timeliness of the data editing process.

The objective of this paper is to present a new formulation of the error localisation problem that can distinguish between hard and soft edits.

Survey nonresponse occurs when members of a sample cannot or will not participate in the survey. It remains a problem despite the development of statistical methods that aim to reduce nonresponse. In this paper, we address the problem of resource allocation in survey designs in which the focus is on the quality of the survey results given that there will be nonresponse. We propose a novel method with which the optimal allocation of survey resources can be determined. We demonstrate the effectiveness of our method by extensive numerical experiments.

This paper presents and discusses some new results on the second-order inclusion probabilities of a systematic probability proportional to size sample drawn from a randomly ordered list, also called randomized PPS sampling. It is shown that some standard approximations of these second-order inclusion probabilities meant for relatively small sample sizes, need not be valid when the sample size n is of the same order as the population size N. In addition, it is shown that under a number of assumptions the variance formulas for rejective Poisson sampling can be applied to randomized PPS sampling designs when both n and N-n are large.

Various claims have been made regarding the benefits that Enterprise Architecture (EA) delivers for both individual systems development projects and the organization as a whole. This paper presents the statistical findings of a survey study (n=293) carried out to empirically test these claims. First, we investigated which techniques are used in practice to stimulate conformance to EA. Secondly, we studied which benefits are actually gained. Thirdly, we verified whether EA creators (e.g. enterprise architects) and EA users (e.g. project members) differ in their perceptions regarding EA. Finally, we investigated which of the applied techniques most effectively increase project conformance to and effectiveness of EA. A multivariate regression analysis demonstrates that three techniques have a major impact on conformance: carrying out compliance assessments, management propagation of EA and providing assistance to projects. Although project conformance plays a central role in reaping various benefits at both the organizational and the project level, it is shown that a number of important benefits have not yet been fully achieved.

This paper gives alternative derivations for the standard variance formulas in two-stage sampling. The derivations are based on a direct use of the statistical properties of the sampling errors in the second stage. For the ease of exposition we examine the specific case that simple random sampling is used in both stages. These derivations might be useful for readers looking for more elementary approaches to two-stage sampling.

Numerical and categorical data used for statistical analyses is often plagued with missing values and inconsistencies. In many cases, a number of missing values may be derived, based on the consistency rules imposed on the data and the observed values in a record. The methods used for such derivations are called deductive imputation. In this paper, we describe the newly developed deductive imputation functionality of R package deducorrect.
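A minimal sketch of deductive imputation with the package (hypothetical record and edit; the call follows the documented deduImpute interface): when a balance edit and the observed values pin down a missing component, it can be derived rather than estimated.

    library(editrules)
    library(deducorrect)

    # Balance edit: the total equals the sum of its components.
    E <- editmatrix("staff + material + other == total")

    # Record with one missing component, which follows deductively from the edit.
    dat <- data.frame(staff = 60, material = 25, other = NA, total = 100)

    d <- deduImpute(E, dat)
    d$corrected   # 'other' is imputed as 100 - 60 - 25 = 15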

Analyses of categorical data are often hindered by the occurrence of inconsistent or incomplete raw data. Although R has many features for analyzing categorical data, the functionality for error localization and error correction is currently limited. The editrules package is designed to offer a user-friendly toolbox for edit definition, manipulation, and error localization based on the generalized paradigm of Fellegi and Holt.

The Dutch Labor Force Survey (LFS) is based on a rotating panel design. Recently an estimation procedure that is based on a multivariate structural time series model has been adopted to produce monthly official statistics about the labor force. This approach handles problems with rotation group bias and small sample sizes in an effective way and enables Statistics Netherlands to produce timely and accurate estimates about the labor market. In this paper the time series model is extended by incorporating an auxiliary series about people registered as unemployed in the register of the Office for Employment and Income. The information of the auxiliary series is used to improve the precision of the monthly unemployment figures by modelling the correlation between the trends of the LFS series and the auxiliary series of the registered unemployed labor force. It appears that the trend of the series of the registered unemployed labor force is cointegrated or almost cointegrated with the trend of the estimated unemployed labor force of the LFS for several domains. This results in a considerable decrease of the standard errors for the monthly unemployed labor force.

This paper describes how the bootstrap resampling method may be used to assess the accuracy of estimates based on a combination of data from registers and sample surveys. We consider three different estimators that may be applied in this context. The validity of the proposed bootstrap method is tested in a simulation study with realistic data from the Dutch Educational Attainment File.

This paper is the first of two papers describing the editrules package. The current paper is concerned with the treatment of numerical data under linear constraints, while the accompanying paper (Van der Loo and De Jonge, 2011) is concerned with constrained categorical and mixed data. The editrules package is designed to offer a user-friendly interface for edit definition, manipulation and checking. The package offers functionality for error localization based on the paradigm of Fellegi and Holt and a flexible interface to binary programming based on the choice point paradigm. Lower-level functions include echelon transformation of linear systems, variable substitution and a fast Fourier-Motzkin elimination routine. We describe theory and implementation, and give examples of package usage.
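A small R sketch of the lower-level manipulation functions mentioned above (hypothetical edits; calls as documented in the package): substituting an observed value into an edit set, and eliminating a variable by Fourier-Motzkin elimination.

    library(editrules)

    E <- editmatrix(c(
      "x + y == z",
      "x >= 0",
      "y >= 0"
    ))

    substValue(E, "x", 10)   # variable substitution: fix x at an observed value
    eliminate(E, "y")        # Fourier-Motzkin elimination of y: restrictions
                             # implied on the remaining variables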

Mixed-mode surveys are susceptible to mode-dependent selection effects and measurement errors, collectively known as mode effects. In sequential mixed-mode surveys, where non-respondents in one mode are re-approached using a different mode, it is likely that the mode composition of the response differs between subpopulations or between subsequent editions of the survey. Such variations in the mode composition lead to variations in the measurement errors, invalidating classical inference. An approach to inference in these circumstances is proposed, by calibrating the mode composition of the response to fixed levels. Assumptions and risks associated with such a procedure are discussed. The case of the Dutch Crime Survey is discussed as an example.

Since raw (survey) data usually have to be edited before statistical analysis can take place, the availability of data cleaning algorithms is important to many statisticians. In this paper the implementation of three data correction methods in R is described. The methods of this package can be used to correct numerical data under linear restrictions for typing errors, rounding errors, sign errors and value interchanges. The algorithms, based on earlier work of Scholtus, are described and implementation details with coded examples are given. Although the algorithms were originally developed with financial balance accounts in mind, they are formulated generically and can be applied in a wider range of applications.
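A minimal sketch of how the three correction methods are invoked (hypothetical edit and records, assuming the deducorrect functions correctSigns, correctRounding and correctTypos).

    library(editrules)
    library(deducorrect)

    E <- editmatrix("turnover - costs == profit")

    # Hypothetical records: one with a sign error, one with a rounding error.
    dat <- data.frame(turnover = c(100, 100), costs = c(60, 60), profit = c(-40, 41))

    correctSigns(E, dat)      # repairs sign errors and value interchanges
    correctRounding(E, dat)   # repairs small rounding differences
    correctTypos(E, dat)      # repairs likely typing errors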

This paper shows that 19 relevant attributes of quality reports can be distinguished. These attributes are useful if we want to systematically manage the quality of quality reports. They were established through an analysis of documents about quality reporting and the minutes of the SQ-ESAC workshop on quality reporting. Each attribute is defined, but according to the Object-oriented Quality Management model more steps can be taken: requirements can be formulated for each attribute, and causes and effects of problems can be analysed. Based on these requirements and a risk analysis, measures can be taken to assure the quality of quality reports.

In most surveys all sample units receive the same treatment and the same design features apply to all selected people and households. In this paper, it is explained how survey designs may be tailored to optimize response rates and to reduce nonresponse selectivity. Such designs are called adaptive survey designs. The basic ingredients of such designs are introduced, discussed and illustrated with a number of examples, including a pilot study.

The objective of this document is to describe a Standard for objects that are relevant for statistical processes and products. It is a Standard that can be applied to individual statistical processes.

Inventories play a crucial role in explaining business cycle turning points. Inventories contributed 0.7 percentage point to a 4 percent contraction of economic activity in 2009. In light of this, demand for inventory data has been growing since the financial crisis hit in late 2008. This paper analyses Dutch wholesale and manufacturing inventories and relates them to the business cycle.

Inventories are a useful statistic for tracking and analysing short-term economic developments. This paper by Floris van Ruth and Marcel van Velzen describes how the index of inventories of finished goods in the manufacturing industry can be used in business cycle analysis. Inventories themselves lag business cycle developments, and are therefore of limited use. Using the turnover index of the manufacturing industry to compute a ratio of inventory to sales (ISR) produces a new and leading business cycle indicator. The ISR is shown to consistently lead Dutch business cycle developments by one to two quarters. It is therefore one of the few real, i.e. non-financial and non-sentiment, leading indicators. Inventories are shown to exhibit clear co-movement with sales and the business cycle. The countercyclical development of the ISR is therefore explained by the fact that turnover reacts more strongly to business cycle developments than inventories.

In this paper, we describe the controversy that arose between Statistics Netherlands and the Ministry of Economic Affairs in 2009 after the Dutch government announced a tax relief measure for businesses, which deteriorated the quality of tax data used by Statistics Netherlands for producing short-term statistics.

Final report on Grant agreement no. 50303 2008 003 2008 352. In this discussion paper, methods are presented to compile water abstraction and water use data at the level of river basins in the Netherlands for the years 2004-2008. In general, the methods build upon existing national data on water abstraction and drinking water use.

Organizational change (OC) is an important complementarity factor in the process of creating business value from information technology (IT) investments. This paper investigates complementarities between IT capital and OC initiatives of the firm. It analyzes the productivity impact of different clusters of IT and OC in the manufacturing and services sectors of the economy. Three dimensions of OC are studied: process, structure, and boundary changes. Two distinct econometric approaches are applied to a unique and detailed sample of 32,619 firm-level observations in the Netherlands for the period 1994-2006. The results reveal that the productivity effect of IT significantly increases when technology investments are accompanied by relevant organizational changes. The observed complementarity effects between IT and OC are stronger for services than for manufacturing firms. The effects become stronger if different types of change are combined with each other and form clusters.

It is not only important to produce good-quality statistics, but also that the users of these statistics believe they are of good quality. It is therefore necessary that successive provisional estimates of the national accounts show a similar picture of economic performance; in other words, the successive estimates have to be sufficiently reliable. This paper analyses to what extent this requirement is met, taking into account that it seriously conflicts with timeliness. A computer program was developed to enable a quick reliability check of a large number of economic variables derived from the national accounts.

Apart from the traditional sources used by National Statistical Institutes, like sample surveys and administrative sources, nowadays more and more electronic sources of information are available that potentially can be used for the production of statistics. In the paper four sources are studied: i) Product prices on the internet, ii) Mobile phone location data, iii) Twitter text messages, and iv) Global Positioning System (GPS) data and traffic loop information. For each data source an overview is given of the usability of the collected information, as well as the practical and methodological challenges that lay ahead.

In surveys, persons have a tendency to round their answers. For example, in the Labour Force Survey people are asked how long they have been unemployed, and there is a clear tendency to give answers rounded to whole or half years. Because of this rounding, statistics based on these data tend to be biased. In this paper we introduce a method in which the rounding mechanism is modelled together with the 'true' underlying distribution. These are then used to select observations that are likely to be rounded and to impute new values for them. The method is applied to the Labour Force Survey data. An investigation of robustness shows that the method is robust against misspecification of the model for the underlying distribution and against misspecification of the rounding mechanism.

Statistics Netherlands has started a process to review its statistical priorities. The demands of society change, but budget restrictions and the desire to reduce the administrative burden do not allow an increase of staff or surveys. Therefore, negative priorities are needed. A working group, chaired by the chief statistical officer and with members from the statistics divisions, was asked to assess proposals that were put forward by the statistics divisions. To objectify the comparison of the proposals, an assessment model was developed in cooperation with an external consultancy. This paper describes the approach taken in this process.

Monthly short-term business statistics at Statistics Netherlands can be based on survey data, VAT records or a combination of these two data sources. Both sources are incomplete when statistics need to be produced. The survey response rate increases gradually over time and is still far from 100% after a month of data collection. The VAT register also fills gradually over time because i) many enterprises report on a quarterly or annual basis, and ii) those that report on a monthly basis do so unevenly spread over time. In this paper we investigate and compare the representativity of survey and VAT response as a function of time. The objective is to determine whether VAT is as representative as survey data and can be used to produce accurate statistics. For this purpose we use so-called representativity indicators (R-indicators) and partial R-indicators. The results can be used in designing data collection for monthly statistics and in assessing the timing of processing survey and register data.

In the European Union tens of billions of euros are spent on regional policy every year. A major part of this amount is allocated on the basis of regional gross domestic product per capita. In this paper by Henk Nijmeijer an inventory is drawn up of recent work on the quality of regional accounts estimates. Special attention is paid to the instrument of process tables. The regional accounts should be compiled in close cooperation with the national accounts. The quality of the national accounts estimates could be improved by the findings of the regional accounts compilation process.

Methods are considered to calculate a set of consistent price index numbers from an inconsistent set of chained index numbers. The inconsistencies are due to the existence of cycles in the price index graph. The initial index numbers are calculated using an index formula of the user's choice; it is only required to satisfy a few simple consistency conditions, which do not include transitivity. One method (due to Hill) uses spanning trees to solve (or rather sidestep) the inconsistency problem. The second method adjusts the initial values in such a way that the new index numbers satisfy a transitivity criterion and are close to the original index numbers. The approach in the present paper is inspired by levelling in land surveying.

We investigate the relationship between nonresponse error and measurement error as a function of a number of survey design features. Both types of survey error are quantified using indicators. Nonresponse error is analysed in terms of maximal nonresponse bias. Measurement error is decomposed into measurement profile risk and response bias. A measurement profile is a certain response style or behaviour.

Within the ongoing redesign program of social surveys at Statistics Netherlands a small area estimation method for labour status has been developed. The model used is the basic unit-level model, which is a linear mixed model with random area effects, where the areas are municipalities. We discuss several issues concerning model choice, including the use of linear (mixed) models for binary variables, the use of posterior means instead of maximum likelihood estimates to prevent zero or too small estimates of between area variance and the use of covariates at both the unit and area level. Several model selection measures and graphical diagnostics have been applied to arrive at a set of covariates used in the model. We focus on the estimation of municipal unemployment fractions, but also discuss estimation of fractions employed and not belonging to the labour force. The municipal estimates are benchmarked such that they are consistent with regularly produced provincial estimates. The small area estimates thus obtained have smaller estimated mean squared errors than the current estimates based on the generalized regression estimator, and display a much more plausible development over time.

This paper by Marcel van Velzen and Leendert Hoven describes the method used in the compilation of the monthly volume index of inventories of finished goods for the Dutch manufacturing industry. The index was introduced at the end of 2009. In the paper, the plausibility of the outcomes is assessed. The paper also addresses the potential use of the index in the compilation of production indices.

This exploratory paper presents a new method to measure the output of government-provided secondary education in the Netherlands. Transferred knowledge and skills, and not pupil-hours, are used as the principal output measure of education. The approach developed involves a transformation of traditional unit cost weights.

This paper by Floris van Ruth describes two methods for computing a monthly statistic based on a lower-frequency, quarterly reference statistic and related monthly indicators. Both methods are based on a state space formulation and the Kalman filter. The first method is an interpolation method, which is used to produce the monthly indicator of fixed capital formation development. The alternative method is a cumulative approach. Both methods produce credible and reliable monthly statistics. The availability of relevant monthly indicators is crucial, in this case indicators of production and imports of capital goods.

Competition can be good or bad for innovation. In this paper a model is tested in which an increase of competition stimulates innovation when competition is low, but where innovation is discouraged when the level of competition goes beyond a certain threshold. We use industry- as well as firm-level data, and find evidence for such a threshold using two different competition measures.

This paper by Floris van Ruth describes the concept, workings and outcomes of the Statistics Netherlands export, consumption and fixed capital formation radars. These are tools for monitoring and showing how conditions develop for the growth of the target key macro-economic indicators. By showing in one diagram indicators representing the driving factors for the relevant macro-economic quantity, a general picture is given of how conditions for this central indicator are developing. The graphic and dynamic character of the radar concept allows for easy and quick analysis. This study presents the indicators selected for the export, consumption and fixed capital formation radars and their properties. From these radars a conditions indicator can be derived. The significance of the conditions for explaining the development of the central indicators is tested, as well as their predictive power.

This paper explains the OQM model developed by SN, and describes nine applications of the model. The applications vary from large-scale (TQM and process assurance) to small-scale. They demonstrate that the concept of quality areas is both powerful and flexible, and can be used in any domain.

A major problem that has to be faced by basically all institutes that collect statistical data on persons or enterprises is that data may be missing in the observed data sets. The most common solution to handle missing data is imputation. At national statistical institutes and other statistical institutes, the imputation problem is further complicated owing to the existence of constraints in the form of edit restrictions that have to be satisfied by the data. Examples of such edit restrictions are that someone who is less than 16 years old cannot be married in the Netherlands, and that someone whose marital status is unmarried cannot be the spouse of the head of household. Records that do not satisfy these edits are inconsistent, and are hence considered incorrect. Another additional problem for categorical data is that the frequencies of certain categories are sometimes known from other sources or have already been estimated. In this paper we develop imputation methods for categorical data that take these edits and known frequencies into account while imputing a record.

In this paper we show that scale effects, market structure, and regulation determine the poor productivity performance of the European business services industry. We apply parametric and nonparametric methods to estimate the productivity frontier and subsequently explain the distance of firms to the productivity frontier by market characteristics, entry and exit dynamics and national regulation. The frontier is assessed using a detailed industry panel data set for 13 EU countries. Our estimates suggest that most scale advantages are exhausted after reaching a size of 20 employees. This scale inefficiency is persistent over time and points to weak competitive selection. Market and regulation characteristics explain the persistence of X-inefficiency (sub-optimal productivity relative to the industry frontier). More entry and exit are favourable for productivity performance, while higher market concentration has a negative effect. Regulatory differences also appear to explain part of the business services' productivity performance. In particular, regulation-caused exit and labour reallocation costs have significant and large negative impacts on the process of competitive selection and hence on productivity performance.

At national statistical institutes experiments embedded in ongoing sample surveys are frequently conducted, for example to test the effect of modifications in the survey process on the main parameter estimates of the survey, to quantify the effect of alternative survey implementations on these estimates, or to obtain insight in the various sources of non-sampling errors. A design-based analysis procedure for factorial completely randomized designs and factorial randomized block designs embedded in probability samples is proposed in this paper. Design-based Wald statistics are developed to test whether estimated population parameters, like means, totals and ratios of two population totals, that are observed under the different treatment combinations of the experiment are significantly different. The methods are illustrated with a real life application of an experiment embedded in the Dutch Labor Force Survey.

Introductory text: this course document outlines the main features of the business architecture of Statistics Netherlands (CBS). The level of this introductory course is 'intermediate'. The emphasis is not so much on theoretical concepts from business architecture as on their application to CBS. The aim of the course is to give participants an idea of which aspects matter when 'producing a statistic', and how these aspects relate to one another. That is precisely what architecture is about: providing a summarising view of one or more artefacts in an environment, in particular concerning coherent choices regarding function, structure and style. The two main topics covered in the course are 1) the statistical product, i.e. what a statistic is, and 2) the statistical process, i.e. how we produce a statistic.

This note describes an analysis of the (non)response in the 2007 Aanvullend Voorzieningen Onderzoek, whose fieldwork was carried out by Statistics Netherlands (CBS). It provides an overview of the response of various groups in society, broken down by the extent to which these groups can be contacted and the extent to which people in these groups are willing to participate in the survey.

Statistical processes can be very complex, and it is not uncommon that they are designed and implemented as one big tangle of statistical activities. This paper is an initiative to structure and standardise the processing of statistical data. Concepts like ‘standard process step’ and ‘standard process’ are introduced and explained by means of both a non-statistical example (fixing a flat bike tyre) and a statistical example (matching two data files).

The paper describes the application of the Object Oriented Quality Management model to the object 'secondary data sources'. The results obtained are compared to those of the independently developed Quality Framework for Administrative Data Sources. An administrative data source is an example of a secondary data source. This exercise was performed to enable an evaluation of the strengths and weaknesses of the quality management model and the completeness of the quality framework.

Measurement of labour market flows depends on three major aspects of the job definition, namely (i) the size of the job; (ii) the length of the job, and (iii) whether, in accordance with national accounting rules, jobs are identified with labour contracts, or whether, in accordance with labour demand theory, unfilled vacancies are also counted as jobs. This paper looks at the sensitivity of measuring job and worker flows with respect to these alternatives for the job definition. Measurement of labour market dynamics appears especially sensitive to the dynamic dimension of the job definition, namely that of (minimum) job length.

This paper describes some of the methodological problems encountered with the change-over from the NACE Rev. 1.1 to the NACE Rev. 2 in business statistics. Different sampling and estimation strategies are proposed to produce reliable figures for the domains under both classifications simultaneously. Furthermore several methods are described that can be used to reconstruct time series for the domains under the NACE Rev. 2.

This report answers the question of what the quality of the OQM model is. The OQM model was developed by Statistics Netherlands (CBS) and has also been implemented within CBS. It is a model for managing quality. Until now, the quality of the model itself had not been assessed.

Two univariate outlier detection methods are introduced. In both methods, the distribution of the bulk of observed data is approximated by regression of the observed values on their estimated QQ plot positions using a model cumulative distribution function.
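A rough R sketch of the general idea, not the paper's exact method: regress the ordered observations on the theoretical quantiles of a model cumulative distribution function (here the standard normal), and flag observations that deviate strongly from the fitted line.

    set.seed(1)
    x <- c(rnorm(98), 8, 12)              # bulk of the data plus two outliers

    ord <- sort(x)
    q   <- qnorm(ppoints(length(x)))      # model CDF: standard normal quantiles

    core <- abs(q) < 1.5                  # approximate the bulk by its central part
    fit  <- lm(ord[core] ~ q[core])

    pred <- coef(fit)[1] + coef(fit)[2] * q
    ord[abs(ord - pred) > 3 * summary(fit)$sigma]   # flagged outliers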

This paper presents the methods used for compiling balance sheets for consumer durables in the Netherlands. The Perpetual Inventory Method is used to convert time series of consumption of consumer durables to wealth stocks. Consumer durables are a memorandum item in the non-financial balance sheets.

Statistical agencies have to ensure that respondents' private information cannot be revealed from the tables they release. A well-known protection method is cell suppression, where values that provide too much information are left out of the table to be published. In a first step, sensitive cell values are suppressed. This is called primary suppression. In a second step, other values are suppressed as well to ensure that primarily suppressed values cannot be recalculated from the values published in the table. This second step is called secondary cell suppression.

This paper describes some new developments in survey methodology that may help to solve problems of survey taking in official statistics. The R-indicator is described as an additional indicator for survey quality. Web surveys are considered as a cheaper means of data collection, either as a single-mode survey or as one of the modes in a mixed-mode survey. Also attention is paid to more flexible ways of conducting the fieldwork of a survey. The R-indicator could play a role in this.

Macro-integration is the process of combining data from several sources at an aggregate level. We review a Bayesian approach to macro-integration with special emphasis on the inclusion of inequality constraints. In particular, an approximate method of dealing with inequality constraints within the linear macro-integration framework is proposed. This method is based on a normal approximation to the truncated multivariate normal distribution. The framework is then applied to the integration of international trade statistics and transport statistics. By combining these data sources, transit flows can be derived as differences between specific transport and trade flows. Two methods of imposing the inequality restrictions that transit flows must be non-negative are compared. Moreover, the figures are improved by imposing the equality constraints that aggregates of incoming and outgoing transit flows must be equal.

In this paper, we consider the situation where data are collected from both registers and sample surveys. We show how the bootstrap resampling method can be applied in this situation, in order to obtain insight in the accuracy of statistics based on combined data. The method is applied to the Dutch Educational Attainment File.

In 2008, there was a transition in the survey that measures perceived and actual safety in the Netherlands. In this paper by Kraan, Van den Brakel, Buelens, and Huys, it is attempted to explain some significant discontinuities caused by this transition. The observed discontinuities can be explained with the introduction of new data collection modes, Web survey and Self Completion Paper Questionnaire.

This paper by Floris van Ruth describes how the current stance of the business cycle can be derived from a mixed set containing a limited number of selected indicators. Using the Netherlands and the United States as examples, it is shown that it is possible to extract the business cycle from a mix of leading, coincident and lagging indicators, as long as the set is not skewed towards one particular type of indicator. Different methods, both direct and indirect, of deriving the common business cycle are used to test the robustness of the results. The similarity between the common cycles resulting from the different methods is seen as a confirmation of the validity of the common cycle interpretation of the business cycle.

Benchmarking is the process of achieving mathematical consistency between low-frequency (e.g. annual) and high-frequency (e.g. quarterly) data. Statistics Netherlands is going to apply a new benchmarking method to the Dutch national accounts. The new benchmarking method is based on a multivariate Denton method, presented in Bikker and Buijtenhek (2006). In order to incorporate all economic relations into this model, we extended it with new methodological features, such as ratio constraints, soft constraints and inequality constraints. In this paper the new extended multivariate Denton method is presented.
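For orientation, a sketch of the univariate proportional Denton criterion that the multivariate method generalises (notation mine, not taken from the paper): given a preliminary quarterly series p_t and annual totals A_y, the benchmarked series x_t solves

    \min_{x_1,\dots,x_T} \sum_{t=2}^{T} \left( \frac{x_t}{p_t} - \frac{x_{t-1}}{p_{t-1}} \right)^2
    \quad \text{subject to} \quad \sum_{t \in \text{year } y} x_t = A_y \quad \text{for all years } y,

so that the benchmarked series follows the quarter-to-quarter movement of the preliminary series as closely as possible while adding up to the annual figures. The extensions mentioned above (ratio, soft and inequality constraints) enlarge the constraint set of this optimisation problem.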

Data collected for the production of structural business statistics consist of a large number of numerical variables, with many mathematical relations between them. These relations are specified in the form of consistency checks, called edit rules or edits. Edits are used to detect errors that occur in the data set.

In this paper the authors analyze the effect of offshoring on productivity (growth) while explicitly accounting for the presence of imperfect competition in the output market. The analysis suggests that offshoring has a positive impact on industrial TFP growth while it compresses competition. This latter result can be explained by the fact that, given output prices, lower production costs lead to higher price cost margins.

The present paper outlines the current modernisation programme of household surveys carried out by Statistics Netherlands. It focuses on its objectives in terms of data collection and standardisation of processes, efficiency targets and relevance and quality of outputs. The short and medium term goals and the European context are discussed and the experiences gathered so far are presented. This redesign project of household surveys is part of a large-scale modernisation programme of statistics production at Statistics Netherlands.

Statistics like the national accounts are compiled using various data sources. Most statistical institutes compile the underlying data using so called stovepipes. A stovepipe means that a statistic is produced as a “stand alone process”. Stovepipes, however, are inappropriate for producing integrated and coherent information and cause administrative burden. A solution is to integrate all stages of the production process as part of a chain of statistical products, using chain management. So far, chain management is hardly applied in statistical institutes. The present chapter describes the main components of chain management and how it is organized. It turns out that chain management is not easy to implement; it must be introduced step by step, and it is not something that once “organised” will continue to work ever after.

Statistics Netherlands is increasingly making use of administrative and other secondary data sources for the production of statistics. This approach makes Statistics Netherlands highly dependent on the quality of those sources. It is therefore of vital importance that a procedure is available to determine the quality of such data sources in a systematic, objective, and standardized way. For this purpose a quality framework and a checklist have been developed.

The response to a survey will never be 100%, nor will it be completely non-selective. This causes a bias in statistics based on survey research. In this PhD-thesis written by Fannie Cobben various methods for the treatment of nonresponse in sample surveys are described.

The analysis in this paper shows that certain groups are underrepresented in the LISS panel, especially single-person households, households with a high average age and households with first-generation immigrants. We also show that offering a free computer and internet access has a positive effect on representativity.

An important quality aspect of official statistics produced by national statistical institutes is comparability over time. To maintain uninterrupted time series, surveys conducted by national statistical institutes are often kept unchanged as long as possible. To improve the quality or efficiency of a survey process, however, it remains inevitable to adjust methods or redesign this process from time to time. Adjustments in the survey process generally affect survey characteristics such as response bias and therefore have a systematic effect on the parameter estimates of a sample survey. Therefore it is important that the effects of a survey redesign on the estimated series are explained and quantified. In this paper a structural time series model is applied to estimate discontinuities in series of the Dutch survey on social participation and environmental consciousness due to a redesign of the underlying survey process.

This paper gives an introduction to the OQM model, discusses its characteristics and advantages, and describes the applications of the model so far.

Statistics Netherlands has developed a factsheet containing several indicators measuring conjunctural change. This factsheet can be divided into three parts. The first part is based on 15 macro-economic indicators. The second part consists of the export condition monitor (ECM). The third part consists of 10 rapid indicators. Until now this research has mainly focused on collecting existing statistical data and putting it in a general context. While collecting the data, some methodological issues arose. These issues are discussed in the last part of this paper and are challenges for further research.

At Statistics Netherlands the development of productivity statistics is addressed as a key field of interest. The recent national accounts revision at Statistics Netherlands was taken as an opportunity to improve capital stock and depreciation statistics. The Perpetual Inventory Method, as now applied at Statistics Netherlands, provides in a consistent way statistics on depreciation, the net capital stock, the productive capital stock and capital services. Further, much attention has been given to estimating average service lives and discard patterns of different asset types based on direct capital stock observations in the manufacturing industry.

Raw data records often contain missing or inconsistent items. Estimating the value of those items is usually complicated by restrictions imposed on possible value combinations. In this paper a method is described with which missing values can be estimated for categorical data, while restrictions on value combinations are obeyed. It is also possible to draw random values. The method is based on performing random walks on the set of all possible data files of records that obey the restrictions.

The Dutch Labour Force Survey (LFS) is a rotating panel survey, of which the first wave is CAPI and all the other waves are CATI. LFS statistics based on the first CAPI wave, especially employment status, show small but systematic differences with statistics based on the CATI waves.

This paper describes the ‘ideal’ to-be situation of the statistical production process of Statistics Netherlands from a business perspective. Notions as ‘steady state’, ‘unit base’ and ‘knowledge rule’ are introduced and explained. This ‘ideal’ to-be situation should serve as an architectural framework for future redesigns. The paper also touches upon the latest experiences of SN to achieve this ‘ideal’ situation.

This paper presents the methods used for compiling physical and monetary balance sheets of oil and gas reserves in the Netherlands. Results and sensitivity analyses are shown for the period between 1990 and 2005. The results are used for measuring multi-factor productivity and are published in the Dutch environmental accounts. Furthermore, balance sheets of oil and gas reserves are part of complete balance sheets for non-financial assets that are currently being developed.

This paper is a collaboration of Michael Polder and George van Leeuwen (CBS) with Pierre Mohnen and Wladimir Raymond (UNU-MERIT and University Maastricht). It investigates the effect of ICT and R&D on innovation, and the effect of innovation on firms' productivity, distinguishing various innovation types (product, process and organisational).

Counting on Statistics, the modernization program of Statistics Netherlands, aims to improve quality, make more use of administrative data, increase efficiency, and reduce the number of ICT applications. It consists of 5 subprograms: business architecture, methodological framework, generic processes, modernization of the main economic statistics, and modernization of other statistics.

This document surveys the desired set of tools based on 1) acknowledged business functions in the production of a statistic, 2) the current set of tools and 3) general criteria that are considered relevant for the selection of tools.

The estimation of changes in voting behaviour can be pursued in two ways: by modelling aggregated election results, and by means of recall data recorded in survey questionnaires. In an application of these methods to the Dutch national elections of 2003 and 2006, we show that the voting transitions estimated by survey techniques and by model-based techniques complement each other, improve the validity of the results, and provide a basis for new research.

In this paper a multivariate structural time series model is described that accounts for the panel design of the Dutch Labour Force Survey and is applied to estimate monthly unemployment rates. Compared to the generalized regression estimator, this approach results in a substantial increase of the accuracy due to a reduction of the standard error and the explicit modelling of the bias between the subsequent waves.

This paper quantifies the effects of a changing euro-dollar exchange rate on Dutch imports and exports of goods during the period 1978-2007. We find that these effects did not change significantly over time. The paper also considers the effects of a changing exchange rate on re-exports and exports of Dutch products separately.

This report provides an overview of new secondary data sources that may contain data of interest to Statistics Netherlands (CBS). For several of these sources a statistical application is conceivable. A number of new applications are also discussed. The main purpose of this report is to offer an inspiring and refreshing view of the use of secondary sources by CBS.

This paper by Bert M. Balk considers the relation between (total factor) productivity measures for individual production units and those for aggregates. It avoids making all kinds of (neoclassical) structural and behavioural assumptions. In addition, dynamic ensembles of production units are treated, characterized by entry and exit.

There are still many puzzles to be solved concerning the relation between innovation and firm performance, in particular concerning the distinct roles of information and communication technology (ICT) and Research and Development (R&D) in creating new or improved products or production processes. This thesis provides evidence that both instances of innovation are important drivers of productivity and, thus, economic growth. This book consolidates the results of empirical work covering about 10 years of research aimed at understanding the importance for firm performance of innovation in a broad sense.

This report describes nineteen properties of statistical output. Each property, also called an aspect, is elaborated according to a fixed structure, starting with its definition. For each property, possible indicators and measures are identified and summarised in a checklist in an appendix. The report serves several purposes; seven have been identified, such as supporting agreements with users about the quality of statistical output. The report does not contain guidelines for CBS and is not binding, but it can serve as a basis for developing such guidelines.

This report describes the results of the project Quality of Statistics Relevant to Statistics Netherlands' Corporate Image (KIS), as reported to Eurostat. In the KIS project three tools were developed that can be used to manage the quality of statistical output: a checklist, a quick-scan questionnaire and a deep-scan questionnaire.

This paper investigates competition and its relation to productivity growth. The results indicate that firms need time to adjust to changes in competition, raising productivity afterwards. Results are consistent across micro and macro levels, various indicators and industries.

Standardization of financial reporting is desirable to reduce the administrative burden on companies. To achieve this, the eXtensible Business Reporting Language (XBRL) has been developed. Almost 60% of the administrative burden imposed on companies by Statistics Netherlands is caused by surveys about product trade between EU countries. Trade data cannot be filed in XBRL yet, because of the complexity of the Combined Nomenclature, the EU classification of traded products. Here, we describe a prototype of a dimensional XBRL taxonomy that enables reporting of trade concepts specified by product, using the recent XBRL Dimensions 1.0 specification.

This paper describes the business and information architecture of Statistics Netherlands as a process chain of activities. These activities are divided into four business areas, namely policy, design, management, and implementation. Each business area is discussed at length.

This paper describes the context architecture of Statistics Netherlands (SN). Among other things, the paper discusses the clients, consumers, in- and outputs, financers, and several reasons for change. Special attention is given to the Electronic Government.

A common problem faced by statistical offices is that data may be missing from collected data sets. The typical way to overcome this problem is to impute the missing data. The problem of imputing missing data is complicated by the fact that statistical data often have to satisfy certain edit rules and that values of variables sometimes have to sum up to known totals. Standard imputation methods for numerical data as described in the literature generally do not take such edit rules and totals into account. In the present paper we describe algorithms for imputation of missing numerical data that do take edit restrictions into account and that ensure that sums are calibrated to known totals.

Economic indicators such as fixed capital formation do not develop in isolation. The basic idea underlying the conditions monitors is that developments in macro-economic indicators are driven by a relatively low number of fundamental factors. By finding indicators representing developments in these underlying factors, the observed values of the target economic variable can be placed in context and analysed more thoroughly. The underlying factors identified for the private fixed capital formation were availability and cost of capital, capacity utilization and business conditions. The usefulness of the indicator set is greatly enhanced by presenting it in an appropriate graphic visualisation.

This paper gives a formula for the limiting error of the central limit theorem in the bivariate case. Insight into this type of error simplifies the proofs of central limit theorems in probability sampling from finite populations.

Timely information on the direction of labour market developments is very important. A central indicator is vacancies, which are most sensitive to business cycle developments and lead the other labour market indicators. Using indicators from the various business surveys, monthly composite sentiment indicators, termed the employers' sentiment, are constructed. These give a fast and reliable indication of the direction of vacancy development. This is because vacancy development is intimately connected to developments in business conditions, as reflected in business survey indicators.

This research programme focuses on methodology for the statistical process between input (data gathering and storage) and output (estimation, analysis and publication). This 'throughput' process consists of procedures for the detection of errors, correction of errors and imputation (filling in estimates) of missing values. These throughput procedures are also referred to as editing and imputation (E&I) procedures. The goal of our throughput research programme is to improve the E&I procedures both in cost-effectiveness and in quality.

This paper presents the methods used for compiling complete balance sheets for inventories. Furthermore, the underlying assumptions are explained and results of sensitivity analyses are shown. The balance sheets of inventories are part of complete balance sheets for non-financial assets that are currently being developed.

More frequent and timely information on the development of the employed and unemployed labour force can be useful. This can be provided by a composite sentiment indicator, the workers' sentiment, constructed from outcomes of the Consumer Survey. The high correlation found between consumer survey indicators and labour market developments can be explained by the great importance of job prospects for the average consumer. The constructed composite sentiment indicator is very successful in tracking developments in the direction of the employed and unemployed labour force.

Economic indicators such as exports do not develop in isolation. The basic idea underlying the conditions monitors is that developments in macro-economic indicators are driven by a relatively low number of fundamental factors. By finding indicators representing developments in these underlying factors, the observed values of the target economic variable can be placed in context and analysed more thoroughly. The underlying factors identified for Dutch exports were mainly the developments in the major export markets, Germany and the Eurozone and changes in competitiveness. The usefulness of the indicator set is greatly enhanced by presenting it in an appropriate graphic visualisation.

This survey discusses the basics of productivity measurement and shows that one can dispense with most, if not all, of the usual neoclassical assumptions on which models for productivity measurement are often based. The measurement model is applicable to individual establishments as well as aggregates such as industries or economies.

Economic indicators such as consumption do not develop in isolation. The basic idea underlying the conditions monitors is that developments in macro-economic indicators are driven by a relatively low number of fundamental factors. By finding indicators representing developments in these underlying factors, the observed values of the target economic variable can be placed in context and analysed more thoroughly. The underlying factors identified for household consumption were labour market developments, (expected) income development and developments in asset values. The usefulness of the indicator set is greatly enhanced by presenting it in an appropriate graphic visualisation.

This paper examines the efficiency of the Horvitz-Thompson estimator from a systematic probability proportional to size sample drawn from a randomly ordered list. Moreover, the efficiency is compared with that of an ordinary ratio estimator. The results are demonstrated by means of a simulation study with Dutch data on the Producer Price Index. The discussion on the efficiency includes a comparison with rejected Poisson sampling.
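
For reference (this is standard sampling theory rather than a result of the paper), the Horvitz-Thompson estimator of a population total weights each sampled value by the inverse of its inclusion probability. For a probability-proportional-to-size sample s of size n drawn with size measure x_i, and provided no unit’s size share exceeds 1/n, this can be sketched as

\[
\hat{Y}_{\mathrm{HT}} = \sum_{i \in s} \frac{y_i}{\pi_i}, \qquad \pi_i = n\,\frac{x_i}{\sum_{k \in U} x_k},
\]

while the ordinary ratio estimator used for comparison is

\[
\hat{Y}_{R} = \frac{\sum_{i \in s} y_i}{\sum_{i \in s} x_i}\,\sum_{k \in U} x_k .
\]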

This paper is about the history of survey sampling. It describes how sampling became an accepted scientific method. From the first ideas in 1895 it took some 50 years before the principles of probability sampling were widely accepted. The paper focuses on developments in official statistics in the Netherlands, but it also pays attention to the use of sampling techniques in market research.

In all European Union countries the classification of economic activities that is used in the business surveys is currently being rebased from NACE Rev. 1.1 to NACE Rev. 2. This paper describes some of the methodological problems encountered with the changeover to NACE Rev. 2. Different sampling and estimation strategies are proposed to produce reliable figures for the domains under both classifications simultaneously. Furthermore, several methods are described that can be used to reconstruct time series for the domains under NACE Rev. 2.

Existing proofs of central limit theorems in random sampling from finite populations are quite lengthy. The present paper shows how the proof of this kind of theorem in random sampling can be simplified by using the central limit theorem for independent random vectors.

This paper by Ralph Foorthuis and Sjaak Brinkkemper aims to identify best practices for carrying out business and systems analysis in projects that are required to comply with an Enterprise Architecture. The research methods used are Canonical Action Research and Focus Group interviews.

This article by Ralph Foorthuis, Sjaak Brinkkemper and Rik Bos presents a model for projects that have to adhere to an Enterprise Architecture (EA) in order for their results to be aligned with the broader organization. The model features project artifacts (i.e. deliverables such as Software Architecture Documents), their mutual relationships, their relationship with EA, and the processes in which they are created and tested on conformance.

The subject of this thesis is groups that are difficult to observe in survey research.

To maintain uninterrupted time series, surveys conducted by national statistical institutes are often kept unchanged as long as possible. To improve the quality or efficiency of a survey process, however, it remains inevitable to adjust methods or redesign this process from time to time. Such adjustments generally have a systematic effect on the parameter estimates of a sample survey. Therefore it is important that the effects of a survey redesign on the estimated series are explained and quantified. In this paper a structural time series model is applied to estimate discontinuities in series of the Dutch survey on social participation and environmental consciousness due to a redesign of the underlying survey process.
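
A minimal sketch of such a structural time series model, assuming a local level trend, a seasonal term and a single level-shift regressor (the model actually applied in the paper will be richer), is

\[
y_t = L_t + \gamma_t + \beta\,\delta_t + \varepsilon_t, \qquad L_{t+1} = L_t + \eta_t,
\]

where the intervention variable delta_t equals 0 before the redesign and 1 afterwards, so that the coefficient beta measures the discontinuity; L_t is the stochastic level, gamma_t the seasonal component, and epsilon_t and eta_t are mutually independent disturbances.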

This contribution discusses possible scenarios and methodologies for the national statistical agencies for backcasting the new classification scheme (NACE Rev. 2.0) in existing time series of business statistics. We provide a discussion of the basic principles of reconstructing time series in general, after which the application of these methods in the area of short term business statistics is handled and illustrated with an example. We conclude that it is possible to obtain reasonable approximations of historic time series using rather simple methodology, but the quality of the backcasted time series is hampered by heterogeneity of classes.

This paper gives an overview of statistical data editing. The paper first describes the traditional interactive approach to data editing. It then focuses on modern editing techniques, such as selective editing, automatic editing, and macro-editing. The paper aims to provide an introduction to these topics, and gives many references to the literature.

This report describes how quality can be managed within an organisation. According to the model, a selection is first made from the relevant focus areas. Within each focus area, the appropriate control measures must then be put in place. The model can be applied within any organisation and to any area of interest.

This paper analyses which determinants are relevant for catching up to the global and national technological frontier. We focus on innovation, human capital, technology transfers and competition as sources of productivity growth. Our approach integrates the literature on the two faces of R&D, convergence and firm-level heterogeneity in productivity.

This paper explores the broader range of intangible investment in the Netherlands. Both conceptual and measurement issues are discussed. Furthermore, intangibles are capitalized and their contribution to economic growth by industry is examined.

Due to methodological problems, the quality of the outcomes of web surveys may be seriously affected. This paper addresses one of these problems: self-selection of respondents. Self-selection leads to a lack of representativity and thus to biased estimates. It is shown that the bias of estimators in self-selection surveys can be much larger than in surveys based on traditional probability samples. It is explored whether some correction techniques (adjustment weighting and use of reference surveys) can improve the quality of the outcomes. It turns out that there is no guarantee for success.

This paper describes methods for automatically detecting and correcting two systematic errors that are found in data collected for structural business statistics. It also discusses a simple heuristic method for resolving rounding errors when the data have to satisfy many balance edit rules.

In 2006 the selection of respondents for the Time Use Survey (‘Tijdbestedingsonderzoek’ or ‘TBO’) consisted of three stages. In this paper we employ R-indicators to investigate the representativity of the TBO response. R-indicators are proposed by Schouten and Cobben (2007) as measures to evaluate the similarity between the response and population of interest. This research was sponsored by the Netherlands Institute for Social Research (SCP). The 2005 and 2006 datasets were provided by DANS.

The system of environmental and economic accounting (SEEA) provides a range of important accounting aggregates which can logically be defined within the SEEA’s accounting identities. One key recommendation made in this paper is that these main aggregates should be explicitly pointed out as potential indicators in Part I of the revised SEEA. The paper gives an overview of key aggregates from the physical flow accounts that are recommended for explicit presentation in the SEEA text of Part I.

The Ph.D. thesis by Ton de Waal examines two different areas: detection of erroneous data and protection of sensitive data. By assuming that as few errors as possible have been made while answering and processing the questionnaires, the detection of erroneous data can be formulated as a mathematical optimisation problem. In the thesis a number of methods are developed to solve this optimisation problem efficiently. Sensitive data of individual respondents or small groups of respondents have to be protected, for instance by not publishing these data. In the thesis several mathematical problems with respect to protecting sensitive data are explored. For a number of problems, such as calculating the information loss due to protecting sensitive data and minimising the information that is not published, solutions are presented.

This paper presents methods and results with regard to the estimation of service lives and discard patterns, based on directly observed capital stock data and on discard surveys in the manufacturing industry. The presented results are input for the calculation of consumption of fixed capital and net capital stock for the national accounts.

The system of national accounts shows a consistent quantitative overview of the economic process of a country. Consistency, however, is no guarantee of good quality. In this paper an overview is given of possible adjustments and extensions of the system. The paper refers to the national accounts practice in the Netherlands.

In this study, a different temporal disaggregation approach is tested for producing a monthly series from a quarterly statistic when monthly observations are lacking. Monthly realisations are obtained by interpolating the quarterly series using related monthly indicators and an autoregressive component. For optimal results, the model is cast in state space form. Good monthly indicators can thus be obtained easily, and real-time analysis shows that the method works in practice as well.

Econometric techniques can be a powerful aid for the production of statistics. A prime example is the use of so-called benchmarking or interpolation techniques for producing more detailed or frequent statistics when complete data are lacking. Here, a version is tested which aims to produce a monthly indicator of fixed capital formation from the quarterly national accounts series. In this version, three latent monthly indicators are defined for each quarter, and derived from available related monthly indicators. A credible monthly series can be constructed, but unfortunately real time performance is shown to be unsatisfactory.

In a previous paper (discussion paper 07002, Schouten and Cobben) we introduced so-called R-indicators. These indicators attempt to measure the representativeness of the response to a survey. In the present paper we apply these indicators to several examples at CBS. These examples include surveys with different interview modes, data collection strategies and amounts of pre-paid incentives. We compare the values of the R-indicators to the outcomes of thorough analyses performed on these data sets.

In this paper we study the selectivity in the recruitment of respondents for one of CentERdata’s Internet panels (the CentERpanel). This recruitment is based on a probability sample. It involves three stages: participation in a first telephone interview, willingness to be re-contacted, and final agreement to participate in the Internet panel. We distinguish selectivity with regard to age and income in all stages, and with regard to PC ownership in the latter two stages only.

Design-based and model-assisted estimation procedures are widely applied by most of the European national statistical institutes. There are, however, situations where model-based approaches can have additional value in the production of official statistics, e.g. to deal with small sample sizes, measurement errors and discontinuities due to survey redesigns. In this paper several cases are identified where design-based estimators do not result in sufficiently reliable estimates and model-based procedures are more appropriate for producing official releases.

This paper by Ralph Foorthuis and Sjaak Brinkkemper describes several architectures at the project level for projects that conform to an Enterprise Architecture. The architectures described include the Project Architecture, the Project Start Architecture and the Software Architecture. They are placed in the context of Enterprise Architecture and Domain Architecture.

In this paper a multivariate structural time series model is described that accounts for the panel design of the Dutch Labour Force Survey and is applied to estimate monthly unemployment rates.

For several of Statistics Netherlands' establishment surveys, respondents have the choice to respond on paper or electronically. This paper investigates the consequences for editing processes when two different data streams have to be handled. Special attention is paid to the question of how to quantitatively compare the quality of the data streams.

Globalization affects all aspects of economic and social life. In order to study the effects of an open economy on employment and welfare, combined microdata from business surveys, social surveys and administrative registers are required to make causal inferences.

This paper comprises a first attempt to provide a comprehensive measure of spending on intangible capital in the Netherlands. We replicate the approach pioneered by Corrado, Hulten and Sichel (2004, 2005 and 2006) for the U.S.

Labour productivity in business-services industry tends to lag behind the rest of the economy. This paper investigates whether labour productivity in European business services is affected by unexploited economies of scale. Moreover, it analyses whether the incidence of scale sub-optimality is related to characteristics of the market or to national regulation characteristics.

A common problem faced by statistical offices is that data may be missing from collected data sets. The typical way to overcome this problem is to impute the missing data. In this paper we describe two algorithms for imputation of missing numerical data that take into account edit restrictions.

This paper was presented at the QUEST (Questionnaire Evaluation Standards) Workshop, 21-23 October, ZUMA Mannheim, Germany. A questionnaire lab test is discussed concerning new questions on respondents’ education, occupation and company, as well as automatic coding of the answers.

This paper describes the evaluation and redesign of the Structural Business Survey questionnaire. We describe how and to what extent various evaluation methods contributed to our understanding of the main problems of the SBS questionnaires and how some of the evaluation results could be translated straightforwardly into solutions.

In this paper, design-based analysis of embedded experiments is generalized to experimental designs in which clusters of sampling units are randomized over the different treatments. Furthermore, test statistics are derived to test hypotheses about ratios of two sample estimates. The methods are illustrated with a simulation study.

From July to December 2005 a large scale follow up of nonrespondents in the Dutch Labour Force Survey (LFS) was conducted at Statistics Netherlands. In the study a sample of nonrespondents in the LFS was approached once more with strongly condensed CATI, web and paper questionnaires containing only the key questions of the LFS; the basic-question approach.

In this study a dataset containing business survey variables and actual turnover realizations at the firm level is used to study what influences firms’ production expectations, what causes these to change, and to test a number of hypotheses on expectation formation.

This study is based on a dataset containing business survey variables and actual turnover realizations at the firm level, which are used to study the link between production expectations and turnover realizations. The aim was to formally test the connection at the micro-level between expectations and realizations, and to test whether expectations are connected more to single month/short-term turnover developments, or more to medium-term developments.

Surveys are often kept unchanged as long as possible. When a change is proposed, it is important to minimise the impact so as to minimise the inconvenience for users. This paper sets out the steps in an orderly transition, provides practical guidance on how to minimise discontinuities, and reviews methods for dealing with discontinuities if they arise.

From July to December 2005 a large scale follow up of nonrespondents in the Dutch Labour Force Survey (LFS) was conducted at Statistics Netherlands. In the study a sample of non-respondents in the LFS was approached once more by a small number of selected interviewers. The sample consisted of LFS households that refused, were not processed or were not contacted in the LFS of the months July–October.

From July to October 2005 Statistics Netherlands conducted a pilot with basic questions for the Labour Force Survey (LFS). The basic questions consisted of all questions that are necessary to derive the employment status. A change of questionnaire and interview mode often results in measurement errors. This paper investigates the impact of mode effects in the LFS as a result of using the basic question approach.

Many survey organisations focus on the response rate as being the quality indicator for the impact of non-response bias. However, it is not true in general that higher response rates imply smaller non-response bias. We introduce a number of concepts and indicators to assess the similarity between the response and sample of a survey. Such quality indicators may serve as counterparts to survey response rates and are primarily directed at evaluating the non-response composition.
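
In their basic form, such indicators are based on the variation of estimated response propensities over the sample: the more the propensities vary, the less representative the response. The R-indicator proposed for this purpose can be written as

\[
R(\rho) = 1 - 2\,S(\hat{\rho}), \qquad
S(\hat{\rho}) = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\bigl(\hat{\rho}_i - \bar{\hat{\rho}}\bigr)^{2}},
\]

so that R = 1 corresponds to a fully representative response (all propensities equal) and lower values to stronger selectivity.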

Conducting a good survey is complicated, costly and time-consuming. Developments in information technology in the 1980s made it possible to use computers for data collection. With the rapid rise of the internet, a new type of data collection has emerged: Computer Assisted Web Interviewing (CAWI), in which the questionnaire is presented to respondents via the internet. Such surveys are referred to as web surveys for short.

At first sight, web surveys seem to be an interesting and attractive means of data collection. However, there is another side to this coin. Due to methodological problems, outcomes of web surveys may be severely biased. This paper describes some of the methodological problems, and explores the effect of various correction techniques.

Labour productivity of health services has received a lot of attention in the Netherlands in recent years. Labour productivity requires the measurement of output and labour input. This paper describes the output volume index developed at Statistics Netherlands for hospital health services and some of the problems and possibilities that are encountered in the measurement of output.

This report provides an overview of the results of the XBRL 2005 project that was conducted at Statistics Netherlands in cooperation with Atos Origin. The XBRL 2005 project studied the various ways that data checks of Statistics Netherlands could be applied to instances in the XBRL format. The Structural Business Statistics questionnaire for the wholesale trade was used as an example.

An inventory was made of methods used in longitudinal analysis by National Statistical Institutes, together with selected longitudinal data sets of NSIs. The methodology was mostly not an explicit part of these studies. The UK, the USA, Canada, and New Zealand have a long tradition of longitudinal surveys. Sweden and Denmark have longitudinal possibilities based on registers. Most studies concern persons rather than enterprises. Statistics Netherlands has five longitudinal surveys and several longitudinal registers.

This paper relates unemployment outcomes from the Dutch labour force survey to the number of individuals registered at Dutch employment offices. Both series evolve rather similarly, despite the substantial differences in underlying populations. The empirical relationship is quantified with a state space model. This model is subsequently used to generate forecasts for the unemployed labour force. By comparing the forecasted values with survey outcomes, outliers can be identified.

In this paper we present how continuous creation and destruction flows for employee jobs can be measured directly from the Social Statistical Database (SSD), and how these flows relate to the number of jobs and the average number of jobs in a year. Furthermore, we suggest some indicators for the study of continuous job flow dynamics between years and subpopulations.

Using a classification method developed in this paper, the quality of qualitative survey data of the manufacturing industry at micro-economic level is investigated. For single companies, recent opinions on recent production developments are compared to quantitative results of industrial turnover. The results show that 57.6% of the analyzed companies give useful qualitative answers for calculating meaningful balance statistics such as producers’ confidence. The level of agreement between quantitative and qualitative data for companies with seasonal patterns in turnover on average is 10.6%-points higher than for companies without seasonal patterns.

This paper investigates the noncoverage bias in CATI surveys. This bias is introduced by the restriction of using the telephone as a communication medium. Obviously, only the part of the population that has a listed, fixed-line telephone can be interviewed by means of CATI. Data from a CAPI survey, the Integrated Survey of Living Conditions, are used to assess the noncoverage bias when restricting the sample to individuals that own a listed, fixed-line telephone. Individuals with and without a listed, fixed-line number are compared. Two methods to adjust for the noncoverage bias are applied to the data from the Integrated Survey of Living Conditions: linear weighting and propensity score stratification. Finally, a strategy is explored to simultaneously adjust for nonresponse and noncoverage bias.

According to the System of National Accounts (SNA 1993) illegal activities should be registered in the national accounts, as the accounts should cover all economic activities, including those deliberately concealed from the authorities. Illegal production concerns activities which generate goods and services which themselves are forbidden by law, or activities that are illegal when performed by unauthorised persons. In this paper, illegal activities, as for example drugs related activities, in the Netherlands are described and an estimate of their contribution to the Dutch economy in 2001 is presented. The estimates in this report are not based on official statistics.

In this paper, the state space approach is used to perform temporal disaggregation. The investigation is performed in two steps. In the first step, the state space approach is tested on two study cases in order to investigate the accuracy of the temporal disaggregation. In the second step, the method is applied to the retail sales statistics in order to obtain estimates for the monthly turnover of the retail sector. The paper concludes that the combination of the state space approach and Denton’s adjustment method of monthly or quarterly series to annual totals provides a powerful tool to disaggregate annual data into higher frequency series.
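
As an indication of the benchmarking step, one common (proportional) variant of Denton’s method chooses high-frequency values x_t that preserve the movement of an indicator series z_t while respecting the low-frequency totals Y_q; the precise variant used in the paper may differ:

\[
\min_{x_1,\dots,x_T}\;\sum_{t=2}^{T}\left(\frac{x_t}{z_t} - \frac{x_{t-1}}{z_{t-1}}\right)^{2}
\quad\text{subject to}\quad \sum_{t \in q} x_t = Y_q \ \text{ for every benchmark period } q.
\]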

Now that the Internet is expanding so rapidly, it has become an attractive medium for collecting relatively large amounts of data in a relatively cheap way. Not surprisingly, national statistical institutes, research institutes, and commercial marketing research organisations are using, or are considering using, Internet surveys for collecting survey data. However, use of Internet surveys is not without drawbacks. Lack of attention to the methodological aspects may easily lead to survey designs that will produce invalid survey results, and there are ample examples of such surveys. This paper gives a description of some of the methodological problems of Internet surveys. Then it discusses the role web surveys can play in national statistical institutes. In the short run, Statistics Netherlands foresees interesting applications in mixed-mode surveys. Attention is paid to some problems concerning implementation of mixed-mode surveys. The Blaise system, developed by Statistics Netherlands, is described as a useful tool for mixed-mode surveys. Some early experiments and experiences with Internet surveys are mentioned. The final part discusses the prospects for single-mode Internet surveys in the not too distant future.

Nonresponse in household surveys can be a threat to the quality of statistics. Research shows that often the response to these surveys is selective with respect to demographic characteristics like age and household composition. For this reason estimators are usually adjusted to account for nonresponse. Nonresponse adjustment methods make use of covariates that are available for both respondents and non-respondents. A problem is the selection of covariates that relate both to the key survey questions and to the response behaviour. Therefore, the selection is often performed in two steps. We present a classification tree method that allows for the construction of weighting strata that simultaneously account for the relation between response behaviour, survey questions and covariates. We apply the classification trees to survey data of Statistics Netherlands.
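
A minimal, purely illustrative sketch of this idea in Python (not the authors' implementation; the covariates, the simulated response mechanism and the choice of scikit-learn are assumptions) grows a shallow tree on response behaviour and uses its leaves as weighting strata:

    # Illustrative sketch: classification tree on response behaviour,
    # with the tree leaves serving as nonresponse weighting strata.
    import numpy as np
    import pandas as pd
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(0)
    n = 5000
    sample = pd.DataFrame({
        "age": rng.integers(18, 90, n),        # hypothetical covariates
        "hh_size": rng.integers(1, 6, n),
        "urban": rng.integers(0, 2, n),
    })
    # Simulated response behaviour that depends on the covariates
    logit = -1.0 + 0.02 * sample["age"] - 0.3 * sample["urban"]
    sample["responded"] = rng.binomial(1, 1 / (1 + np.exp(-logit)))

    # Grow a shallow tree; its leaves define the weighting strata
    X = sample[["age", "hh_size", "urban"]]
    tree = DecisionTreeClassifier(max_depth=3, min_samples_leaf=200)
    tree.fit(X, sample["responded"])
    sample["stratum"] = tree.apply(X)

    # Nonresponse adjustment factor per stratum: sampled units / respondents
    counts = sample.groupby("stratum")["responded"].agg(["size", "sum"])
    sample["nr_weight"] = sample["stratum"].map(counts["size"] / counts["sum"])
    print(sample.loc[sample["responded"] == 1, "nr_weight"].describe())

In practice the strata would be formed from register covariates available for the whole sample, and the resulting adjustment factors would be combined with the design weights.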

The objective of the research described in this paper is to improve the quality of the statistics of the turnover growth of the manufacturing industry. The quality parameters studied are timeliness and accuracy. Without adaptations of the compilation process of the statistics, improvement of one parameter will usually lead to a deterioration of the other. Three approaches are explored. The first approach considers alternative imputation methods for non-response. Methods can be found which improve timeliness by one week without loss of accuracy. The second approach is selective response chasing of the largest enterprises, which also seems quite promising. The third approach combines early response with an expected value or "nowcast". It improves the accuracy of the estimation of the turnover growth during the first three weeks after the reference month, but not enough for a timelier publication.

The method of repeated weighting aims at obtaining numerical consistency among tables estimated from different surveys. However, in its common form, it does not take into account the existing edit rules. Consequently, the repeated weighting estimates will generally not be in agreement with existing edit rules. This report describes how to deal with linear categorical and numerical edit rules within the framework of repeated weighting estimation. A step-by-step plan is proposed for an estimation procedure that yields numerically consistent tables in agreement with edit rules.

This paper identifies a broad concept of unused labour force. This concept can be related to the transitional labour market theory. It is next reviewed in the light of economic studies using a similar construct. Pros and cons of our approach are discussed, and the literature is used to assess characteristics of the groups building this concept and to identify transitions from these groups to (more) employment. A first assessment of the size and some of the main characteristics of such groups is made using the Dutch Labour Force Survey of Statistics Netherlands. As far as transitions are concerned, international studies imply that flows of persons from these groups into work will almost unavoidably have to be identified using linked longitudinal surveys among individuals. These are currently under construction at Statistics Netherlands.

PRAM (Post Randomization Method) is a disclosure control method for microdata, introduced in 1997. However, it has not yet been applied extensively by statistical agencies. This is partly due to the fact that, even though some theoretical results exist, little practical knowledge is available on its effect on disclosure control as well as on the loss of information it induces. We will try to make up for this lack of knowledge, by supplying some empirical information on the behaviour of PRAM, with respect to disclosure control and loss of information.
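
The core of PRAM is a known transition matrix whose rows give, for each true category of a sensitive variable, the probabilities with which the published category is drawn. A small illustrative sketch in Python (the categories and probabilities are invented for the example, not taken from the paper):

    # Illustrative sketch of PRAM: perturb a categorical variable by drawing
    # the published category from the row of a known transition matrix P.
    import numpy as np

    rng = np.random.default_rng(42)
    categories = np.array(["A", "B", "C"])
    # Row i: probabilities of publishing each category when the true value is i;
    # diagonal dominance keeps most values unchanged.
    P = np.array([
        [0.90, 0.05, 0.05],
        [0.05, 0.90, 0.05],
        [0.05, 0.05, 0.90],
    ])

    true_values = rng.choice(3, size=10, p=[0.5, 0.3, 0.2])
    pram_values = np.array([rng.choice(3, p=P[v]) for v in true_values])

    print("original :", categories[true_values])
    print("perturbed:", categories[pram_values])

Because P is known, unbiased estimates of the original frequency distribution can in principle be recovered from the perturbed data, which is what makes the trade-off between disclosure control and information loss tractable.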

This paper first summarizes the discussion around the construction of a household satellite account and then continues to present the extension of the Dutch National Accounts with a time-use module. The module is developed to investigate the potential of data on paid and unpaid labour measured in time-units to serve the purpose of analyses, while keeping the link with the SNA framework and bypassing the issue of the valuation of unpaid labour.

Contrary to mineral exploration, computer software development and literary or artistic work, Research and Development is in the present SNA-1993 not considered as an activity leading to the creation of intangible assets. It is expected that this will change in the course of the coming SNA update. This paper discusses a number of conceptual and practical issues concerning the representation of R&D expenditure in the national accounts, including its capitalisation.

Hedonic methods are a promising tool when calculating price indexes for products experiencing rapid technological change. In this paper several hedonic time dummy price indexes are calculated for televisions, refrigerators, washing machines and personal computers, based on scanner data of the population of sales over the period 1999-2001. It appears that for televisions, refrigerators and washing machines a population-based matched-model index is a good approximation of the so-called generalised Törnqvist index, which is used as benchmark index among the hedonic time dummy indexes. The paper analyses which conditions have to be fulfilled for a matched-model index to be a good approximation. The paper also shows the dynamic structure in consumer sales of some of these durables.
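
For orientation, the hedonic time dummy method regresses log prices on product characteristics and period dummies and reads the price index off the estimated time dummies (the exact specification and the generalised Törnqvist benchmark in the paper involve further detail):

\[
\ln p_{it} = \alpha + \sum_{\tau=1}^{T} \delta_\tau D_{i\tau} + \sum_{k} \beta_k x_{ik} + \varepsilon_{it},
\qquad P_{0\tau} = \exp\!\bigl(\hat{\delta}_\tau\bigr),
\]

where D_{i\tau} indicates that observation i belongs to period tau and the x_{ik} are product characteristics such as screen size or capacity.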

Nonresponse is a recurring problem in household surveys in many countries. Response rates of Statistics Netherlands surveys often vary between 50% and 60%. Research shows that nonresponse is usually selective. Respondents and nonrespondents differ at various demographic characteristics. To avoid a substantial negative impact on the quality of survey results, often weighting adjustment techniques are carried out. Statistics Netherlands has a large amount of background information available for this purpose. This information originates from registers and other administrative sources. The paper describes research aimed at finding auxiliary variables that are most important for including in weighting models. Also a technique is proposed to select the best weighting model. Theory is applied to data from a major survey.

This paper investigates the role of economic determinants in the dissolution of recently formed marital and non-marital unions. The economic determinants studied are women’s and men’s personal income and socio-economic positions, and the contribution of women’s income in the total household income. The discrete-time event history analyses cover the dissolution of cohabiting and married unions, using longitudinal data from Statistics Netherlands’ Income Panel Study.

The aim of this study was to improve the timeliness of the monthly statistics on sales development in the Dutch manufacturing industry. The focus was on producing timely indicators in the presence of missing data. Currently, missing data are imputed using, besides already available data for the month in question, known data from only one previous month, whereas we propose to use whole time series for that purpose. It was concluded that timeliness can be improved from 37 days to 27 days after the end of the month, without sacrificing accuracy.

In this paper we propose a new approach to impute data under linear restrictions. If the data are normally distributed we will use simulations from the standard normal distribution. If the data are not normally distributed we study the use of the Dirichlet distribution.

Estimates for population statistics can be seriously biased in case response rates are low and the response to a survey is selective. Methods like poststratification or propensity score weighting are often employed in order to adjust for bias due to nonresponse.

This paper analyses the nonresponse of the Integrated Survey on Living Conditions, a large continuous survey of Statistics Netherlands. For this survey, more auxiliary variables were available than in regular situations. These variables could be obtained by linking registers and databases to survey data files. Moreover, also a number of fieldwork variables were included in the analysis. By analysing the enriched survey data file, more information could be obtained about a possible under- or over-representation of certain groups in the survey.

A simulation study is carried out to investigate the performance of repeated weighting estimators and corresponding variance estimators. The study concerns two three-way frequency tables for a population of persons to be estimated from a sample matched to a register. Repeated weighting estimates are general regression estimates adjusted to certain marginal totals in order to achieve numerical consistency, in this case with register counts. For the particular frequency tables studied, repeated weighting and corresponding variance estimation perform well. Even for moderately small samples, mean squared errors are comparable to those for the general regression estimator that serves as starting point for the repeated weighting estimator. For the second frequency table simulated, a simplified variance estimator for the repeated weighting estimator is computed, which performs nearly as well as the original repeated weighting variance estimator.

Statistics Netherlands participated in the EUREDIT project, a large international research and development project on statistical data editing and imputation that lasted from March 2000 till February 2003. The main goals of this project were the development and evaluation of new and currently used methods for data editing and imputation. In this paper we describe the general approach applied by Statistics Netherlands on the two business surveys used in the EUREDIT project. We also describe the development of our edit and imputation strategy and give results supporting the choices we have made. Finally, we provide results of our approach on the two evaluation data sets, and compare these results to the results of the other institutes participating in EUREDIT.

In this paper we present a new algorithm for solving the error localisation problem for a mix of continuous and categorical data. This algorithm is based on constructing a binary search tree.

This paper deals with non-response bias, discussing a few approaches in this field. It is demonstrated that nonresponse bias as to voter turnout is lower in a survey on living conditions than in a purely political survey. In addition, auxiliary information from registrations is used to investigate non-response and its bias among ethnic groups. Response rates among ethnic minority groups are rather low, but there is no evidence that response rates are lower in lower social class areas. Unsurprisingly, correcting for limited socio-economic deviations.

Repeated weighting provides a method to obtain sets of table estimates with numerically consistent margins from combinations of registers and surveys. It is based on repeated application of the regression estimator and generates a new set of weights for each table which is estimated. Repeated weighting is implemented in the prototype software package VRD. This report describes the results of five simulations in which various aspects of repeated weighting were tested. The differences in accuracy between the repeated weighting estimator and the standard regression estimator were found to be small. When correctly implemented, repeated weighting consistently yielded a smaller standard deviation. In certain cases, a very limited increase in bias compared to standard weighting was found. The VRD estimator for the variance was found to be reliable only for cells of sufficient size and with a low enough variance of the weights.

Experiments embedded in ongoing sample surveys are particularly appropriate to test effects of alternative survey methodologies on estimates of finite population parameters.

Statistics Netherlands is presently making a considerable effort in combining data from administrative sources with, mainly household, survey data. By making efficient use of register data, Statistics Netherlands intends to improve the accuracy of its statistical information, and, at the same time, to decrease the response burden on households. The resulting large micro dataset with combined data is called the 'Social Statistical Database' (SSD); estimates related to social statistics are obtained from this SSD. Preferably, these estimates should be numerically consistent, although they might be obtained from different sources. At Statistics Netherlands, a new estimation method has been developed which, under certain conditions, ensures numerically consistent table sets. This method is called 'repeated weighting', and is based on a repeated use of the regression estimator. In the present paper we describe this new estimation method.
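
As a sketch of the building block only (not of the full repeated weighting procedure), the regression estimator adjusts the Horvitz-Thompson estimate of a total using auxiliary variables whose population totals X are known or have been estimated before:

\[
\hat{Y}_{\mathrm{greg}} = \hat{Y}_{\mathrm{HT}} + \hat{B}'\bigl(X - \hat{X}_{\mathrm{HT}}\bigr) = \sum_{i \in s} w_i\, y_i,
\]

with calibrated weights w_i. In repeated weighting, each table is estimated with a fresh set of such weights, calibrated on margins taken from registers or from previously estimated tables, which is what enforces the numerical consistency across the table set.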

A simulation study is performed to compare linearization and balanced repeated replication (BRR) variance estimates in various situations.

Over the last few years several algorithms for solving the so-called error localisation problem have been developed at Statistics Netherlands. For six data sets involving numerical data we present computational results for four of those algorithms in this paper. The algorithms are based on a standard mixed integer programming formulation, on the generation of the vertices of a certain polyhedron by means of an adapted version of Chernikova’s algorithm, on a branch-and-bound algorithm using Fourier-Motzkin elimination, and on a cutting plane approach.
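
As a rough indication of what the first of these formulations looks like (a generic big-M sketch, not necessarily the exact model used in the paper), for numerical variables z_j with observed values x_j, reliability weights w_j and linear edits:

\[
\min_{y,\,z}\;\sum_{j} w_j\, y_j
\quad\text{s.t.}\quad
\sum_{j} a_{ij} z_j \ge b_i \ \text{ for every edit } i,
\qquad |z_j - x_j| \le M\, y_j, \qquad y_j \in \{0,1\},
\]

where M is a sufficiently large constant; the variables with y_j = 1 form the (minimal weighted) set flagged as erroneous.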

A remote access system is a facility where users can submit queries for statistical information from their own computer. These queries are handled by the statistical agency and the generated, possibly confidentialised, output is returned to the user. This way the agency keeps control over its own data while the user does not need to make frequent visits to the agency. For some years, the Luxembourg Income Study (LIS) and Luxembourg Employment Study (LES) have made use of an advanced remote access system. At Statistics Netherlands and at other statistical institutes the need for a similar system has recently been expressed. In this paper, we discuss the characteristics, limitations and desired properties of a remote access system. We illustrate the discussion by the system used at LIS/LES.

Statistics Netherlands is planning to set up a system of productivity statistics. An innovative approach is to directly build up productivity indices from data at the level of the individual firm. To study the feasibility of this approach, several exercises have been carried out, using micro-data on trade services, transport services and business services. It will be shown how sensitive productivity indices are with respect to the method. Some recommendations about the method will be given.

This paper is about the relation between productivity change at the micro level and the meso level. The paper considers the interplay between productivity change at the firm level and changes in the industrial structure which are caused by factors such as growth or decline and entry or exit of firms. The availability of firm-level data underlying officially published aggregate figures makes it possible to explore this area.