Consistent estimates for categorical data based on a mix of administrative data sources and surveys

© CBS
Dissertation on multiple imputation of latent classes to simultaneously estimate and correct for misclassification and missing data in combined datasets.
National statistical institutes such as Statistics Netherlands often use large datasets to estimate population tables on many different aspects of society. One way to create these rich datasets as efficiently and cost-effectively as possible is to use existing population registries containing administrative data. When more information is needed than is already available, these registries can be supplemented with survey data. A major problem, however, is that the scores of variables in both surveys and administrative data can be inconsistent and inaccurate for various reasons; that is, they contain misclassification.

To overcome misclassification in both kinds of sources, this dissertation develops a method that combines multiple imputation (MI) and latent class (LC) analysis, denoted MILC. The method estimates the amount of misclassification and simultaneously imputes a new variable that is corrected for it. Uncertainty due to misclassification is incorporated by creating multiple imputations rather than a single one. Edit rules can also be incorporated into the MILC method, which prevents impossible combinations of scores from occurring in the multiply imputed dataset.
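The combination of steps described above can be illustrated with a small Python sketch. It is a simplified illustration under stated assumptions, not the dissertation's implementation: the latent variable and three indicators (for example two registers and one survey) are binary and locally independent given the latent class; the edit rule "a minor cannot have X = 1" and every name (`fit_lc`, `posterior`, `milc`, `pool_proportion`, `simulate`) are hypothetical; and parameter uncertainty is propagated by refitting the latent class model on a bootstrap sample before each imputation, with the imputed estimates pooled using Rubin's rules.

```python
import random

EPS = 1e-6  # keeps estimated probabilities strictly inside (0, 1)


def clamp(v):
    return min(max(v, EPS), 1.0 - EPS)


def posterior(y, minor, pi, theta):
    """P(X = 1 | indicators y, minor). Hypothetical edit rule: minors cannot have X = 1."""
    if minor == 1:
        return 0.0
    l1, l0 = pi, 1.0 - pi
    for j, yj in enumerate(y):
        l1 *= theta[j][1] if yj else 1.0 - theta[j][1]
        l0 *= theta[j][0] if yj else 1.0 - theta[j][0]
    return l1 / (l1 + l0)


def fit_lc(data, max_iter=500, tol=1e-6):
    """EM for a two-class LC model; returns pi = P(X=1 | adult), theta[j][x] = P(Y_j=1 | X=x)."""
    k = len(data[0][0])
    pi = 0.5
    theta = [[0.2, 0.8] for _ in range(k)]  # start with class 1 as the "scores 1" class
    for _ in range(max_iter):
        # E-step: posterior probability of class 1 for every record (edit rule applied)
        post = [posterior(y, minor, pi, theta) for y, minor in data]
        # M-step: class share among adults, then response probabilities per class
        adult_post = [p for p, (_, minor) in zip(post, data) if minor == 0]
        new_pi = clamp(sum(adult_post) / len(adult_post))
        new_theta = []
        for j in range(k):
            row = []
            for x in (0, 1):
                w = [p if x == 1 else 1.0 - p for p in post]
                num = sum(wi * y[j] for wi, (y, _) in zip(w, data))
                row.append(clamp(num / max(sum(w), EPS)))
            new_theta.append(row)
        delta = max([abs(new_pi - pi)] +
                    [abs(a - b) for r1, r2 in zip(new_theta, theta) for a, b in zip(r1, r2)])
        pi, theta = new_pi, new_theta
        if delta < tol:
            break
    return pi, theta


def milc(data, m=5, seed=1):
    """Draw m imputations of the latent variable, each from a model fitted to a bootstrap sample."""
    rng = random.Random(seed)
    n = len(data)
    imputations = []
    for _ in range(m):
        boot = [data[rng.randrange(n)] for _ in range(n)]
        pi, theta = fit_lc(boot)
        # Impute for the original records by sampling from the posterior class probabilities
        imputations.append([1 if rng.random() < posterior(y, minor, pi, theta) else 0
                            for y, minor in data])
    return imputations


def pool_proportion(imputations):
    """Rubin's rules for a proportion (requires m >= 2): returns (estimate, total variance)."""
    m, n = len(imputations), len(imputations[0])
    ests = [sum(imp) / n for imp in imputations]
    qbar = sum(ests) / m
    within = sum(q * (1 - q) / n for q in ests) / m
    between = sum((q - qbar) ** 2 for q in ests) / (m - 1)
    return qbar, within + (1 + 1 / m) * between


def simulate(n, error=0.1, seed=0):
    """Hypothetical data: true status x and three sources that each misclassify with prob `error`."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        minor = 1 if rng.random() < 0.2 else 0
        x = 0 if minor else int(rng.random() < 0.3)
        y = tuple(x if rng.random() >= error else 1 - x for _ in range(3))
        data.append((y, minor))
    return data


if __name__ == "__main__":
    data = simulate(1000)
    naive = sum(y[2] for y, _ in data) / len(data)  # third source taken at face value
    est, var = pool_proportion(milc(data, m=5))
    print(f"naive: {naive:.3f}  MILC: {est:.3f} (SE {var ** 0.5:.3f})")
```

Because the edit rule is enforced inside `posterior`, every imputed dataset respects it even when a minor's observed indicators contain a misclassified 1. The between-imputation variance in `pool_proportion` is where uncertainty about the misclassification enters the final standard error.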

This thesis has shown that multiple imputation of latent classes is a flexible solution to simultaneously estimate and correct for misclassification and missing data in combined datasets.

Boeschoten, L. (2019). Consistent estimates for categorical data based on a mix of administrative data sources and surveys. Dissertation, Tilburg University.