Matching

Matching datasets is a routine operation in the statistical process. The simplest form of matching is linking two datasets (or tables) on the basis of a database key: records from the two datasets (tables) match if their database keys are exactly identical. For more complicated types of matching, other variables are used, so-called secondary keys, such as names and time variables. The problem with this more extensive type of matching is that the values of secondary keys may contain errors, or that the variables do not have exactly the same definition in both datasets. This report gives a systematic overview of various matching problems, characterised by the complicating factors that must be taken into account, and of the methods available to deal with them.
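The difference between the two kinds of matching can be illustrated with a minimal Python sketch. It is not a method taken from this report: the toy records, the field names (`id`, `name`, `birth_year`), the helper functions, the use of a string-similarity ratio from the standard library and the threshold of 0.8 are all assumptions chosen for illustration only.

```python
from difflib import SequenceMatcher

# Hypothetical toy records; field names and values are illustrative only.
register_a = [
    {"id": 1, "name": "Jansen", "birth_year": 1970},
    {"id": 2, "name": "de Vries", "birth_year": 1985},
    {"id": 3, "name": "Bakker", "birth_year": 1992},
]
register_b = [
    {"id": 1, "name": "Jansen", "birth_year": 1970},
    {"id": None, "name": "De Vreis", "birth_year": 1985},  # key missing, name misspelled
    {"id": None, "name": "Visser", "birth_year": 1992},    # a different person
]


def exact_match(left, right, key):
    """Link records whose database key is exactly identical."""
    index = {rec[key]: rec for rec in right if rec[key] is not None}
    return [(rec, index[rec[key]]) for rec in left if rec[key] in index]


def similarity(a, b):
    """Case-insensitive string similarity between 0 and 1."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def secondary_match(left, right, threshold=0.8):
    """Link records on secondary keys: same birth year, similar name.

    The threshold is an assumed value, not a recommendation.
    """
    pairs = []
    for rec in left:
        # Only compare records that agree on the time variable.
        candidates = [r for r in right if r["birth_year"] == rec["birth_year"]]
        if not candidates:
            continue
        best = max(candidates, key=lambda r: similarity(rec["name"], r["name"]))
        if similarity(rec["name"], best["name"]) >= threshold:
            pairs.append((rec, best))
    return pairs


exact_pairs = exact_match(register_a, register_b, "id")
secondary_pairs = secondary_match(register_a, register_b)
```

In this toy example the exact match on `id` links only the record whose key is present in both datasets, whereas the match on secondary keys also links the misspelled name while leaving the genuinely different person unlinked. In practice the choice of comparison function and threshold depends on the kinds of errors that occur in the secondary keys.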