Statistical inference based on randomly generated auxiliary variables

In most real-life studies, auxiliary variables are available and are used to explain and understand missing-data patterns and to evaluate and control for causal relations with the variables of interest. Their availability is usually taken as given, even when the variables were measured without the objectives of the study in mind. As a result, inference with missing data and causal inference rest on a number of assumptions that cannot easily be validated or checked. In this paper, a framework is constructed in which auxiliary variables are treated as a selection, possibly random, from the universe of variables on a population. This framework provides conditions under which statistical inference can be made beyond the traces of bias or effects found by the auxiliary variables themselves. The utility of the framework is demonstrated for the analysis and reduction of nonresponse in surveys. The framework may, however, be used more generally to understand the strength of associations between variables. Important roles are played by the diversity and diffusion of the population of interest, two features that are defined in the paper and whose estimation is discussed.