Compensating for selection error when combining estimates from non-survey data and a probability-based sample

Author

Ton De Waal

Conference

65th ISI World Statistics Congress

Format: IPS Abstract - WSC 2025

Keywords: bias, non-probability sample

Session: IPS 678 - Data Integration for Producing Official Statistics: Where are We Headed?

Tuesday 7 October 2 p.m. - 3:40 p.m. (Europe/Amsterdam)

Abstract

National statistical institutes traditionally use probability samples to produce estimates for population parameters of interest. In probability sampling, one draws units from the target population according to a sampling design for which the inclusion probability of each unit is known. This enables one to obtain unbiased estimates for population parameters. However, the major drawbacks of probability samples are that collecting the data is time-consuming and expensive, so such samples are often rather small and estimators based on them have a high sampling variance. Nowadays, a large variety of datasets that are not based on sampling designs, and for which the inclusion probabilities are unknown, provide a massive amount of data at low cost within a short time. Such datasets are referred to as nonprobability samples. Examples are administrative data, opt-in online surveys and big data. Their low cost and fast availability make nonprobability samples very attractive for statistical purposes. However, they are often selective, and because their “sampling design” is unknown, estimators based on such samples are usually biased. To use nonprobability samples for producing estimates for population parameters, one generally has to correct for selectivity. Several kinds of approaches exist for this, such as combining estimates for a target variable based on a probability sample with estimates from a nonprobability sample, and pseudo-weight approaches, where one constructs weights for the nonprobability sample that are subsequently used to produce estimates. In this talk we will discuss several methods to correct for selection bias by combining a nonprobability sample with a probability sample. We will also focus on how iterative proportional fitting can play a beneficial role in some situations.
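As an illustration of the pseudo-weight idea and of iterative proportional fitting, the following is a minimal sketch and not the method presented in the talk. It assumes that population totals for two categorical auxiliary variables are available, for example as estimates from a probability sample, and rakes the weights of a nonprobability sample until its weighted margins match those totals. The function name `rake_weights`, the variables `sex` and `age`, and all numbers are hypothetical and chosen only for illustration.

```python
import numpy as np

def rake_weights(categories, margins, max_iter=1000, tol=1e-10):
    """Iterative proportional fitting (raking) of unit weights.

    categories: list of 1-D integer arrays, one per auxiliary variable,
                holding each unit's category code (0, 1, ...).
    margins:    list of 1-D arrays with the target total per category.
    Assumes every category occurs at least once in the sample and that
    all margins sum to the same population size.
    """
    w = np.ones(len(categories[0]))
    for _ in range(max_iter):
        max_change = 0.0
        for codes, target in zip(categories, margins):
            # Current weighted total per category of this variable.
            current = np.bincount(codes, weights=w, minlength=len(target))
            factors = np.asarray(target, dtype=float) / current
            # Scale each unit's weight by the factor of its category.
            w = w * factors[codes]
            max_change = max(max_change, np.max(np.abs(factors - 1.0)))
        if max_change < tol:  # all margins already match
            break
    return w

# Hypothetical nonprobability sample of 8 units (illustrative values only).
sex = np.array([0, 0, 0, 1, 1, 0, 0, 1])
age = np.array([0, 1, 0, 1, 1, 0, 1, 0])
y = np.array([3.0, 5.0, 2.0, 7.0, 6.0, 4.0, 5.0, 8.0])

# Hypothetical population totals, e.g. estimated from a probability sample.
sex_totals = np.array([500.0, 500.0])
age_totals = np.array([400.0, 600.0])

w = rake_weights([sex, age], [sex_totals, age_totals])
weighted_mean = np.sum(w * y) / np.sum(w)
print("raked weights:", np.round(w, 2))
print("pseudo-weighted mean of y:", round(weighted_mean, 3))
```

Raking only removes selection bias to the extent that the auxiliary variables explain both the selection mechanism and the target variable, so the resulting estimate should be read under that assumption.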