Combining probability and nonprobability samples to estimate linear model coefficients
Conference
Regional Statistics Conference 2026
Format: CPS Abstract - Malta 2026
Keywords: Bayesian approach, calibration, composite estimator, nonprobability sample, pseudo-weight
Session: CPS 31 Data Integration
Thursday 4 June 11 a.m. - noon (Europe/Malta)
Abstract
Authors: Char Hilgers and Sabine Zinn
Nonprobability sampling is increasingly preferred for surveys because it costs far less and is easier to implement than large probability samples. However, probability samples remain important because they generally yield more accurate estimates than nonprobability samples and provide benchmark information for nonprobability surveys. In recent years, response rates to probability surveys have been declining, making the large sample sizes needed for robust estimation increasingly costly to obtain. A growing body of research therefore examines how to integrate probability and nonprobability samples to draw on the strengths of both.
Several methods for combining probability and nonprobability samples have been developed. Blended calibration combines an unweighted nonprobability sample with a probability sample weighted by population values, then calibrates the combined sample on variables derived from the probability sample only.[1,2] A composite estimator, formed as a linear combination of probability and nonprobability sample estimates with a bias function, was found to reduce mean squared error compared to the probability sample alone.[3] Pseudo-weighting calculates a pseudo-inclusion probability for nonprobability sample participants, using variables collected in both samples, then analyses the two samples jointly.[4] Lastly, using a larger nonprobability sample as an informative prior when estimating linear model coefficients from a smaller probability sample reduced variances and mean squared errors.[5]
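To make the informative-prior idea concrete, the following is a minimal sketch and not the specification used in [5]. It assumes Gaussian errors, a normal prior centred on the nonprobability-sample OLS fit, and a plug-in residual variance from the probability sample. The function names, the `inflate` parameter, and the simulated data are all hypothetical.

```python
import numpy as np

def ols(X, y):
    """Return OLS coefficients, their covariance, and the residual variance."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    return beta, sigma2 * XtX_inv, sigma2

def posterior_with_np_prior(X_p, y_p, X_np, y_np, inflate=1.0):
    """Update a normal prior from the nonprobability fit with the probability sample.

    Assumption: the error variance is treated as known and replaced by the
    probability-sample estimate. `inflate` > 1 widens (weakens) the prior.
    """
    prior_mean, prior_cov, _ = ols(X_np, y_np)
    prior_prec = np.linalg.inv(inflate * prior_cov)
    _, _, sigma2 = ols(X_p, y_p)
    data_prec = X_p.T @ X_p / sigma2
    # Standard conjugate normal update: precisions add, means are precision-weighted.
    post_cov = np.linalg.inv(prior_prec + data_prec)
    post_mean = post_cov @ (prior_prec @ prior_mean + X_p.T @ y_p / sigma2)
    return post_mean, post_cov

# Simulated illustration: large nonprobability sample, small probability sample.
rng = np.random.default_rng(0)
true_beta = np.array([1.0, 2.0])
X_np = np.column_stack([np.ones(2000), rng.normal(size=2000)])
y_np = X_np @ true_beta + rng.normal(size=2000)
X_p = np.column_stack([np.ones(100), rng.normal(size=100)])
y_p = X_p @ true_beta + rng.normal(size=100)

post_mean, post_cov = posterior_with_np_prior(X_p, y_p, X_np, y_np)
print("posterior mean:", post_mean)
print("posterior sd:  ", np.sqrt(np.diag(post_cov)))
```

In this sketch the posterior variances are never larger than those of the probability-sample OLS fit, because the prior precision is added to the data precision. The simulated nonprobability sample here is unbiased by construction; with a biased nonprobability sample the prior would pull the estimates toward the biased values, and `inflate` is one simple way to limit that influence.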
We present a comprehensive comparison and assessment of the above methods for combining probability and nonprobability samples, evaluating how well each estimates linear model coefficients and reduces their variance relative to using the probability sample alone.
Specifically, we compare the informative prior, blended calibration, composite estimator, and pseudo-weighting approaches in an application to real probability and nonprobability samples measuring belief in conspiracy myths in Germany. For each method, we compare maximum likelihood estimates of regression coefficients and their 95% confidence intervals, with outcome variables on conspiracy beliefs and covariates covering demographic, education, and employment information collected in both samples.
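As an illustration of the quantities being compared, the sketch below computes coefficient estimates and normal-approximation 95% confidence intervals for a single sample. It is a simplified, unweighted example: under Gaussian errors the OLS coefficients coincide with the maximum likelihood estimates, and survey weights and design effects are ignored. The function name `ols_with_ci` and the toy data are hypothetical.

```python
import numpy as np

def ols_with_ci(x, y, z=1.96):
    """OLS coefficients (intercept first) with normal-approximation 95% intervals."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), x])  # prepend an intercept column
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    # Unbiased residual variance (divides by n - p rather than n).
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, beta - z * se, beta + z * se

# Toy data: one covariate, four observations.
beta, lower, upper = ols_with_ci(np.array([0.0, 1.0, 2.0, 3.0]),
                                 np.array([1.1, 2.9, 5.2, 6.8]))
for name, b, lo, hi in zip(["intercept", "slope"], beta, lower, upper):
    print(f"{name}: {b:.3f} [{lo:.3f}, {hi:.3f}]")
```

Comparing interval widths from such a baseline fit against those from a combined-sample method is one simple way to see whether integration reduces variance.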
1. DiSogra, C., Cobb, C., Chan, E. & Dennis, J. M. Calibrating Non-Probability Internet Samples with Probability Samples Using Early Adopter Characteristics. In: Section on Survey Research Methods (2011).
2. Fahimi, M., Barlas, F. M., Thomas, R. K. & Buttermore, N. Scientific Surveys Based on Incomplete Sampling Frames and High Rates of Nonresponse. Surv Pract 8, 1–11 (2015).
3. Elliott, M. & Haviland, A. Use of a web-based convenience sample to supplement a probability sample. Survey Methodology 33, 211–215 (2007).
4. Elliott, M. R. Combining Data from Probability and Non-Probability Samples Using Pseudo-Weights. Surv Pract 2, 1–7 (2009).
5. Wisniowski, A., Sakshaug, J. W., Perez Ruiz, D. A. & Blom, A. G. Integrating Probability and Nonprobability Samples for Survey Inference. Journal of Survey Statistics and Methodology 8, 120–147 (2020).