Penalized spline model-based composite estimator of proportion in a finite population using probability and non-probability sample
Conference
65th ISI World Statistics Congress
Format: CPS Paper - WSC 2025
Keywords: Bayesian approach, complex sampling design, probit regression, propensity scores
Session: CPS 27 - Nonresponse Bias, Nonprobability Sampling, and Estimation Strategies in Survey Methodology
Tuesday 7 October 4 p.m. - 5 p.m. (Europe/Amsterdam)
Abstract
Sample surveys are carried out in official statistics, public opinion and market research, sociology, and many other fields of science. Traditionally used probability samples suffer from nonresponse: response rates decline as the number of surveys grows, as people become less willing to take part, and as the population becomes more mobile. Survey nonresponse reduces the accuracy of the estimates, so a need arises to find additional data sources that may improve the accuracy of estimates of finite population parameters. Many nonprobability data sets are available nowadays. Examples include information from social networks, such as blogs, comments, personal documents, and videos; administrative data arising in traditional business systems; and machine-generated data, such as data from fixed and mobile sensors and computer systems. Estimates obtained from such data sets are biased because the data collection is uncontrolled. Naturally, a desire arises to use nonprobability samples to improve the accuracy of estimators obtained from probability samples. Much research has been done in this area during the last decade. We contribute to the field by proposing a version of a composite estimator for a proportion in a finite population. It is a weighted combination of the estimators obtained from the nonprobability and probability samples. Pseudo-weights for the nonprobability sample are found by estimating the propensity scores. A penalized spline-based estimator of a probit regression model (Ruppert et al., 2003) is applied to estimate the proportion in the nonprobability sample, using a Bayesian approach that takes into account the way the sample arises (Chen et al., 2010). An unbiased estimator is used in the probability sample. The composite estimator is constructed and its accuracy is demonstrated by simulation.
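As a rough illustration of the weighted combination described in the abstract, the following minimal Python sketch combines a design-weighted proportion from a probability sample with a pseudo-weighted proportion from a nonprobability sample. Several things here are assumptions and not taken from the paper: the propensity scores are treated as already estimated, a plain pseudo-weighted ratio estimator stands in for the Bayesian penalized spline probit model, the ratio (Hajek) form is used in both samples for simplicity, and the mixing weight `lam`, the function names, and the toy data are all hypothetical.

```python
import numpy as np


def weighted_proportion(y, weights):
    """Ratio (Hajek-type) estimator of a proportion: sum(w * y) / sum(w)."""
    y = np.asarray(y, dtype=float)
    w = np.asarray(weights, dtype=float)
    return float(np.sum(w * y) / np.sum(w))


def composite_proportion(p_prob, p_nonprob, lam):
    """Convex combination lam * p_prob + (1 - lam) * p_nonprob."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * p_prob + (1.0 - lam) * p_nonprob


# Toy probability sample: binary outcomes with known design weights (hypothetical).
y_prob = np.array([1, 0, 1, 1, 0])
design_w = np.array([10.0, 10.0, 20.0, 20.0, 40.0])

# Toy nonprobability sample: binary outcomes with propensity scores that are
# assumed to have been estimated beforehand (hypothetical values).
y_nonprob = np.array([1, 1, 0, 1])
propensity = np.array([0.5, 0.25, 0.1, 0.5])
pseudo_w = 1.0 / propensity  # pseudo-weights as inverse propensity scores

p_prob = weighted_proportion(y_prob, design_w)
p_nonprob = weighted_proportion(y_nonprob, pseudo_w)

# lam is an arbitrary illustrative choice; the abstract does not say how the
# combination weight is selected.
p_comp = composite_proportion(p_prob, p_nonprob, lam=0.6)

print(f"probability sample:    {p_prob:.4f}")
print(f"nonprobability sample: {p_nonprob:.4f}")
print(f"composite:             {p_comp:.4f}")
```

In practice the combination weight would typically be chosen to trade off the variance of the probability-sample estimator against the bias and variance of the nonprobability-sample estimator; the fixed value above only shows the mechanics.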
References
D. Ruppert, M. P. Wand, and R. J. Carroll (2003). Semiparametric Regression. Cambridge University Press.
Q. Chen, M. R. Elliott, and R. J. A. Little (2010). Bayesian penalized spline model-based inference for finite population proportion in unequal probability sampling. Survey Methodology, Vol. 36, No. 1, pp. 23-34.