Population Estimation with Partial Social-Media Coverage
Conference
Format: CPS Abstract - IAOS 2026
Keywords: calibration, nonprobability samples, official statistics, propensity score weighting, two-phase sampling, web data
Session: Topics in health & demography
Tuesday 12 May 4:30 p.m. - 6 p.m. (Europe/Vilnius)
Abstract
Official statistics increasingly consider online sources, but coverage is often incomplete in practice, yielding a nonprobability sample that may not represent the full target population. We study this problem using the population of candidates in the 2024 Lithuanian parliamentary elections (about 1,700 persons), linked to official candidate information and to campaign-period posts from publicly accessible profiles on a major social network. Posts can be collected in the primary workflow for a substantial share of candidates, but not for all; the remainder require alternative handling. Our aim is to estimate population-level indicators of the shares of positive, neutral, and negative political messaging. Using auxiliary variables available for all candidates from the official election frame, we will compare three practical strategies: (1) a two-stage probability sampling design (sampling candidates, then sampling posts); (2) nonprobability estimation adjusted with propensity-score weighting and calibration to population margins; and (3) a hybrid two-phase approach that adds a probability follow-up sample from the noncovered group. A manually archived benchmark will be used to evaluate bias, uncertainty, and subgroup performance, providing implementation-oriented guidance for official statistics settings.
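As an illustration of the second strategy only, the following minimal sketch shows inverse-propensity weighting in its simplest form, where the coverage propensity is estimated as the covered share within each frame cell. It is not the study's implementation: the cell labels (`incumbent`, `new`), the function names, and all numbers are hypothetical toy values, and a real application would likely use a richer propensity model followed by calibration to several margins.

```python
import numpy as np

def cell_propensity_weights(cells, covered):
    """Inverse-propensity weights for covered units.

    The propensity is estimated as the covered share within each frame cell,
    so each covered unit gets weight (cell size) / (covered units in cell).
    Noncovered units keep weight 0.
    """
    cells = np.asarray(cells)
    covered = np.asarray(covered, dtype=bool)
    weights = np.zeros(len(cells), dtype=float)
    for c in np.unique(cells):
        in_cell = cells == c
        n_covered = covered[in_cell].sum()
        if n_covered == 0:
            raise ValueError(f"cell {c!r} has no covered units")
        weights[in_cell & covered] = in_cell.sum() / n_covered
    return weights

def weighted_mean(y, weights, covered):
    """Ratio-type (Hajek) weighted mean over covered units only."""
    covered = np.asarray(covered, dtype=bool)
    y = np.asarray(y, dtype=float)[covered]
    w = np.asarray(weights, dtype=float)[covered]
    return float(np.sum(w * y) / np.sum(w))

# Hypothetical toy frame: 10 candidates in two cells (assumed labels).
cells = ["incumbent"] * 4 + ["new"] * 6
covered = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]
# Share of positive posts per candidate; unknown (NaN) when not covered.
nan = float("nan")
y = [0.6, 0.5, 0.7, nan, 0.2, 0.4, nan, nan, nan, nan]

weights = cell_propensity_weights(cells, covered)
estimate = weighted_mean(y, weights, covered)            # about 0.42
naive = weighted_mean(y, np.ones(len(y)), covered)       # about 0.48

print(round(estimate, 4), round(naive, 4))
```

In this toy case the weights within each cell sum to the cell's frame count, so the weighted sample already reproduces that single population margin; with a model-based propensity or several auxiliary variables, a separate calibration step (for example, raking) would be needed to achieve the same property.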