64th ISI World Statistics Congress - Ottawa, Canada

Mitigating Bias of Crowdsourced Data of the Impact of the Covid-19 Pandemic on Enterprises


Format: CPS Abstract

Keywords: bias, bootstrap


During the Covid-19 pandemic, BPS-Statistics Indonesia conducted an online enterprise survey to capture the impact of the pandemic on businesses. Survey links were distributed in bulk to enterprises, which could decide whether or not to participate. Because participation was self-selected, the data collected through this survey are crowdsourced data. This study aims to mitigate the bias in these crowdsourced data in order to produce official statistics on the number of large-medium and small-micro enterprises at the district level. BPS-Statistics Indonesia normally produces these statistics annually from paper-based enterprise surveys in which enumerators are assigned to the selected sample.

This study starts by developing a pseudo-weight for every district on Java Island, Indonesia, using information on the optimum sample size in the respective district. These weights are then used to calculate initial district-level estimates under a Monte Carlo simulation, namely a non-parametric bootstrap with 200 replications. The precision of the initial estimates is then improved by applying a small area estimation (SAE) technique: an area-level Empirical Best Linear Unbiased Predictor (EBLUP) under the Fay-Herriot model. The results of the SAE method are considered final. The model-based estimates are compared with the results of a traditional enterprise survey released as official statistics. This study finds that crowdsourced data can be reliable for producing official statistics if appropriate estimation techniques are applied, particularly in difficult situations such as the Covid-19 pandemic, when most traditional surveys (field enumerations) were canceled.
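
The two estimation steps described above can be illustrated with a minimal sketch in Python. It is not the study's implementation. The abstract does not give the pseudo-weight formula, so the weights are simply passed in as an input. The function names, the use of a weighted total as the district statistic, and the choice of the Prasad-Rao moment estimator for the area-effect variance are all assumptions made for illustration. The first function draws non-parametric bootstrap replicates (200 by default) of a weighted total within one district. The second combines direct estimates with a regression-synthetic estimate through the Fay-Herriot shrinkage factor gamma_i = sigma2_u / (sigma2_u + psi_i), where psi_i is the sampling variance of the direct estimate in district i.

```python
import numpy as np


def bootstrap_weighted_total(y, w, n_rep=200, seed=0):
    """Non-parametric bootstrap of a weighted total within one district.

    y : unit-level values (hypothetical, e.g. 1 if the enterprise is large-medium)
    w : unit-level pseudo-weights (assumed given; their construction is not shown)
    Returns the mean and variance of the bootstrap replicates.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    n = y.size
    totals = np.empty(n_rep)
    for r in range(n_rep):
        idx = rng.integers(0, n, size=n)  # resample units with replacement
        totals[r] = np.sum(w[idx] * y[idx])
    return totals.mean(), totals.var(ddof=1)


def fay_herriot_eblup(direct, psi, X):
    """Area-level EBLUP under the Fay-Herriot model.

    direct : direct estimates, one per district (length m)
    psi    : sampling variances of the direct estimates (length m)
    X      : m x p matrix of district-level covariates (include an intercept column)
    The area-effect variance is estimated with the Prasad-Rao moment
    estimator, an assumption of this sketch.
    """
    direct = np.asarray(direct, dtype=float)
    psi = np.asarray(psi, dtype=float)
    X = np.asarray(X, dtype=float)
    m, p = X.shape

    # Ordinary least squares fit, used only for the moment estimator.
    xtx_inv = np.linalg.inv(X.T @ X)
    resid = direct - X @ (xtx_inv @ X.T @ direct)
    leverage = np.einsum("ij,jk,ik->i", X, xtx_inv, X)  # h_ii = x_i' (X'X)^-1 x_i
    sigma2_u = max(0.0, (resid @ resid - np.sum(psi * (1.0 - leverage))) / (m - p))

    # Weighted least squares for beta, with weights 1 / (sigma2_u + psi_i).
    v = sigma2_u + psi
    xt_vinv = X.T / v
    beta = np.linalg.solve(xt_vinv @ X, xt_vinv @ direct)

    # Shrink each direct estimate towards the regression-synthetic estimate.
    gamma = sigma2_u / v
    eblup = gamma * direct + (1.0 - gamma) * (X @ beta)
    return eblup, gamma, sigma2_u
```

In such a workflow, the bootstrap mean and variance for each district would serve as `direct` and `psi` in the second function. Districts with noisy direct estimates (large `psi`) receive a small gamma and are pulled more strongly towards the model prediction.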