64th ISI World Statistics Congress - Ottawa, Canada


A Little Bird Told Me: Official Statistics Jointly Using Social Networks and Surveys


Victor Alfredo Bustos y de la Tijera


  • Silvia Fraustro
  • Noemí López
  • Ricardo Olvera



Format: CPS Abstract

Keywords: machine learning


We propose a way for National Statistical Offices (NSOs) to combine household survey data with posts on social networks in order to produce representative information on multiple topics more frequently. The proposal assigns survey data a new role: input for training machine learning (ML) algorithms. We begin by classifying respondents according to the data recorded in their survey questionnaires. Any posts those respondents have published on social networks inherit their class tags, and the tagged posts are used to train ML algorithms. As a follow-up, recent posts by respondents around the time of each new survey collection are tagged and the algorithms are updated. If a trained algorithm is considered suitable, it is used to automatically tag large volumes of current and future posts from users not included in the survey. Monitoring between survey rounds is carried out through the tweets published in that period. The procedure may also use a minimal set of sociodemographic (SD) variables collected through surveys to build an SD-labelled database of authors. Thematic studies will refer to this database to mitigate the selection bias that arises because network users are not representative of the population. For all of this to work, survey responses and network posts by the same user-respondents must be linked. We propose a couple of ways in which this can be achieved.
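The tagging and training steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name a classifier, so a small multinomial naive Bayes model is swapped in purely for illustration, and every identifier, class label, account and post below (`survey_class`, `account_to_respondent`, `inherit_tags`, and so on) is a hypothetical assumption. The linkage between accounts and respondents is assumed to already exist.

```python
import math
from collections import Counter, defaultdict

# Hypothetical survey result: respondent id -> class derived from the questionnaire.
survey_class = {"r1": "employed", "r2": "unemployed", "r3": "employed"}

# Hypothetical linkage table: social network account -> respondent id.
account_to_respondent = {"@ana": "r1", "@beto": "r2", "@caro": "r3"}

# Hypothetical posts as (account, text) pairs.
posts = [
    ("@ana", "long shift at work today"),
    ("@beto", "still looking for a job"),
    ("@caro", "meeting with my boss at work"),
    ("@beto", "sent another job application"),
    ("@dani", "looking for a job again"),  # author is not a survey respondent
]


def inherit_tags(posts, account_to_respondent, survey_class):
    """Posts by linked respondents inherit the respondent's class tag."""
    tagged, untagged = [], []
    for account, text in posts:
        respondent = account_to_respondent.get(account)
        if respondent in survey_class:
            tagged.append((text, survey_class[respondent]))
        else:
            untagged.append((account, text))
    return tagged, untagged


def train_naive_bayes(tagged):
    """Collect the counts needed by a multinomial naive Bayes classifier."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in tagged:
        class_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the most probable class, using add-one smoothing."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are ignored
                count = word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


tagged, untagged = inherit_tags(posts, account_to_respondent, survey_class)
model = train_naive_bayes(tagged)

# Automatically tag posts from users who are not in the survey.
predictions = [(account, predict(model, text)) for account, text in untagged]
print(predictions)
```

On this toy data the only unlinked post, from `@dani`, is tagged `unemployed`. In practice the suitability check mentioned in the abstract would come before this last step, for example by evaluating the trained model on tagged posts held out from training.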