64th ISI World Statistics Congress - Ottawa, Canada

Identifying and mitigating misclassification: A case study of the Machine Learning lifecycle in price indices with web-scraped clothing data


Serge Goussev



Format: CPS Abstract

Keywords: machine learning, misclassification, price statistics, quality assurance

Session: CPS 69 - Machine learning

Tuesday 18 July 5:30 p.m. - 6:30 p.m. (Canada/Eastern)


As National Statistical Offices (NSOs) have increasingly turned to alternative data sources (point-of-sale or transaction data, web-scraped data, and API data) to augment their traditional field-collected survey data, new automated processes to classify and quality assure these large volumes of data need to be created and maintained to support their use in production. While many NSOs have demonstrated that Machine Learning (ML), specifically supervised classification, provides considerable value at the scale necessary to process alternative data, ML models are not perfect and can misclassify records. Thus, to use ML in production, NSOs need to understand how forgiving price statistics are of different levels of classification accuracy, and how to deploy processes such as manual quality assurance to mitigate the impact of misclassifications.

While the literature has focused on the applicability of Machine Learning to price statistics in general and the CPI specifically, as well as on metrics for evaluating and selecting classifiers, the impact of misclassification on the calculated price statistics has been considerably understudied. This paper investigates that impact by estimating the degree to which misclassification of consumer products introduces errors into the calculated price index, and whether these errors represent a bias in the calculated index or simply added variance. We review what level of quality a model must achieve to be considered acceptable for production, and which metrics are most appropriate for evaluating model quality. Finally, we compare quality assurance methods to determine which are the most relevant for detecting and mitigating misclassifications.

We conduct empirical evaluations using a web-scraped dataset from three Canadian retailers, collected between June 2018 and December 2021. This dataset has been used in CPI production and is 100% quality assured.

We first carry out a series of simulations to analyze how varying degrees of random misclassification would have affected the final price index, both for individual categories and for an aggregated index, in order to highlight which metrics are most appropriate for evaluating classifier performance. Next, we introduce outlier detection methods to flag a percentage of the products classified by a specific machine learning classifier, and analyze how effective these methods are at detecting misclassification. Finally, we simulate a comprehensive production workflow: an ML classifier predicts the category of each new product, and a manual process then validates or corrects the flagged products, leveraging the quality assurance process that is critical in production statistics. This provides a holistic picture of model quality together with the quality assurance necessary to mitigate misclassification, to create a feedback process for maintaining ML models, and to track metrics on the robustness of the developed process.
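The first simulation step can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the abstract: the index formula is an unweighted Jevons index (geometric mean of price relatives), misclassification is modelled as flipping a label to another category uniformly at random, and the data are synthetic. All function names are hypothetical. The sketch reports the mean error (an indication of bias) and the spread of the error (an indication of variance) for one category.

```python
import math
import random
from statistics import mean, pstdev


def jevons(price_relatives):
    """Unweighted geometric mean of price relatives (assumed index formula)."""
    return math.exp(mean(math.log(r) for r in price_relatives))


def misclassify(labels, rate, categories, rng):
    """Flip each label to a different category, chosen uniformly, with probability `rate`."""
    noisy = []
    for label in labels:
        if rng.random() < rate:
            noisy.append(rng.choice([c for c in categories if c != label]))
        else:
            noisy.append(label)
    return noisy


def category_index(relatives, labels, category):
    """Jevons index over the products currently labelled `category` (NaN if none)."""
    members = [r for r, lab in zip(relatives, labels) if lab == category]
    return jevons(members) if members else float("nan")


def simulate(relatives, labels, category, rate, n_runs=200, seed=0):
    """Repeat random misclassification and summarise the resulting index error."""
    rng = random.Random(seed)
    categories = sorted(set(labels))
    true_index = category_index(relatives, labels, category)
    errors = []
    for _ in range(n_runs):
        noisy = misclassify(labels, rate, categories, rng)
        idx = category_index(relatives, noisy, category)
        if not math.isnan(idx):  # skip runs where the category ended up empty
            errors.append(idx - true_index)
    return {"true_index": true_index, "bias": mean(errors), "spread": pstdev(errors)}


def toy_data(n_per_category=50):
    """Synthetic data: shirts rise 10% in price, pants fall 10% (purely illustrative)."""
    relatives = [1.10] * n_per_category + [0.90] * n_per_category
    labels = ["shirts"] * n_per_category + ["pants"] * n_per_category
    return relatives, labels


if __name__ == "__main__":
    relatives, labels = toy_data()
    for rate in (0.0, 0.05, 0.10, 0.20):
        print(rate, simulate(relatives, labels, "shirts", rate))
```

In this toy setup the two categories have different price movements, so products wrongly moved into a category pull its index toward the other category's movement, and the error shows up as bias rather than only as noise. If the categories moved identically, the same misclassification rate would leave the index essentially unchanged. This is one reason a classifier's accuracy alone may not indicate its effect on the index.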

Understanding how misclassifications affect price statistics will help establish the framework necessary to ensure the accuracy of published price indices. The findings outlined in this paper will support NSOs in designing automated ML and quality assurance processes that employ alternative data sources at scale, while maintaining both the robustness of aggregated price statistics and the efficiency of their implementation.