The effectiveness of outlier detection methods in panel models
Conference
Regional Statistics Conference 2026
Format: CPS Abstract - Malta 2026
Keywords: outliers, panel data, simulation
Session: CPS 27 Outliers
Wednesday 3 June 4:30 p.m. - 5:30 p.m. (Europe/Malta)
Abstract
The detection and treatment of outliers is one of the most challenging problems in econometrics. However, the topic has received considerably less attention in the context of panel data, which have both cross-sectional and time-series dimensions. Numerous established methods are available for identifying and handling outliers in cross-sectional regression models. In panel datasets, by contrast, the applicability and effectiveness of these standard methods are much more limited. The aim of the study is to examine how effective these methods are when applied to panel regressions.
The study first reviews the effect of outliers on regression parameters and goodness-of-fit indicators in standard linear regressions, distinguishing between vertical outliers, leverage points and influential points. The following section gives an overview of panel regression techniques, focusing on the two simplest models: the fixed-effects and random-effects models. This part demonstrates that outliers can distort panel regressions in several ways, including at the first step of the analysis: the tests that help to choose between different estimation methods.
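As a reminder of why a single outlier can spread through a panel regression, the two estimators can be written in their standard textbook form. The notation below (group effect u_i, idiosyncratic error ε_it, balanced panel of length T) is an assumption made for illustration and is not taken from the study itself.

```latex
% One-regressor panel model with group effect u_i (notation assumed here)
y_{it} = \mu + \beta x_{it} + u_i + \varepsilon_{it},
\qquad i = 1,\dots,N,\quad t = 1,\dots,T

% Fixed effects: within (demeaning) transformation
y_{it} - \bar{y}_i = \beta\,(x_{it} - \bar{x}_i) + (\varepsilon_{it} - \bar{\varepsilon}_i)

% Random effects: quasi-demeaning with weight \theta
y_{it} - \theta\,\bar{y}_i
  = (1-\theta)\,\mu + \beta\,(x_{it} - \theta\,\bar{x}_i)
  + \bigl[(1-\theta)\,u_i + \varepsilon_{it} - \theta\,\bar{\varepsilon}_i\bigr],
\qquad
\theta = 1 - \sqrt{\frac{\sigma_\varepsilon^2}{\sigma_\varepsilon^2 + T\,\sigma_u^2}}
```

Both transformations subtract a multiple of the group mean, so one extreme value in group i shifts the mean of y for that group and therefore alters every transformed observation of that group, not only its own.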
To systematically examine the effects of outliers, simulations were conducted. Simple, clean two-variable panel datasets with 1,000 observations were generated, excluding autocorrelation and cross-sectional dependence caused by common factors. Outliers were then introduced at random into 5 percent of the observations (50 observations). Subsequently, the effectiveness of traditional, well-known outlier detection tools was tested on the simulated datasets, including studentized residuals, the DfBetas and DfFits measures, Cook’s distance, COVRATIO, and the Mahalanobis distance. The performance of the methods was examined under various parameter configurations: the number of groups (N), the length of the time dimension (T), the ratio of within-group to between-group variance (quasi-demeaning ratio), the correlation between the dependent and independent variables, and the correlation between group means were varied.
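A minimal sketch of one such simulation run is given below, using only NumPy. It is not the study's code: the function names, the choice of N = 50 groups and T = 20 periods, the use of vertical outliers only, the outlier size (`shift`) and the cut-off |t| > 2 are all assumptions made for illustration. It generates a clean panel, contaminates 5 percent of the observations, fits the fixed-effects model and computes two of the diagnostics listed above.

```python
import numpy as np


def simulate_panel(n_groups=50, n_periods=20, beta=1.0, sigma_u=1.0,
                   sigma_e=1.0, share_outliers=0.05, shift=8.0, seed=0):
    """Generate a balanced two-variable panel and contaminate part of it.

    All defaults are illustrative assumptions, not the study's settings.
    """
    rng = np.random.default_rng(seed)
    n_obs = n_groups * n_periods
    group = np.repeat(np.arange(n_groups), n_periods)
    u = rng.normal(0.0, sigma_u, n_groups)[group]   # group effects
    x = rng.normal(0.0, 1.0, n_obs)                 # no autocorrelation
    y = beta * x + u + rng.normal(0.0, sigma_e, n_obs)

    # Vertical outliers: shift y up or down in randomly chosen rows.
    n_out = int(round(share_outliers * n_obs))
    idx = rng.choice(n_obs, size=n_out, replace=False)
    is_outlier = np.zeros(n_obs, dtype=bool)
    is_outlier[idx] = True
    y[idx] += shift * sigma_e * rng.choice([-1.0, 1.0], size=n_out)
    return group, x, y, is_outlier


def group_demean(v, group):
    """Subtract each group's mean from its observations."""
    means = np.bincount(group, weights=v) / np.bincount(group)
    return v - means[group]


def within_diagnostics(group, x, y):
    """Fixed-effects fit with studentized residuals and Cook's distance."""
    xd, yd = group_demean(x, group), group_demean(y, group)
    n_obs, n_groups, k = len(y), group.max() + 1, 1
    beta_hat = (xd @ yd) / (xd @ xd)
    resid = yd - beta_hat * xd
    s2 = (resid @ resid) / (n_obs - n_groups - k)
    # Leverage of the equivalent dummy-variable regression:
    # 1/T_i from the group dummy plus the demeaned-x contribution.
    h = 1.0 / np.bincount(group)[group] + xd**2 / (xd @ xd)
    student = resid / np.sqrt(s2 * (1.0 - h))       # internally studentized
    p = n_groups + k                                # all estimated parameters
    cooks = student**2 * h / (p * (1.0 - h))
    return beta_hat, student, cooks


group, x, y, is_outlier = simulate_panel()
beta_hat, student, cooks = within_diagnostics(group, x, y)
flagged = np.abs(student) > 2.0                     # assumed cut-off
print("share of planted outliers flagged:", flagged[is_outlier].mean())
print("share of clean observations flagged:", flagged[~is_outlier].mean())
```

Repeating the run over a grid of `n_groups`, `n_periods` and `sigma_u` values while keeping `n_groups * n_periods` at 1,000 would mimic the kind of parameter variation described above; the two printed shares correspond to hit and false-alarm rates.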
Based on the results, traditional outlier detection techniques, which generally work well in standard cross-sectional regressions, are less reliable in panel regression procedures. Observations classified as outliers within their own group are often hidden in a panel regression, while other observations that are not classified as outliers at the level of their own group become outliers in a panel regression. These errors increase significantly in datasets with a relatively large number of groups and short time series.
Overall, the results highlight the need for panel-specific outlier diagnostics and caution against the uncritical use of traditional regression-based methods in the analysis of panel data.