Distance-based methods for archaeological data
Conference
Regional Statistics Conference 2026
Format: IPS Abstract - Malta 2026
Keywords: classification, cluster analysis, data visualization
Session: IPS 1253 - Correspondence analysis and its Related Methods in Archaeology
Thursday 4 June 8:30 a.m. - 10:10 a.m. (Europe/Malta)
Abstract
In applied multivariate analysis, one may encounter data in which variables are not only measured on different scales but are also of different types. Such data are often referred to as mixed-variable data. For distance-based methods, the presence of mixed-variable data requires a distance formulation that adequately deals with the different scales and types. One way to do so is to construct an unbiased mixed-variable distance (van de Velden et al. 2024), that is, a multivariate mixed-variable distance in which the contribution of each individual variable does not depend on its unit or type. There are many options for constructing unbiased mixed-variable distances. In particular, the part of the distance related to categorical variables allows for many different implementations, and there appears to be no consensus about which variant is best. An implicit assumption underlying unbiased mixed-variable distances is that all variables contribute equally to the overall distance. In unsupervised mixed-variable settings, where external validation of results is not possible, this assumption provides an objective starting point. In supervised settings, one could argue that variables that are more informative for the task at hand should receive higher weights in the overall distance calculation. That is, the mixed-variable distance should perhaps be biased towards more informative variables. In this paper, we consider mixed-variable distances in both unsupervised and supervised settings involving archaeological data. In particular, we study the effect of the unbiasedness of different mixed-variable distance variants on both the classification and the visualization of archaeological data.
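To make the idea of equal variable contributions concrete, the following is a minimal sketch and not the construction of van de Velden et al. (2024). It assumes absolute differences for numerical variables, simple matching for categorical variables, and rescaling each per-variable distance by its mean over all pairs of observations. The function names, the `weights` argument, and the toy measurements are hypothetical and serve only as illustration.

```python
import numpy as np

def per_variable_distances(num, cat):
    """Stack one pairwise distance matrix per variable, shape (p, n, n)."""
    num = np.asarray(num, dtype=float)
    cat = np.asarray(cat)
    mats = []
    for j in range(num.shape[1]):
        col = num[:, j]
        # Numerical variable: absolute difference (an assumed choice).
        mats.append(np.abs(col[:, None] - col[None, :]))
    for j in range(cat.shape[1]):
        col = cat[:, j]
        # Categorical variable: simple matching, 0 if equal and 1 otherwise.
        mats.append((col[:, None] != col[None, :]).astype(float))
    return np.stack(mats)

def equalized_mixed_distance(num, cat, weights=None):
    """Weighted sum of per-variable distances, each rescaled to mean 1."""
    d = per_variable_distances(num, cat)
    n = d.shape[1]
    iu = np.triu_indices(n, k=1)
    # Mean distance over all distinct pairs, one value per variable.
    mean_contrib = d[:, iu[0], iu[1]].mean(axis=1)
    # A constant variable has mean 0; leave it unscaled to avoid dividing by 0.
    scale = np.where(mean_contrib > 0, mean_contrib, 1.0)
    d_scaled = d / scale[:, None, None]
    if weights is None:
        # Equal weights: every variable contributes equally on average.
        weights = np.ones(d.shape[0])
    weights = np.asarray(weights, dtype=float)
    return np.tensordot(weights, d_scaled, axes=1), d_scaled

# Hypothetical toy data: length (mm), weight (g), and material of four finds.
num = np.array([[120.0, 3.1], [95.0, 2.4], [210.0, 8.0], [130.0, 3.5]])
cat = np.array([["bronze"], ["iron"], ["bronze"], ["flint"]])

D, scaled = equalized_mixed_distance(num, cat)

# Changing the unit of a numerical variable (mm to cm) leaves D unchanged.
num_cm = num.copy()
num_cm[:, 0] *= 0.1
D_cm, _ = equalized_mixed_distance(num_cm, cat)

# A supervised variant could up-weight a variable deemed more informative.
D_w, _ = equalized_mixed_distance(num, cat, weights=[1.0, 1.0, 2.0])
```

With equal weights, each variable has the same average contribution to `D` regardless of its unit or type. Passing unequal `weights` deliberately biases the distance towards selected variables, as discussed for the supervised setting.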
van de Velden, Michel and D’Enza, Alfonso Iodice and Markos, Angelos and Cavicchia, Carlo (2024). Unbiased mixed variables distance. Available at SSRN: https://ssrn.com/abstract=5010828 or http://dx.doi.org/10.2139/ssrn.5010828