Regional Statistics Conference 2026


Riemannian Principal Component Analysis for Interval-Valued Variables

Conference: Regional Statistics Conference 2026

Format: IPS Abstract - Malta 2026

Keywords: symbolic data analysis

Session: IPS 1287 - Recent Developments in Symbolic and Distributional Data Analysis

Thursday 4 June 8:30 a.m. - 10:10 a.m. (Europe/Malta)

Abstract

This paper introduces an extension of Principal Component Analysis (PCA), termed Riemannian Principal Component Analysis for Interval-Valued Observations (R-PCA-IV), which generalizes classical PCA beyond the Euclidean framework to accommodate both Riemannian geometry and symbolic interval-valued data. The proposed approach addresses two key limitations of existing methodologies: the lack of vector space structure in Riemannian manifolds and the inability of classical PCA to capture the internal variability inherent in interval-valued observations (Pennec et al., 2020; Billard et al., 2009).

In classical settings, each observation is represented as a single point in R^p. However, in many modern applications, data are more naturally described as intervals, leading to hyper-rectangular representations that encode both central tendency and internal variation (Billard et al., 2009). Symbolic PCA methods address this structure through the vertices and centers approaches, which either preserve full internal variability or approximate it via representative points. Recent work highlights that the choice of representative point significantly affects projection quality, motivating optimization-based dimensionality reduction strategies (Arce and Rodríguez, 2019).
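The two representations can be made concrete with a small sketch. The toy dataset, variable count, and helper function below are illustrative assumptions, not part of the paper: each observation is a hyper-rectangle given by lower and upper bounds, summarized either by its midpoint (centers approach) or expanded to its 2^p corners (vertices approach).

```python
import numpy as np

# Hypothetical interval-valued dataset: n = 3 observations, p = 2 interval
# variables, each variable described by [lower, upper] bounds.
lower = np.array([[1.0, 4.0],
                  [2.0, 5.0],
                  [0.5, 3.0]])
upper = np.array([[2.0, 6.0],
                  [3.5, 7.0],
                  [1.5, 4.5]])

# Centers approach: each hyper-rectangle is reduced to its midpoint,
# trading internal variability for computational simplicity.
centers = (lower + upper) / 2.0

# Vertices approach: each observation expands to the 2^p corners of its
# hyper-rectangle, preserving the full internal variation.
def vertices(lo, hi):
    p = lo.shape[0]
    corners = []
    for mask in range(2 ** p):
        corner = np.where([(mask >> j) & 1 for j in range(p)], hi, lo)
        corners.append(corner)
    return np.array(corners)

all_vertices = np.vstack([vertices(lo, hi) for lo, hi in zip(lower, upper)])
print(centers.shape)       # (3, 2)
print(all_vertices.shape)  # (12, 2): 3 observations x 2^2 corners each
```

Running classical PCA on `centers` versus `all_vertices` illustrates the trade-off the abstract describes: the former is cheap but discards within-observation spread, the latter keeps it at exponential cost in p.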

Building on these developments, we extend Principal Geodesic Analysis (PGA) (Fletcher et al., 2004) by incorporating interval-valued symbolic data into a unified Riemannian framework. We construct a local geometric structure by equipping each observation with a Riemannian metric that captures both between- and within-observation variability. Interval-valued data are represented either through vertex expansions or optimized representative points, allowing a balance between computational efficiency and information preservation.

Within this setting, Riemannian principal components are defined as geodesic directions maximizing a generalized notion of variance that accounts for both manifold geometry and internal data structure. This framework extends symbolic PCA and PGA, providing a unified approach to dimensionality reduction on manifolds while preserving intrinsic geometric and topological properties (Pennec et al., 2020).
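The idea of principal components as variance-maximizing geodesic directions can be sketched with a first-order tangent-space approximation of PGA on the unit sphere S^2. The manifold choice, the log map, and the synthetic data are assumptions made purely for illustration; the paper's framework is more general.

```python
import numpy as np

def log_map(mu, x):
    # Riemannian log map at mu on the unit sphere: returns the tangent
    # vector at mu pointing toward x, with length equal to geodesic distance.
    proj = x - np.dot(mu, x) * mu
    norm = np.linalg.norm(proj)
    theta = np.arccos(np.clip(np.dot(mu, x), -1.0, 1.0))
    return np.zeros_like(mu) if norm < 1e-12 else theta * proj / norm

rng = np.random.default_rng(0)
# Synthetic points clustered near the north pole, renormalized onto S^2.
pts = rng.normal([0.0, 0.0, 1.0], 0.1, size=(50, 3))
pts /= np.linalg.norm(pts, axis=1, keepdims=True)

# Crude surrogate for the intrinsic (Frechet) mean, adequate for a
# tightly clustered sample.
mu = pts.mean(axis=0)
mu /= np.linalg.norm(mu)

# Lift the data into the tangent space at mu and run ordinary PCA there:
# the leading eigenvector approximates the principal geodesic direction,
# i.e., the geodesic along which the projected variance is maximal.
tangent = np.array([log_map(mu, x) for x in pts])
cov = tangent.T @ tangent / len(pts)
eigvals, eigvecs = np.linalg.eigh(cov)       # ascending eigenvalues
principal_direction = eigvecs[:, -1]          # direction of maximal variance
print(eigvals[::-1])
```

In the full R-PCA-IV setting, the Euclidean inner product in the tangent space would be replaced by the observation-specific metric, and each interval-valued observation would contribute its vertices or representative point to the tangent-space covariance.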

The resulting R-PCA-IV framework offers a flexible tool for analyzing complex datasets with uncertainty, aggregation, or geometric constraints. Applications to interval-valued data show improved interpretability and more informative representations compared to classical and symbolic PCA methods.

The riemannian_stats package (Rodríguez, 2025) implements these ideas and enables the transformation of Euclidean datasets into Riemannian representations using nonlinear techniques such as UMAP (McInnes et al., 2018), supporting analysis in settings where linear assumptions are inadequate.