Statistical Methods for Analyzing Intensive Longitudinal Data in the Social Sciences
Conference
Proposal Description
Intensive longitudinal data, in which study participants are measured several times daily using, e.g., smartphone apps, are becoming increasingly common in the social sciences. For example, education researchers study children's learning processes using data from computer-based learning systems. In psychology, the temporal dynamics between variables like workplace stress, mood, and sleep are being disentangled. Clinicians studying, e.g., bipolar disorder aim to predict future manic episodes from behavioral data in order to intervene before onset, and want to measure how the time series dynamics change during and after treatment.
The analysis of such data poses new challenges to statisticians and psychometricians, compared both to the classical hierarchical regression models used for sparsely sampled longitudinal data and to time series analysis. With intensive longitudinal data, each participant has their own time series, but since the goal is to understand both individual dynamics and population-level parameters, the time series need to be coupled through hierarchical modeling.
This raises several methodological challenges, which will be discussed by the four presenters:
- In addition to containing missing data, individual time series are typically unequally spaced. This creates an alignment problem that can be solved with continuous-time modeling, which is based on integrating stochastic differential equation models. However, estimating the posterior distributions of continuous-time models is computationally challenging and requires combining efficient methods for integrating over time series (e.g., Kalman-Bucy filters) with MCMC methods for sampling static parameters.
- Time series parameters vary not only between individuals, but also within individuals over time, e.g., due to learning effects in education or treatment effects in clinical settings. This creates non-stationary time series, which need to be understood and dealt with.
- For typical datasets, hierarchical time series models have at least tens of thousands of parameters. This requires developing scalable computational algorithms and understanding the trade-off between speed and accuracy, e.g., by comparing variational approximations to full Hamiltonian Monte Carlo.
- Scientific interest often concerns latent traits that are not measured directly, but are instead measured approximately with multiple items, e.g., questions in a questionnaire. This raises the question of how to optimally project the measurements down to the underlying latent traits, and also of whether time series dynamics are best captured at the measurement level, the latent trait level, or both.
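To make the continuous-time idea in the first point concrete, the following is a minimal sketch for a single participant, not any presenter's method. It assumes the latent state follows an Ornstein-Uhlenbeck process, which is observed with Gaussian measurement error at irregular times. Because the stochastic differential equation has an exact solution over any interval, a discrete-time Kalman filter with interval-dependent transitions (a simpler stand-in for the Kalman-Bucy filters mentioned above) integrates over the latent series. The function name `ou_kalman_loglik` and all parameter names are hypothetical choices made for illustration.

```python
import numpy as np


def ou_kalman_loglik(times, y, beta, mu, sigma, meas_sd):
    """Log-likelihood of noisy, irregularly spaced observations of an
    Ornstein-Uhlenbeck process  d(eta) = -beta * (eta - mu) dt + sigma dW.

    Assumptions (illustrative only): one participant, a scalar latent state,
    Gaussian measurement error with standard deviation meas_sd, and a latent
    state that starts in its stationary distribution. Missing observations
    are coded as NaN and skipped in the update step.
    """
    times = np.asarray(times, dtype=float)
    y = np.asarray(y, dtype=float)

    stat_var = sigma**2 / (2.0 * beta)  # stationary variance of the process
    m, p = mu, stat_var                 # filter mean and variance
    loglik = 0.0
    prev_t = times[0]

    for t, obs in zip(times, y):
        # Predict: exact transition over an interval of arbitrary length dt.
        dt = t - prev_t
        phi = np.exp(-beta * dt)
        m = mu + phi * (m - mu)
        p = phi**2 * p + stat_var * (1.0 - phi**2)

        # Update: only when an observation is available at this time point.
        if not np.isnan(obs):
            s = p + meas_sd**2          # predictive variance of the observation
            resid = obs - m
            loglik += -0.5 * (np.log(2.0 * np.pi * s) + resid**2 / s)
            gain = p / s
            m = m + gain * resid
            p = (1.0 - gain) * p

        prev_t = t

    return loglik


# Hypothetical data: five measurement occasions (in days), one of them missing.
obs_times = [0.0, 0.4, 1.7, 1.9, 4.2]
obs_values = [0.3, -0.1, np.nan, 0.5, -0.4]
print(ou_kalman_loglik(obs_times, obs_values, beta=0.7, mu=0.1, sigma=1.2, meas_sd=0.5))
```

In a hierarchical analysis, a function of this kind would be evaluated once per participant, with `beta`, `mu`, and `sigma` drawn from population-level distributions and sampled by, e.g., MCMC, so that the latent states are integrated out rather than sampled.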
The four speakers in this session are actively working at the forefront of this area, and have published extensively both in the statistics/psychometrics literature and in applied research. As suggested by the titles, they will present recent work of high relevance to the analysis of intensive longitudinal data in psychology, educational measurement, and beyond.