Lecture 1: Overview of Multilevel and Longitudinal Data

Summary of the introduction to multilevel modeling and longitudinal data analysis from Biostat 655
Published: June 4, 2025

Course Information

Course: Biostat 655 – Analysis of Multilevel and Longitudinal Data
Instructors: Dr. Zheyu Wang, Dr. Ji Soo Kim


Overview

This course introduces statistical methods for analyzing:

  • Multilevel data: Observations nested within higher-level units (e.g., students within schools)
  • Longitudinal data: Repeated measures over time from the same individuals

Statistical Foundations

Linear Regression Review

  • Models the expected value of the outcome \(Y\) given predictors \(X_1, \dots, X_p\):

    \[ E(Y|X) = \beta_0 + \beta_1X_1 + \dots + \beta_pX_p \]

  • Assumes the errors are independent and identically distributed (iid) Gaussian, an assumption that clustered or repeated measurements violate

Extensions:

  • ANOVA/ANCOVA
  • Linear & cubic splines
  • Group interactions
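
As a minimal sketch of the regression review above, the following fits the one-predictor case \(E(Y|X) = \beta_0 + \beta_1 X\) by ordinary least squares using the closed-form solution. The function name `fit_simple_ols` and the toy data are hypothetical, chosen only for illustration.

```python
from statistics import mean


def fit_simple_ols(x, y):
    """Return (beta0, beta1) minimizing the sum of squared residuals."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta1 = sxy / sxx              # slope: change in E(Y) per unit of X
    beta0 = y_bar - beta1 * x_bar  # intercept: E(Y) when X = 0
    return beta0, beta1


# Hypothetical toy data lying exactly on the line y = 2 + 0.5 x
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [2.0, 2.5, 3.0, 3.5, 4.0]
beta0, beta1 = fit_simple_ols(x, y)
print(beta0, beta1)
```

Because the toy data have no noise, the fit recovers the intercept 2 and slope 0.5. With real data, the standard errors attached to these estimates rely on the independence assumption listed above.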

Multilevel Model Basics

  • Names: mixed models, hierarchical models, random-effects models
  • Structure: Level-1 units (e.g., individuals) nested within Level-2 units (e.g., families), which may in turn be nested within higher levels
  • Applications: health outcomes are shaped by factors at multiple levels (e.g., person, family, neighborhood)

Example: Alcohol Abuse

| Level        | Example                 |
|--------------|-------------------------|
| Person       | Genetic predisposition  |
| Family       | Household alcohol use   |
| Neighborhood | Access to bars          |
| Society      | Legal/policy regulation |

Notation for Multilevel Data

  • Nested levels:
    • Level 1: Time
    • Level 2: Person
    • Level 3: Family
    • Level 4: Neighborhood
    • Level 5: State

For example:

  • \(Y_{sijkt}\) = outcome at time \(t\) for person \(k\) in family \(j\), neighborhood \(i\), and state \(s\)
  • \(X_{sijkt}\) = predictor measured on the same unit at the same time
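
The indexing above corresponds to "long format" data, with one record per measurement occasion and one identifier per level. The sketch below is only an illustration: the `Observation` type, the `group_by` helper, and every value in `records` are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Observation(NamedTuple):
    state: str         # s (Level 5)
    neighborhood: str  # i (Level 4)
    family: str        # j (Level 3)
    person: str        # k (Level 2)
    time: int          # t (Level 1)
    y: float           # outcome Y_sijkt
    x: float           # predictor X_sijkt


# Made-up records: three people in two families, observed at up to two times
records = [
    Observation("MD", "n1", "f1", "p1", 0, 3.0, 1.0),
    Observation("MD", "n1", "f1", "p1", 1, 4.0, 1.5),
    Observation("MD", "n1", "f1", "p2", 0, 2.0, 0.5),
    Observation("MD", "n2", "f2", "p3", 0, 5.0, 2.0),
    Observation("MD", "n2", "f2", "p3", 1, 6.0, 2.5),
]


def group_by(rows, *levels):
    """Collect rows that share the same identifiers at the given levels."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(getattr(row, level) for level in levels)
        groups[key].append(row)
    return dict(groups)


by_person = group_by(records, "state", "neighborhood", "family", "person")
by_family = group_by(records, "state", "neighborhood", "family")
print(len(by_person), len(by_family))
```

Grouping on the full chain of identifiers, rather than on `person` alone, keeps units distinct even if lower-level labels are reused across higher-level units.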

Synonyms for Multilevel Models

  • Mixed effects model
  • Random effects model
  • Hierarchical model
  • Growth model (the usual name when the outcome is followed over time)
  • Meta-analysis (a special case, with estimates nested within studies)

Longitudinal = Multilevel

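  • Longitudinal data are a two-level structure: measurement occasions (Level 1) nested within individuals (Level 2)
  • Repeated measures on the same individual are correlated, so models that assume independent errors do not apply directly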
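
One way to see where this correlation comes from is a random-intercept sketch. The symbols \(b_k\), \(\varepsilon_{kt}\), \(\tau^2\), and \(\sigma^2\) are assumptions introduced here for illustration and are not notation from the lecture. Suppose person \(k\) at time \(t\) has outcome

\[ Y_{kt} = \beta_0 + b_k + \varepsilon_{kt}, \qquad b_k \sim N(0, \tau^2), \quad \varepsilon_{kt} \sim N(0, \sigma^2), \]

with all \(b_k\) and \(\varepsilon_{kt}\) mutually independent. Two measurements on the same person at times \(t \neq t'\) share the term \(b_k\), so

\[ \mathrm{Cov}(Y_{kt}, Y_{kt'}) = \mathrm{Var}(b_k) = \tau^2, \qquad \mathrm{Corr}(Y_{kt}, Y_{kt'}) = \frac{\tau^2}{\tau^2 + \sigma^2}. \]

Measurements on different people share no random terms and are therefore uncorrelated. Under this simple model the within-person correlation is the same for every pair of times; other models allow it to decay as the times move apart.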

Core Questions

  1. What questions do multilevel/longitudinal models answer?
    • Population averages
    • Individual differences
    • Relationships at multiple levels
  2. Why are repeated measures correlated?
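    • Measurements on the same person share that person's stable characteristics, so they resemble each other more than measurements on different people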
  3. What happens if we ignore that correlation?
    • Invalid inferences (incorrect standard errors, tests, and p-values)
    • Confidence intervals with incorrect coverage
    • Biased results (especially with missing data)
  4. How can we model it correctly?
    • Use a model that accounts for the correlation: a marginal (population-average) model or a mixed-effects (subject-specific) model

Types of Models

Population-Average (Marginal) Models

  • Focus on how the population mean of the outcome depends on the predictors
  • Specify the mean, variance, and correlation structures separately
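
A minimal sketch of the marginal idea, assuming the simplest setup: estimate the mean model by ordinary least squares (a "working independence" fit) and then compute a cluster-robust (sandwich) standard error that leaves the within-subject correlation unspecified. This is one common approach and not necessarily the estimator used in the course; the function name, the simulated data, and all parameter values are hypothetical, and no small-sample correction is applied.

```python
import math
import random


def slope_with_standard_errors(x, y, cluster):
    """OLS slope with naive and cluster-robust (sandwich) standard errors."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    beta1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    beta0 = y_bar - beta1 * x_bar
    resid = [yi - beta0 - beta1 * xi for xi, yi in zip(x, y)]

    # Naive SE: treats all n residuals as independent.
    sigma2 = sum(e ** 2 for e in resid) / (n - 2)
    naive_se = math.sqrt(sigma2 / sxx)

    # Sandwich SE: sum the slope's score contributions within each cluster
    # first, so any within-cluster correlation is absorbed into the sums.
    score = {}
    for xi, ei, ci in zip(x, resid, cluster):
        score[ci] = score.get(ci, 0.0) + (xi - x_bar) * ei
    robust_se = math.sqrt(sum(s ** 2 for s in score.values())) / sxx
    return beta1, naive_se, robust_se


# Simulated (hypothetical) data: 100 subjects with 5 visits each, a
# subject-level exposure, and a shared subject effect inducing correlation.
rng = random.Random(655)
x, y, cluster = [], [], []
for subject in range(100):
    exposure = float(subject % 2)         # time-invariant covariate
    subject_effect = rng.gauss(0.0, 1.0)  # shared across the subject's visits
    for visit in range(5):
        x.append(exposure)
        y.append(1.0 + 0.5 * exposure + subject_effect + rng.gauss(0.0, 0.5))
        cluster.append(subject)

beta1, naive_se, robust_se = slope_with_standard_errors(x, y, cluster)
print(beta1, naive_se, robust_se)
```

In this simulation the exposure varies only between subjects, so the robust standard error should come out noticeably larger than the naive one. The point estimate is the same in both cases; only the uncertainty attached to it changes.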

Subject-Specific (Mixed Effects) Models

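  • Include subject-level random effects (e.g., a random intercept for each person)
  • Describe how individuals vary around the population average
  • Induce within-subject correlation through the shared random effects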
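
To make the random-effects idea concrete, the sketch below simulates balanced data from a random-intercept model and recovers the between-subject and within-subject variances with a one-way ANOVA (method-of-moments) estimator. This estimator is a stand-in chosen for simplicity; mixed models are usually fit by maximum likelihood or REML in statistical software. The function name and all simulation settings are hypothetical.

```python
import random


def anova_variance_components(groups):
    """Method-of-moments estimates for a balanced random-intercept model.

    groups: list of equal-length lists, one list of outcomes per subject.
    Returns (between_var, within_var, icc).
    """
    n_subjects = len(groups)
    n_per = len(groups[0])
    subject_means = [sum(g) / n_per for g in groups]
    grand_mean = sum(subject_means) / n_subjects

    # Mean square within: pooled variance around each subject's own mean.
    ss_within = sum(
        (y - m) ** 2 for g, m in zip(groups, subject_means) for y in g
    )
    ms_within = ss_within / (n_subjects * (n_per - 1))

    # Mean square between: variability of subject means, scaled by n_per.
    ss_between = sum((m - grand_mean) ** 2 for m in subject_means)
    ms_between = n_per * ss_between / (n_subjects - 1)

    within_var = ms_within
    between_var = max((ms_between - ms_within) / n_per, 0.0)  # truncate at 0
    icc = between_var / (between_var + within_var)
    return between_var, within_var, icc


# Simulated data: 200 subjects, 5 measurements each, both variances set to 1
rng = random.Random(2025)
groups = []
for _ in range(200):
    intercept = rng.gauss(0.0, 1.0)  # subject-specific random intercept
    groups.append([10.0 + intercept + rng.gauss(0.0, 1.0) for _ in range(5)])

between_var, within_var, icc = anova_variance_components(groups)
print(between_var, within_var, icc)
```

With both variances set to 1, the true intraclass correlation is 0.5, and the estimate should land near that value. The intraclass correlation is the share of total variance attributable to stable differences between subjects.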

What Happens If We Ignore Correlation?

  • Biased or misleading inferences
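  • Standard errors that are too small (typically for between-subject predictors) or too large (typically for within-subject predictors)
  • Inefficient estimates and reduced statistical power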
  • Invalid estimates, especially with missing data
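
A small worked example shows how large the standard-error problem can be. Assuming equal cluster sizes and the same correlation \(\rho\) between every pair of measurements in a cluster (both assumptions made here for illustration), the variance of the overall sample mean is inflated by the factor \(1 + (n - 1)\rho\), often called the design effect. The function names below are hypothetical.

```python
import math


def design_effect(n_per_cluster, rho):
    """Variance inflation of the sample mean under exchangeable correlation."""
    return 1.0 + (n_per_cluster - 1) * rho


def se_of_mean(sigma, n_clusters, n_per_cluster, rho):
    """Return (naive_se, correct_se) for the overall sample mean."""
    total_n = n_clusters * n_per_cluster
    naive = sigma / math.sqrt(total_n)  # pretends all values are independent
    correct = naive * math.sqrt(design_effect(n_per_cluster, rho))
    return naive, correct


naive_se, correct_se = se_of_mean(
    sigma=1.0, n_clusters=100, n_per_cluster=5, rho=0.5
)
print(design_effect(5, 0.5), correct_se / naive_se)
```

With 5 measurements per person and \(\rho = 0.5\), the design effect is 3, so the naive standard error is too small by a factor of \(\sqrt{3} \approx 1.73\). Confidence intervals built from the naive value would be correspondingly too narrow.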

Key Takeaways

  • Multilevel models account for:
    • Nested structure in real-world data
    • Correlation within clusters or individuals
  • Essential for accurate inference in public health and longitudinal research
  • Ignoring structure leads to:
    • Bias
    • Invalid inference
    • Poor policy or scientific decisions

References

  • Diggle et al. (2002). Analysis of Longitudinal Data
  • Rabe-Hesketh & Skrondal. Multilevel and Longitudinal Modeling Using Stata
  • Lecture slides by Dr. Zheyu Wang and Dr. Ji Soo Kim