Course Information
Course: Biostat 655 – Analysis of Multilevel and Longitudinal Data
Instructors: Dr. Zheyu Wang, Dr. Ji Soo Kim
Overview
This course introduces statistical methods for analyzing:
- Multilevel data: Observations nested within higher-level units (e.g., students within schools)
- Longitudinal data: Repeated measures over time from the same individuals
Statistical Foundations
Linear Regression Review
Models the expected value of the outcome \(Y\) given predictors \(X_1, \dots, X_p\):
\[ E(Y|X) = \beta_0 + \beta_1X_1 + \dots + \beta_pX_p \]
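Assumes independent, identically distributed Gaussian errors: \(Y = \beta_0 + \beta_1X_1 + \dots + \beta_pX_p + \epsilon\) with \(\epsilon \sim N(0, \sigma^2)\)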
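The following is a minimal sketch in Python (standard library only) for a single predictor. The function names and the simulated values \(\beta_0 = 1\), \(\beta_1 = 2\), \(\sigma = 1\) are illustrative assumptions. It uses the closed-form least-squares estimates \(\hat\beta_1 = \sum_i (x_i - \bar x)(y_i - \bar y) / \sum_i (x_i - \bar x)^2\) and \(\hat\beta_0 = \bar y - \hat\beta_1 \bar x\). With iid Gaussian errors these estimates recover the true coefficients.

```python
import random

def simulate(n, beta0, beta1, sigma, seed=0):
    """Draw (x, y) pairs from y = beta0 + beta1 * x + e, with e ~ iid N(0, sigma^2)."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [beta0 + beta1 * x + rng.gauss(0, sigma) for x in xs]
    return xs, ys

def ols_simple(xs, ys):
    """Closed-form least-squares estimates (intercept, slope) for one predictor."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

xs, ys = simulate(n=2000, beta0=1.0, beta1=2.0, sigma=1.0)
b0_hat, b1_hat = ols_simple(xs, ys)
print(f"beta0_hat = {b0_hat:.2f}, beta1_hat = {b1_hat:.2f}")
```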
Extensions:
- ANOVA/ANCOVA
- Linear & cubic splines
- Group interactions
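Each extension keeps the model linear in the \(\beta\)s. Only the columns of the design matrix change. The hypothetical helpers below are an illustrative sketch, not the course's code. Each one builds one row of the design matrix:

- a linear spline with a single knot
- a cubic spline in truncated-power form
- a 0/1 group indicator interacted with \(X\)

```python
def linear_spline_row(x, knot):
    """Linear spline with one knot: the slope changes by beta_2 after `knot`."""
    return [1.0, x, max(x - knot, 0.0)]

def cubic_spline_row(x, knot):
    """Cubic spline with one knot, truncated-power basis."""
    return [1.0, x, x ** 2, x ** 3, max(x - knot, 0.0) ** 3]

def group_interaction_row(x, group):
    """Intercept, slope, group shift, and group-by-x interaction (group coded 0/1)."""
    return [1.0, x, float(group), group * x]

print(linear_spline_row(7.0, knot=5.0))      # [1.0, 7.0, 2.0]
print(linear_spline_row(3.0, knot=5.0))      # [1.0, 3.0, 0.0]
print(group_interaction_row(4.0, group=1))   # [1.0, 4.0, 1.0, 4.0]
```

In the interaction row, the reference group (group = 0) has intercept \(\beta_0\) and slope \(\beta_1\). Group 1 has intercept \(\beta_0 + \beta_2\) and slope \(\beta_1 + \beta_3\).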
Multilevel Model Basics
- Names: Mixed models, hierarchical models, random-effects models
- Structure: Level-1 units (e.g., individuals) are nested within Level-2 units (e.g., families), which may in turn be nested within higher levels
- Applications: Health outcomes are influenced by factors at multiple levels (e.g., person, family, neighborhood)
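As an illustration (the symbols \(u_j\), \(\sigma_u^2\), and \(\sigma_\epsilon^2\) are introduced here, not taken from the lecture), consider person \(k\) in family \(j\). A two-level random-intercept model for this person is:

\[ Y_{jk} = \beta_0 + \beta_1 X_{jk} + u_j + \epsilon_{jk}, \quad u_j \sim N(0, \sigma_u^2), \quad \epsilon_{jk} \sim N(0, \sigma_\epsilon^2) \]

Here \(u_j\) and \(\epsilon_{jk}\) are independent. The family effect \(u_j\) is shared by everyone in family \(j\), which makes their outcomes correlated: \(\text{Cov}(Y_{jk}, Y_{jk'}) = \sigma_u^2\) for \(k \neq k'\).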
Example: Alcohol Abuse
| Level | Example risk factor |
|---|---|
| Person | Genetic predisposition |
| Family | Household alcohol use |
| Neighborhood | Access to bars |
| Society | Legal/policy regulation |
Notation for Multilevel Data
- Nested levels:
  - Level 1: Time
  - Level 2: Person
  - Level 3: Family
  - Level 4: Neighborhood
  - Level 5: State
For example:
\(Y_{sijkt}\) = outcome at time \(t\) for person \(k\) in family \(j\), neighborhood \(i\), and state \(s\)
\(X_{sijkt}\) = predictor measured on the same unit (the same person at the same time)
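The following is a small sketch of how the subscripts map onto data. It assumes hypothetical records in which each observation carries its full index path \((s, i, j, k, t)\); the values and the state label are made up. A lower-level index is only unique within its parent. For example, person \(k = 1\) in family \(j = 1\) is a different individual from person \(k = 1\) in family \(j = 2\), so a person must be identified by the whole path \((s, i, j, k)\).

```python
from collections import defaultdict

# Hypothetical records: (state s, neighborhood i, family j, person k, time t, outcome Y)
records = [
    ("MD", 1, 1, 1, 0, 3.2), ("MD", 1, 1, 1, 1, 3.5),
    ("MD", 1, 1, 2, 0, 2.9), ("MD", 1, 1, 2, 1, 3.1),
    ("MD", 1, 2, 1, 0, 4.0), ("MD", 2, 1, 1, 0, 2.5),
]

# Y[(s, i, j, k, t)] mirrors the subscripts in Y_{sijkt}
Y = {(s, i, j, k, t): y for s, i, j, k, t, y in records}

# Group time points (level 1) under each person (level 2),
# keying each person by the full path (s, i, j, k)
by_person = defaultdict(list)
for (s, i, j, k, t), y in Y.items():
    by_person[(s, i, j, k)].append((t, y))

for person, series in sorted(by_person.items()):
    print(person, series)
```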
Synonyms for Multilevel Models
- Mixed effects model
- Random effects model
- Hierarchical model
- Growth model
- Meta-analysis (special case)
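Longitudinal Data Are Multilevel Data
- Longitudinal data are multilevel: time points (Level 1) are nested within individuals (Level 2)
- Repeated measures on the same individual are typically correlated, so methods that assume independent observations are not appropriate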
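The following minimal simulation sketch shows why repeated measures are correlated. It assumes a random-intercept model \(Y_{kt} = 10 + b_k + \epsilon_{kt}\); the intercept 10, the standard deviations, and the function names are arbitrary choices. Each person's effect \(b_k\) is shared by all of that person's visits. Two visits from the same person therefore have theoretical correlation \(\sigma_b^2 / (\sigma_b^2 + \sigma_\epsilon^2)\), which here is \(4 / 5 = 0.8\).

```python
import random

def simulate_subjects(n_subjects, n_times, sd_subject, sd_error, seed=1):
    """Y_kt = 10 + b_k + e_kt with b_k ~ N(0, sd_subject^2), e_kt ~ N(0, sd_error^2)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_subjects):
        b_k = rng.gauss(0, sd_subject)  # person-level effect shared by all visits
        data.append([10 + b_k + rng.gauss(0, sd_error) for _ in range(n_times)])
    return data

def pearson(a, b):
    """Sample Pearson correlation of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

data = simulate_subjects(n_subjects=5000, n_times=2, sd_subject=2.0, sd_error=1.0)
visit1 = [row[0] for row in data]
visit2 = [row[1] for row in data]

r = pearson(visit1, visit2)  # theory: 2^2 / (2^2 + 1^2) = 0.8
print(f"empirical correlation between visits: {r:.2f}")
```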
Core Questions
- What questions do multilevel/longitudinal models answer?
  - Population averages
  - Individual differences
  - Relationships at multiple levels
- Why are repeated measures correlated?
  - Measurements on the same person share that person's characteristics, so outcomes tend to be similar across time
- What happens if we ignore that correlation?
  - Invalid inferences
  - Confidence intervals with incorrect coverage
  - Biased results (especially with missing data)
- How can we model it correctly?
  - Use models that account for the correlation (marginal or mixed-effects models)
Types of Models
Population-Average (Marginal) Models
- Focus on the population mean response and how it depends on covariates
- Model the mean, the variance, and the within-subject correlation separately
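In a marginal model, the correlation among one person's repeated measures is specified through a chosen structure. The following sketch shows two common choices:

- exchangeable: equal correlation at every lag
- first-order autoregressive, AR(1): correlation \(\rho^{|j-k|}\) decays with the lag between time points \(j\) and \(k\)

The function names are illustrative. If the variance is constant over time, the covariance matrix is \(\sigma^2\) times one of these correlation matrices.

```python
def exchangeable(n, rho):
    """n x n correlation matrix with the same correlation rho between any two times."""
    return [[1.0 if j == k else rho for k in range(n)] for j in range(n)]

def ar1(n, rho):
    """n x n AR(1) correlation matrix: corr(Y_j, Y_k) = rho ** |j - k|."""
    return [[rho ** abs(j - k) for k in range(n)] for j in range(n)]

for row in ar1(4, 0.5):
    print(row)
# [1.0, 0.5, 0.25, 0.125]
# [0.5, 1.0, 0.5, 0.25]
# [0.25, 0.5, 1.0, 0.5]
# [0.125, 0.25, 0.5, 1.0]
```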
Subject-Specific (Mixed Effects) Models
- Include random effects
- Model individual-level variation
- Capture within-subject correlation directly
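Random effects induce the within-subject correlation. Write the model for person \(k\) as \(Y_k = X_k\beta + Z_k b_k + \epsilon_k\), with \(b_k \sim N(0, D)\) and \(\epsilon_k \sim N(0, \sigma^2 I)\). This is standard notation, introduced here rather than taken from the lecture. The implied marginal covariance of \(Y_k\) is \(Z_k D Z_k^\top + \sigma^2 I\). The hypothetical helper below computes it for a random intercept and a random slope in time, so that row \(j\) of \(Z_k\) is \((1, t_j)\).

```python
def implied_covariance(times, var_int, var_slope, cov_int_slope, var_error):
    """Entry (j, k) of Z D Z^T + var_error * I, where row j of Z is (1, t_j) and
    D = [[var_int, cov_int_slope], [cov_int_slope, var_slope]]."""
    n = len(times)
    cov = []
    for j in range(n):
        row = []
        for k in range(n):
            tj, tk = times[j], times[k]
            c = var_int + cov_int_slope * (tj + tk) + var_slope * tj * tk
            if j == k:
                c += var_error  # measurement error adds only to the diagonal
            row.append(c)
        cov.append(row)
    return cov

# Random intercept only: compound symmetry (constant covariance 4.0 off the diagonal)
cs = implied_covariance([0, 1, 2], var_int=4.0, var_slope=0.0, cov_int_slope=0.0, var_error=1.0)
# Adding a random slope: variance grows with time
rs = implied_covariance([0, 1, 2], var_int=4.0, var_slope=1.0, cov_int_slope=0.0, var_error=1.0)
print(cs)  # [[5.0, 4.0, 4.0], [4.0, 5.0, 4.0], [4.0, 4.0, 5.0]]
print(rs)  # [[5.0, 4.0, 4.0], [4.0, 6.0, 6.0], [4.0, 6.0, 9.0]]
```

This makes the link between the two model types concrete. A mixed model specifies individual-level variation, and that variation in turn implies a particular marginal covariance structure.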
What Happens If We Ignore Correlation?
- Biased or misleading inferences
- Standard errors that are too small or too large
- Loss of efficiency (less statistical power)
- Invalid estimates, especially with missing data
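The following simulation sketch illustrates one consequence. It assumes equal-sized clusters (e.g., 5 repeated measures per person) generated from a random-intercept model with intraclass correlation \(\rho = 0.5\); all names and values are illustrative. Treating the 10,000 observations as independent understates the standard error of the overall mean. For equal cluster size \(m\), the variance is inflated by the design effect \(1 + (m - 1)\rho\). For covariates that vary within clusters, ignoring the correlation can instead overstate standard errors, so the error can go in either direction.

```python
import random

def simulate_clusters(n_clusters, m, sd_cluster, sd_error, seed=2):
    """m observations per cluster; all share one cluster effect u_c ~ N(0, sd_cluster^2)."""
    rng = random.Random(seed)
    clusters = []
    for _ in range(n_clusters):
        u_c = rng.gauss(0, sd_cluster)
        clusters.append([u_c + rng.gauss(0, sd_error) for _ in range(m)])
    return clusters

def sample_var(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

m = 5
clusters = simulate_clusters(n_clusters=2000, m=m, sd_cluster=1.0, sd_error=1.0)
all_obs = [y for c in clusters for y in c]

# Naive SE of the grand mean: pretends all 2000 * 5 observations are independent
naive_se = (sample_var(all_obs) / len(all_obs)) ** 0.5

# Cluster-level SE: cluster means are independent, so use their spread
cluster_means = [sum(c) / len(c) for c in clusters]
cluster_se = (sample_var(cluster_means) / len(cluster_means)) ** 0.5

rho = 1.0 / (1.0 + 1.0)            # ICC = sd_cluster^2 / (sd_cluster^2 + sd_error^2)
design_effect = 1 + (m - 1) * rho  # variance inflation factor = 3 here
print(f"naive SE = {naive_se:.4f}, cluster SE = {cluster_se:.4f}")
print(f"SE ratio = {cluster_se / naive_se:.2f}, sqrt(design effect) = {design_effect ** 0.5:.2f}")
```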
Key Takeaways
- Multilevel models account for:
  - Nested structure in real-world data
  - Correlation within clusters or individuals
- They are essential for accurate inference in public health and longitudinal research
- Ignoring structure leads to:
  - Bias
  - Invalid inference
  - Poor policy or scientific decisions
References
- Diggle et al. (2002). Analysis of Longitudinal Data
- Rabe-Hesketh & Skrondal. Multilevel and Longitudinal Modeling Using Stata
- Lecture slides by Dr. Zheyu Wang and Dr. Ji Soo Kim