<?xml version="1.0" encoding="UTF-8"?>
<rss  xmlns:atom="http://www.w3.org/2005/Atom" 
      xmlns:media="http://search.yahoo.com/mrss/" 
      xmlns:content="http://purl.org/rss/1.0/modules/content/" 
      xmlns:dc="http://purl.org/dc/elements/1.1/" 
      version="2.0">
<channel>
<title>Erh-Chi Hsu</title>
<link>https://erhchihsu.github.io/blog.html</link>
<atom:link href="https://erhchihsu.github.io/blog.xml" rel="self" type="application/rss+xml"/>
<description>Health and Long-Term Care</description>
<generator>quarto-1.4.554</generator>
<lastBuildDate>Tue, 22 Jul 2025 04:00:00 GMT</lastBuildDate>
<item>
  <title>Lecture 2: Autocorrelation</title>
  <dc:creator>Erh-Chi Hsu</dc:creator>
  <link>https://erhchihsu.github.io/posts/2025-07-22-autocorrelation/</link>
  <description><![CDATA[ 





<section id="course-information" class="level2">
<h2 class="anchored" data-anchor-id="course-information">Course Information</h2>
<p><strong>Course:</strong> Biostat 655 – Analysis of Multilevel and Longitudinal Data<br>
<strong>Instructors:</strong> Dr.&nbsp;Zheyu Wang, Dr.&nbsp;Ji Soo Kim</p>
<hr>
</section>
<section id="essential-concepts-from-lecture-1" class="level2">
<h2 class="anchored" data-anchor-id="essential-concepts-from-lecture-1">Essential Concepts from Lecture 1</h2>
<ul>
<li>Decomposition of the association between X and Y into <strong>cross-sectional</strong> and <strong>longitudinal</strong> components:</li>
</ul>
<p><img src="https://latex.codecogs.com/png.latex?%0A%5Cbegin%7Balign%7D%0AY_%7Bij%7D%20&amp;=%20%5Cbeta_0%20+%20%5Cbeta_1%20x_%7Bij%7D%20+%20%5Cepsilon_%7Bij%7D%20%5C%5C%0A%20%20%20%20%20%20%20&amp;=%20%5Cbeta_0%20+%20%5Cbeta_1%20(x_%7Bij%7D%20-%20x_%7Bi1%7D%20+%20x_%7Bi1%7D)%20+%20%5Cepsilon_%7Bij%7D%20%5C%5C%0A%20%20%20%20%20%20%20&amp;=%20%5Cbeta_0%20+%20%5Cbeta_1%20x_%7Bi1%7D%20+%20%5Cbeta_1%20(x_%7Bij%7D%20-%20x_%7Bi1%7D)%20+%20%5Cepsilon_%7Bij%7D%20%5C%5C%0A%20%20%20%20%20%20%20&amp;=%20%5Cbeta_0%20+%20%5Cbeta_C%20x_%7Bi1%7D%20+%20%5Cbeta_L%20(x_%7Bij%7D%20-%20x_%7Bi1%7D)%20+%20%5Cepsilon_%7Bij%7D%0A%5Cend%7Balign%7D%0A"></p>
<blockquote class="blockquote">
<blockquote class="blockquote">
<p>This model implies two key identities:</p>
</blockquote>
</blockquote>
<blockquote class="blockquote">
<blockquote class="blockquote">
<ol type="1">
<li><strong>Cross-sectional effect (C):</strong> <img src="https://latex.codecogs.com/png.latex?Y_%7Bi1%7D%20=%20%5Cbeta_C%20x_%7Bi1%7D%20+%20%5Cepsilon_%7Bi1%7D"></li>
</ol>
</blockquote>
</blockquote>
<blockquote class="blockquote">
<blockquote class="blockquote">
<ol start="2" type="1">
<li><strong>Longitudinal effect (L):</strong> <img src="https://latex.codecogs.com/png.latex?Y_%7Bij%7D%20-%20Y_%7Bi1%7D%20=%20%5Cbeta_L%20(x_%7Bij%7D%20-%20x_%7Bi1%7D)%20+%20(%5Cepsilon_%7Bij%7D%20-%20%5Cepsilon_%7Bi1%7D)"> &nbsp;&nbsp;&nbsp;</li>
</ol>
</blockquote>
</blockquote>
<ul>
<li>What’s <strong>Intraclass Correlation Coefficient (ICC)</strong>, aka <img src="https://latex.codecogs.com/png.latex?%5Crho"></li>
</ul>
<blockquote class="blockquote">
<p>We model the outcome as: <img src="https://latex.codecogs.com/png.latex?Y_%7Bij%7D%20=%20%5Cbeta_0%20+%20b_i%20+%20%5Cepsilon_%7Bij%7D"></p>
</blockquote>
<blockquote class="blockquote">
<blockquote class="blockquote">
<p><img src="https://latex.codecogs.com/png.latex?%5Cbeta_0"> : fixed effect (overall intercept)<br>
<img src="https://latex.codecogs.com/png.latex?b_i">: random effect (random intercept)<br>
<img src="https://latex.codecogs.com/png.latex?%5Cepsilon_%7Bij%7D%20%5Csim%20N(0,%20%5Csigma%5E2)">: residual error<br>
</p>
</blockquote>
</blockquote>
<blockquote class="blockquote">
<p>Variance of the Outcome:</p>
</blockquote>
<p><img src="https://latex.codecogs.com/png.latex?%0A%5Cbegin%7Balign%7D%0A%7BVar%7D(Y_%7Bij%7D)%20&amp;=%20%5Ctext%7BVar%7D(b_i%20+%20%5Cepsilon_%7Bij%7D)%20%5C%5C%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20&amp;=%20%5Ctext%7BVar%7D(b_i)%20+%20%5Ctext%7BVar%7D(%5Cepsilon_%7Bij%7D)%20%5C%5C%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20&amp;=%20%5Ctau%5E2%20+%20%5Csigma%5E2%0A%5Cend%7Balign%7D%0A"></p>
<blockquote class="blockquote">
<p>Covariance Within a Cluster:</p>
</blockquote>
<p><img src="https://latex.codecogs.com/png.latex?%0A%5Cbegin%7Baligned%7D%0A%5Ctext%7BCov%7D(Y_%7Bij%7D,%20Y_%7Bik%7D)%20&amp;=%20%5Ctext%7BCov%7D(b_i%20+%20%5Cepsilon_%7Bij%7D,%20b_i%20+%20%5Cepsilon_%7Bik%7D)%20%5C%5C%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20&amp;=%20%5Ctext%7BCov%7D(b_i,%20b_i)%20+%20%5Ctext%7BCov%7D(%5Cepsilon_%7Bij%7D,%20%5Cepsilon_%7Bik%7D)%20%5C%5C%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20&amp;=%20%5Ctau%5E2%0A%5Cend%7Baligned%7D%0A"></p>
<blockquote class="blockquote">
<p>Intraclass Correlation Coefficient (ICC)</p>
</blockquote>
<p><img src="https://latex.codecogs.com/png.latex?%0A%5Cbegin%7Baligned%7D%0A%5Crho%20&amp;=%20%5Ctext%7BCorr%7D(Y_%7Bij%7D,%20Y_%7Bik%7D)%20%5C%5C%0A%20%20%20%20%20&amp;=%20%5Cfrac%7B%5Ctext%7BCov%7D(Y_%7Bij%7D,%20Y_%7Bik%7D)%7D%7B%5Csqrt%7B%5Ctext%7BVar%7D(Y_%7Bij%7D)%20%5Ccdot%20%5Ctext%7BVar%7D(Y_%7Bik%7D)%7D%7D%20%5C%5C%0A%20%20%20%20%20&amp;=%20%5Cfrac%7B%5Ctau%5E2%7D%7B%5Ctau%5E2%20+%20%5Csigma%5E2%7D%20%5C%5C%0A%20%20%20%20%20&amp;=%20%5Cfrac%7B%5Ctext%7BTotal%20Variance%7D%20-%20%5Ctext%7BWithin-Cluster%20Variance%7D%7D%7B%5Ctext%7BTotal%20Variance%7D%7D%0A%5Cend%7Baligned%7D%0A"></p>
<ul>
<li><p>Consequences of ignoring clustering in data</p>
<ul>
<li>Regression estimates remain unbiased if missing data is minimal</li>
<li>Standard errors, confidence intervals, and tests may be incorrect</li>
<li>Estimates may be inefficient (i.e., more variable than necessary)</li>
</ul></li>
</ul>
<hr>
</section>
<section id="autocorrelation" class="level2">
<h2 class="anchored" data-anchor-id="autocorrelation">Autocorrelation</h2>
<ul>
<li>“Past is prologue”</li>
<li>“No two people are alike”</li>
<li>Repeated measurements on a person tend to be more similar to each other than to observations from another person!</li>
</ul>
<hr>
<section id="variance-covariance-structure" class="level3">
<h3 class="anchored" data-anchor-id="variance-covariance-structure">Variance-Covariance Structure</h3>
<ul>
<li>Expressed in terms of common variance and correlation:
<ul>
<li><img src="https://latex.codecogs.com/png.latex?Corr(Y_%7Bij%7D,%20Y_%7Bik%7D)%20=%20Corr(%5Cepsilon_%7Bij%7D,%20%5Cepsilon_%7Bik%7D)%20=%20%5Crho_%7Bjk%7D%20=%20%5Cfrac%7BCov(%5Cepsilon_%7Bij%7D,%20%5Cepsilon_%7Bik%7D)%7D%7Bvar(%5Cepsilon_%7Bij%7D)%20%5Ccdot%20%20var(%5Cepsilon_%7Bik%7D)%7D"></li>
<li><img src="https://latex.codecogs.com/png.latex?Cov(%5Cepsilon_%7Bij%7D,%20%5Cepsilon_%7Bik%7D)%20=%20%5Crho_%7Bjk%7D%5Csigma%5E2"></li>
</ul></li>
</ul>
<hr>
</section>
<section id="empirical-correlation-matrix-for-residuals" class="level3">
<h3 class="anchored" data-anchor-id="empirical-correlation-matrix-for-residuals">Empirical Correlation Matrix for Residuals</h3>
<table class="table">
<thead>
<tr class="header">
<th></th>
<th><img src="https://latex.codecogs.com/png.latex?t_%7Bij%7D"></th>
<th></th>
<th></th>
<th></th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><img src="https://latex.codecogs.com/png.latex?t_%7Bik%7D"></td>
<td>2003</td>
<td>2004</td>
<td>2005</td>
<td>2006</td>
</tr>
<tr class="even">
<td>2004</td>
<td>0.983</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr class="odd">
<td>2005</td>
<td>0.971</td>
<td>0.986</td>
<td></td>
<td></td>
</tr>
<tr class="even">
<td>2006</td>
<td>0.970</td>
<td>0.977</td>
<td>0.983</td>
<td></td>
</tr>
<tr class="odd">
<td>2007</td>
<td>0.970</td>
<td>0.965</td>
<td>0.963</td>
<td>0.984</td>
</tr>
</tbody>
</table>
<ul>
<li><p><strong>Lag Formula:</strong> <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BLAG%7D%20=%20%7Ct_%7Bij%7D%20-%20t_%7Bik%7D%7C"><br>
</p></li>
<li><p><strong>Autocorrelation function:</strong> <img src="https://latex.codecogs.com/png.latex?%5Crho(u)%20=%20Corr(r_%7Bij%7D,%20r_%7Bik%7D),%20k%20=%20j%20+%20u"><br>
</p></li>
<li><p>The <strong>standard error</strong> of <img src="https://latex.codecogs.com/png.latex?%5Crho(u)"> is roughly <img src="https://latex.codecogs.com/png.latex?%5Cfrac%7B1%7D%7B%5Csqrt%7BN(u)%7D%7D"> where <img src="https://latex.codecogs.com/png.latex?%7BN(u)%7D"> is the number of pairs of observations at lag <img src="https://latex.codecogs.com/png.latex?u"></p></li>
<li><p>Lag 1 correlation estimates: 0.983, 0.986, 0.983, 0.984 -&gt; Pooled Lag 1 correlation estimate: 0.984 (average)<br>
</p></li>
<li><p>Lag 2 correlation estimates: 0.971, 0.977, 0.963 -&gt; Pooled Lag 2 correlation estimate: 0.970 (average)<br>
</p></li>
<li><p>Lag 3 correlation estimates: 0.970, 0.965 -&gt; Pooled Lag 3 correlation estimate: 0.966 (average)<br>
</p></li>
<li><p>Lag 4 correlation estimates: 0.970</p></li>
</ul>
<hr>
</section>
<section id="estimating-acf" class="level3">
<h3 class="anchored" data-anchor-id="estimating-acf">Estimating ACF</h3>
<ul>
<li>In Stata: <code>autocor</code></li>
<li>In SAS: see <code>autocorr.sas</code></li>
<li>Use tolerance limits to assess significance</li>
</ul>
<hr>
</section>
<section id="autocovariance-matrix" class="level3">
<h3 class="anchored" data-anchor-id="autocovariance-matrix">Autocovariance Matrix</h3>
<ul>
<li>To know if the residuals have roughly the same variance at each time-point</li>
<li>Use <code>cov</code> option in R or Stata</li>
<li>Norms
<ul>
<li>Administrative databases (e.g.&nbsp;Medicare data): large correlation in the outcomes across time even for large lags.</li>
<li>Individual patients: Correlation decays as the time lag increases and the correlation does not appear to go to zero even at large lags.</li>
</ul></li>
</ul>
<hr>
</section>
</section>
<section id="common-covariance-structures" class="level2">
<h2 class="anchored" data-anchor-id="common-covariance-structures">Common Covariance Structures</h2>
<section id="unstructured-raw" class="level3">
<h3 class="anchored" data-anchor-id="unstructured-raw">1. Unstructured (Raw)</h3>
<ul>
<li>Estimate all variances and covariances</li>
<li>Requires <img src="https://latex.codecogs.com/png.latex?%5Cfrac%7Bn(n+1)%7D%7B2%7D"> parameters, in this case 4*5/2 = 10</li>
</ul>
<table class="table">
<thead>
<tr class="header">
<th></th>
<th>Time1</th>
<th>Time2</th>
<th>Time3</th>
<th>Time4</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>Time1</td>
<td><strong>Var₁</strong></td>
<td>Cov(1,2)</td>
<td>Cov(1,3)</td>
<td>Cov(1,4)</td>
</tr>
<tr class="even">
<td>Time2</td>
<td>Cov(2,1)</td>
<td><strong>Var₂</strong></td>
<td>Cov(2,3)</td>
<td>Cov(2,4)</td>
</tr>
<tr class="odd">
<td>Time3</td>
<td>Cov(3,1)</td>
<td>Cov(3,2)</td>
<td><strong>Var₃</strong></td>
<td>Cov(3,4)</td>
</tr>
<tr class="even">
<td>Time4</td>
<td>Cov(4,1)</td>
<td>Cov(4,2)</td>
<td>Cov(4,3)</td>
<td><strong>Var₄</strong></td>
</tr>
</tbody>
</table>
<hr>
</section>
<section id="compound-symmetry-exchangeable" class="level3">
<h3 class="anchored" data-anchor-id="compound-symmetry-exchangeable">2. Compound Symmetry (Exchangeable)</h3>
<ul>
<li>Same variance, same correlation</li>
<li>Only 2 parameters needed</li>
<li>Assuming autocorrelation is not fading away by time</li>
</ul>
<table class="table">
<thead>
<tr class="header">
<th></th>
<th>Time1</th>
<th>Time2</th>
<th>Time3</th>
<th>Time4</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>Time1</td>
<td><strong>Var</strong></td>
<td>Var*<img src="https://latex.codecogs.com/png.latex?%5Crho"></td>
<td>Var*<img src="https://latex.codecogs.com/png.latex?%5Crho"></td>
<td>Var*<img src="https://latex.codecogs.com/png.latex?%5Crho"></td>
</tr>
<tr class="even">
<td>Time2</td>
<td>Var*<img src="https://latex.codecogs.com/png.latex?%5Crho"></td>
<td><strong>Var</strong></td>
<td>Var*<img src="https://latex.codecogs.com/png.latex?%5Crho"></td>
<td>Var*<img src="https://latex.codecogs.com/png.latex?%5Crho"></td>
</tr>
<tr class="odd">
<td>Time3</td>
<td>Var*<img src="https://latex.codecogs.com/png.latex?%5Crho"></td>
<td>Var*<img src="https://latex.codecogs.com/png.latex?%5Crho"></td>
<td><strong>Var</strong></td>
<td>Var*<img src="https://latex.codecogs.com/png.latex?%5Crho"></td>
</tr>
<tr class="even">
<td>Time4</td>
<td>Var*<img src="https://latex.codecogs.com/png.latex?%5Crho"></td>
<td>Var*<img src="https://latex.codecogs.com/png.latex?%5Crho"></td>
<td>Var*<img src="https://latex.codecogs.com/png.latex?%5Crho"></td>
<td><strong>Var</strong></td>
</tr>
</tbody>
</table>
<hr>
</section>
<section id="autoregressive-ar-1" class="level3">
<h3 class="anchored" data-anchor-id="autoregressive-ar-1">3. Autoregressive (AR-1)</h3>
<ul>
<li>The correlation at lag <img src="https://latex.codecogs.com/png.latex?k"> is <img src="https://latex.codecogs.com/png.latex?%5Crho%5E%7B%7Ck%7C%7D"></li>
<li>Only 2 parameters needed</li>
<li>Suitable for discrete time</li>
</ul>
<table class="table">
<colgroup>
<col style="width: 10%">
<col style="width: 22%">
<col style="width: 22%">
<col style="width: 22%">
<col style="width: 22%">
</colgroup>
<thead>
<tr class="header">
<th></th>
<th>Time1</th>
<th>Time2</th>
<th>Time3</th>
<th>Time4</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>Time1</td>
<td><strong>Var</strong></td>
<td>Var*<img src="https://latex.codecogs.com/png.latex?%5Crho"></td>
<td>Var*<img src="https://latex.codecogs.com/png.latex?%5Crho%5E2"></td>
<td>Var*<img src="https://latex.codecogs.com/png.latex?%5Crho%5E3"></td>
</tr>
<tr class="even">
<td>Time2</td>
<td>Var*<img src="https://latex.codecogs.com/png.latex?%5Crho"></td>
<td><strong>Var</strong></td>
<td>Var*<img src="https://latex.codecogs.com/png.latex?%5Crho"></td>
<td>Var*<img src="https://latex.codecogs.com/png.latex?%5Crho%5E2"></td>
</tr>
<tr class="odd">
<td>Time3</td>
<td>Var*<img src="https://latex.codecogs.com/png.latex?%5Crho%5E2"></td>
<td>Var*<img src="https://latex.codecogs.com/png.latex?%5Crho"></td>
<td><strong>Var</strong></td>
<td>Var*<img src="https://latex.codecogs.com/png.latex?%5Crho"></td>
</tr>
<tr class="even">
<td>Time4</td>
<td>Var*<img src="https://latex.codecogs.com/png.latex?%5Crho%5E3"></td>
<td>Var*<img src="https://latex.codecogs.com/png.latex?%5Crho%5E2"></td>
<td>Var*<img src="https://latex.codecogs.com/png.latex?%5Crho"></td>
<td><strong>Var</strong></td>
</tr>
</tbody>
</table>
<hr>
</section>
<section id="toeplitz" class="level3">
<h3 class="anchored" data-anchor-id="toeplitz">4. Toeplitz</h3>
<ul>
<li>One variance; Different correlation per lag</li>
<li>Requires <img src="https://latex.codecogs.com/png.latex?n"> parameters</li>
</ul>
<table class="table">
<colgroup>
<col style="width: 10%">
<col style="width: 22%">
<col style="width: 22%">
<col style="width: 22%">
<col style="width: 22%">
</colgroup>
<thead>
<tr class="header">
<th></th>
<th>Time1</th>
<th>Time2</th>
<th>Time3</th>
<th>Time4</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>Time1</td>
<td><strong>Var</strong></td>
<td>Var*<img src="https://latex.codecogs.com/png.latex?%5Crho"></td>
<td>Var*<img src="https://latex.codecogs.com/png.latex?%5Crho_2"></td>
<td>Var*<img src="https://latex.codecogs.com/png.latex?%5Crho_3"></td>
</tr>
<tr class="even">
<td>Time2</td>
<td>Var*<img src="https://latex.codecogs.com/png.latex?%5Crho"></td>
<td><strong>Var</strong></td>
<td>Var*<img src="https://latex.codecogs.com/png.latex?%5Crho"></td>
<td>Var*<img src="https://latex.codecogs.com/png.latex?%5Crho_2"></td>
</tr>
<tr class="odd">
<td>Time3</td>
<td>Var*<img src="https://latex.codecogs.com/png.latex?%5Crho_2"></td>
<td>Var*<img src="https://latex.codecogs.com/png.latex?%5Crho"></td>
<td><strong>Var</strong></td>
<td>Var*<img src="https://latex.codecogs.com/png.latex?%5Crho"></td>
</tr>
<tr class="even">
<td>Time4</td>
<td>Var*<img src="https://latex.codecogs.com/png.latex?%5Crho_3"></td>
<td>Var*<img src="https://latex.codecogs.com/png.latex?%5Crho_2"></td>
<td>Var*<img src="https://latex.codecogs.com/png.latex?%5Crho"></td>
<td><strong>Var</strong></td>
</tr>
</tbody>
</table>
<hr>
</section>
<section id="banded-structures" class="level3">
<h3 class="anchored" data-anchor-id="banded-structures">6. Banded Structures</h3>
<ul>
<li>Set correlations to zero beyond lag <img src="https://latex.codecogs.com/png.latex?k"> (e.g., Banded Toeplitz 2)</li>
</ul>
<table class="table">
<thead>
<tr class="header">
<th></th>
<th>Time1</th>
<th>Time2</th>
<th>Time3</th>
<th>Time4</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>Time1</td>
<td><strong>Var</strong></td>
<td>Var*<img src="https://latex.codecogs.com/png.latex?%5Crho"></td>
<td>0</td>
<td>0</td>
</tr>
<tr class="even">
<td>Time2</td>
<td>Var*<img src="https://latex.codecogs.com/png.latex?%5Crho"></td>
<td><strong>Var</strong></td>
<td>Var*<img src="https://latex.codecogs.com/png.latex?%5Crho"></td>
<td>0</td>
</tr>
<tr class="odd">
<td>Time3</td>
<td>0</td>
<td>Var*<img src="https://latex.codecogs.com/png.latex?%5Crho"></td>
<td><strong>Var</strong></td>
<td>Var*<img src="https://latex.codecogs.com/png.latex?%5Crho"></td>
</tr>
<tr class="even">
<td>Time4</td>
<td>0</td>
<td>0</td>
<td>Var*<img src="https://latex.codecogs.com/png.latex?%5Crho"></td>
<td><strong>Var</strong></td>
</tr>
</tbody>
</table>
<hr>
</section>
<section id="heteroskedastic-models" class="level3">
<h3 class="anchored" data-anchor-id="heteroskedastic-models">7. Heteroskedastic Models</h3>
<ul>
<li>Allow variance to vary by time</li>
<li>Combine with AR, Toeplitz, or CS structures</li>
<li>Heteroskedastic toeplitz: n + (n-1) estimates</li>
</ul>
<table class="table">
<colgroup>
<col style="width: 20%">
<col style="width: 20%">
<col style="width: 20%">
<col style="width: 20%">
<col style="width: 20%">
</colgroup>
<thead>
<tr class="header">
<th></th>
<th>Time1</th>
<th>Time2</th>
<th>Time3</th>
<th>Time4</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>Time1</td>
<td><strong>Var1</strong></td>
<td><img src="https://latex.codecogs.com/png.latex?%5Csqrt%7BVar_1%7D*%5Csqrt%7BVar_2%7D*%5Crho"></td>
<td><img src="https://latex.codecogs.com/png.latex?%5Csqrt%7BVar_1%7D*%5Csqrt%7BVar_3%7D*%5Crho_2"></td>
<td><img src="https://latex.codecogs.com/png.latex?%5Csqrt%7BVar_1%7D*%5Csqrt%7BVar_4%7D*%5Crho_3"></td>
</tr>
<tr class="even">
<td>Time2</td>
<td><img src="https://latex.codecogs.com/png.latex?%5Csqrt%7BVar_1%7D*%5Csqrt%7BVar_2%7D*%5Crho"></td>
<td><strong>Var2</strong></td>
<td><img src="https://latex.codecogs.com/png.latex?%5Csqrt%7BVar_2%7D*%5Csqrt%7BVar_3%7D*%5Crho"></td>
<td><img src="https://latex.codecogs.com/png.latex?%5Csqrt%7BVar_2%7D*%5Csqrt%7BVar_4%7D*%5Crho_2"></td>
</tr>
<tr class="odd">
<td>Time3</td>
<td><img src="https://latex.codecogs.com/png.latex?%5Csqrt%7BVar_1%7D*%5Csqrt%7BVar_3%7D*%5Crho_2"></td>
<td><img src="https://latex.codecogs.com/png.latex?%5Csqrt%7BVar_2%7D*%5Csqrt%7BVar_3%7D*%5Crho"></td>
<td><strong>Var3</strong></td>
<td><img src="https://latex.codecogs.com/png.latex?%5Csqrt%7BVar_3%7D*%5Csqrt%7BVar_4%7D*%5Crho"></td>
</tr>
<tr class="even">
<td>Time4</td>
<td><img src="https://latex.codecogs.com/png.latex?%5Csqrt%7BVar_1%7D*%5Csqrt%7BVar_4%7D*%5Crho_3"></td>
<td><img src="https://latex.codecogs.com/png.latex?%5Csqrt%7BVar_2%7D*%5Csqrt%7BVar_4%7D*%5Crho_2"></td>
<td><img src="https://latex.codecogs.com/png.latex?%5Csqrt%7BVar_3%7D*%5Csqrt%7BVar_4%7D*%5Crho"></td>
<td><strong>Var4</strong></td>
</tr>
</tbody>
</table>
<hr>


</section>
</section>

 ]]></description>
  <category>Biostat 655</category>
  <guid>https://erhchihsu.github.io/posts/2025-07-22-autocorrelation/</guid>
  <pubDate>Tue, 22 Jul 2025 04:00:00 GMT</pubDate>
</item>
<item>
  <title>Lecture 3 Design Based Regression Analysis</title>
  <dc:creator>Erh-Chi Hsu</dc:creator>
  <link>https://erhchihsu.github.io/posts/2025-06-26-design-based-regression-analysis/</link>
  <description><![CDATA[ 





<section id="course-information" class="level2">
<h2 class="anchored" data-anchor-id="course-information">Course Information</h2>
<p><strong>Course:</strong> PFRH 712 - Methods in Analysis of Large Population Surveys<br>
<strong>Instructor:</strong> Dr.&nbsp;Saifuddin Ahmed</p>
<hr>
</section>
<section id="overview" class="level2">
<h2 class="anchored" data-anchor-id="overview">Overview</h2>
<p>This lecture focuses on the principles and applications of <strong>design-based regression analysis</strong> using complex survey data in R. It introduces the correct use of <code>lm()</code>, <code>glm()</code>, and <code>svyglm()</code> functions with appropriate survey design using the <code>survey</code> package. It emphasizes best practices in model fitting, diagnostics, and interpretation, and highlights common pitfalls such as misuse of stepwise regression and pseudo R².</p>
<hr>
</section>
<section id="steps-for-fitting-a-model" class="level2">
<h2 class="anchored" data-anchor-id="steps-for-fitting-a-model">4 Steps for Fitting a Model</h2>
<section id="specify" class="level3">
<h3 class="anchored" data-anchor-id="specify">1. Specify</h3>
<ul>
<li>Use literature and a <strong>conceptual framework</strong> to guide covariate selection<br>
</li>
<li><strong>Avoid stepwise regression</strong><br>
&gt; “Stepwise variable selection is one of the most widely used and abused of all data analysis techniques.” – Frank Harrell<br>
</li>
<li>Use theory-driven or evidence-based model building</li>
</ul>
<hr>
</section>
<section id="fit" class="level3">
<h3 class="anchored" data-anchor-id="fit">2. Fit</h3>
<section id="linear-regression-with-survey-weights" class="level4">
<h4 class="anchored" data-anchor-id="linear-regression-with-survey-weights">📘 Linear regression with survey weights</h4>
<div class="sourceCode" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(survey)</span>
<span id="cb1-2"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Define the survey design</span></span>
<span id="cb1-3">des <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">svydesign</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ids =</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span>psu, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">strata =</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span>strata, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">weights =</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span>weight, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> df)</span>
<span id="cb1-4"></span>
<span id="cb1-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Linear regression</span></span>
<span id="cb1-6">model_lm <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">svyglm</span>(gfr <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> cpr, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">design =</span> des)</span>
<span id="cb1-7"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summary</span>(model_lm)</span></code></pre></div>
</section>
<section id="logistic-regression-with-design" class="level4">
<h4 class="anchored" data-anchor-id="logistic-regression-with-design">📘 Logistic regression with design</h4>
<div class="sourceCode" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb2-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Binary outcome model</span></span>
<span id="cb2-2">model_logit <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">svyglm</span>(mcu <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> age <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> education, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">design =</span> des, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">family =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">quasibinomial</span>())</span>
<span id="cb2-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summary</span>(model_logit)</span></code></pre></div>
</section>
<section id="log-binomial-risk-ratio" class="level4">
<h4 class="anchored" data-anchor-id="log-binomial-risk-ratio">📘 Log-binomial (risk ratio)</h4>
<div class="sourceCode" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1">model_logbin <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">svyglm</span>(mcu <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> age <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> education, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">design =</span> des, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">family =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">binomial</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">link =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"log"</span>))</span>
<span id="cb3-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summary</span>(model_logbin)</span></code></pre></div>
</section>
<section id="poisson-alternative-for-estimating-rr" class="level4">
<h4 class="anchored" data-anchor-id="poisson-alternative-for-estimating-rr">📘 Poisson alternative for estimating RR</h4>
<div class="sourceCode" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb4-1">model_poisson <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">svyglm</span>(mcu <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> age <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> education, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">design =</span> des, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">family =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">poisson</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">link =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"log"</span>))</span>
<span id="cb4-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summary</span>(model_poisson)</span></code></pre></div>
<hr>
</section>
</section>
<section id="check-test" class="level3">
<h3 class="anchored" data-anchor-id="check-test">3. Check / Test</h3>
<section id="model-diagnostics-for-linear-models" class="level4">
<h4 class="anchored" data-anchor-id="model-diagnostics-for-linear-models">📘 Model diagnostics for linear models</h4>
<p><strong>Residual vs Fitted Plot</strong></p>
<div class="cell">
<div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">plot</span>(model_lm, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">which =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://erhchihsu.github.io/posts/2025-06-26-design-based-regression-analysis/index_files/figure-html/unnamed-chunk-2-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>Interpretation:</p>
<p>✅ Good fit if points scatter randomly around the horizontal line.</p>
<p>❌ Poor fit if there’s a curve or funnel shape (non-linearity or heteroscedasticity).</p>
<p><strong>Q-Q Plot (Normality of Residuals)</strong></p>
<div class="cell">
<div class="sourceCode cell-code" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb6-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">plot</span>(model_lm, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">which =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://erhchihsu.github.io/posts/2025-06-26-design-based-regression-analysis/index_files/figure-html/unnamed-chunk-3-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>Interpretation:</p>
<p>✅ Good fit if points lie approximately on the diagonal line.</p>
<p>❌ Poor fit if points deviate at the ends (skewed distribution).</p>
</section>
<section id="heteroscedasticity-breusch-pagan-test" class="level4">
<h4 class="anchored" data-anchor-id="heteroscedasticity-breusch-pagan-test">📘 Heteroscedasticity (Breusch-Pagan test)</h4>
<div class="cell">
<div class="sourceCode cell-code" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb7-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">bptest</span>(model_lm)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>
    studentized Breusch-Pagan test

data:  model_lm
BP = 0.040438, df = 1, p-value = 0.8406</code></pre>
</div>
</div>
<p>Interpretation:</p>
<ul>
<li><p><strong>p &gt; 0.05</strong> → Homoscedasticity (good!)</p></li>
<li><p><strong>p &lt; 0.05</strong> → Heteroscedasticity (bad — use robust SEs)</p></li>
</ul>
</section>
<section id="cooks-distance-outlier-detection" class="level4">
<h4 class="anchored" data-anchor-id="cooks-distance-outlier-detection">📘 Cook’s Distance (outlier detection)</h4>
<div class="cell">
<div class="sourceCode cell-code" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb9-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">plot</span>(model_lm, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">which =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>)</span></code></pre></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://erhchihsu.github.io/posts/2025-06-26-design-based-regression-analysis/index_files/figure-html/unnamed-chunk-5-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>Interpretation:</p>
<ul>
<li>Points with <strong>Cook’s D &gt; 1</strong> may be <strong>influential</strong> and deserve review.</li>
</ul>
</section>
<section id="goodness-of-fit-for-logistic-models" class="level4">
<h4 class="anchored" data-anchor-id="goodness-of-fit-for-logistic-models">📘 Goodness-of-fit for logistic models</h4>
<div class="cell">
<div class="sourceCode cell-code" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb10-1">model_logit <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">glm</span>(am <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> wt <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> hp, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> mtcars, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">family =</span> binomial)</span>
<span id="cb10-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summary</span>(model_logit)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>
Call:
glm(formula = am ~ wt + hp, family = binomial, data = mtcars)

Coefficients:
            Estimate Std. Error z value Pr(&gt;|z|)   
(Intercept) 18.86630    7.44356   2.535  0.01126 * 
wt          -8.08348    3.06868  -2.634  0.00843 **
hp           0.03626    0.01773   2.044  0.04091 * 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 43.230  on 31  degrees of freedom
Residual deviance: 10.059  on 29  degrees of freedom
AIC: 16.059

Number of Fisher Scoring iterations: 8</code></pre>
</div>
<div class="sourceCode cell-code" id="cb12" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb12-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Run Hosmer-Lemeshow test</span></span>
<span id="cb12-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hoslem.test</span>(mtcars<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>am, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">fitted</span>(model_logit))</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>
    Hosmer and Lemeshow goodness of fit (GOF) test

data:  mtcars$am, fitted(model_logit)
X-squared = 4.9517, df = 8, p-value = 0.7627</code></pre>
</div>
</div>
<p>Interpretation:</p>
<p>✅ p &gt; 0.05 = model fits well</p>
<p>❌ p &lt; 0.05 = poor fit</p>
</section>
<section id="summary-table" class="level4">
<h4 class="anchored" data-anchor-id="summary-table">Summary Table</h4>
<table class="table">
<thead>
<tr class="header">
<th>Diagnostic</th>
<th>Good Fit</th>
<th>Poor Fit</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>Residual Plot</td>
<td>Random scatter</td>
<td>Curved / funnel shape</td>
</tr>
<tr class="even">
<td>Q-Q Plot</td>
<td>Points near line</td>
<td>Points deviate</td>
</tr>
<tr class="odd">
<td>Breusch-Pagan Test</td>
<td>p &gt; 0.05</td>
<td>p &lt; 0.05</td>
</tr>
<tr class="even">
<td>Cook’s Distance</td>
<td>All &lt; 1</td>
<td>Some &gt; 1</td>
</tr>
<tr class="odd">
<td>Hosmer-Lemeshow Test</td>
<td>p &gt; 0.05</td>
<td>p &lt; 0.05</td>
</tr>
</tbody>
</table>
</section>
<section id="model-comparison-using-aicbic" class="level4">
<h4 class="anchored" data-anchor-id="model-comparison-using-aicbic">📘 Model comparison using AIC/BIC</h4>
<div class="sourceCode" id="cb14" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb14-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">AIC</span>(model_logit)</span>
<span id="cb14-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">BIC</span>(model_logit)</span></code></pre></div>
<p><strong>Use AIC/BIC to compare models: lower = better</strong></p>
<hr>
</section>
</section>
<section id="interpret" class="level3">
<h3 class="anchored" data-anchor-id="interpret">4. Interpret</h3>
<ul>
<li><strong>R²</strong> in linear regression: Proportion of variance explained</li>
<li><strong>Pseudo R²</strong> in logistic regression: Not equivalent to R² and should not be interpreted as such</li>
<li><strong>Odds Ratio (OR):</strong> exp(b) from logistic regression</li>
</ul>
<div class="sourceCode" id="cb15" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb15-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">exp</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">coef</span>(model_logit))</span></code></pre></div>
<ul>
<li><strong>Risk Ratio (RR):</strong> Use log-binomial or Poisson models with robust SEs</li>
</ul>
<div class="sourceCode" id="cb16" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb16-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">exp</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">coef</span>(model_poisson))</span></code></pre></div>
<hr>
</section>
</section>
<section id="key-takeaways" class="level2">
<h2 class="anchored" data-anchor-id="key-takeaways">Key Takeaways</h2>
<ul>
<li>Always use svydesign() and svyglm() when analyzing complex survey data in R</li>
<li>Avoid stepwise regression—prefer theory-driven model building</li>
<li>Validate model assumptions with residual plots and Goodness-of-Fit tests</li>
<li>Use log-binomial or Poisson models to estimate risk ratios when outcomes are common</li>
<li>AIC and BIC help compare model fit, especially when choosing between linear vs.&nbsp;polynomial or nested models</li>
</ul>
<hr>
</section>
<section id="references" class="level2">
<h2 class="anchored" data-anchor-id="references">References</h2>
<ul>
<li>Lecture slides by Dr.&nbsp;Saifuddin Ahmed, PFRH 712 (2025)</li>
<li>Lumley, T. (2010). <em>Complex Surveys: A Guide to Analysis Using R.</em> Wiley.</li>
<li>Harrell, F. (2015). <em>Regression Modeling Strategies.</em> Springer.</li>
<li>Hosmer, D., Lemeshow, S., &amp; Sturdivant, R. (2013). <em>Applied Logistic Regression.</em> Wiley.</li>
<li>R packages: survey, lmtest, ResourceSelection, ggplot2, srvyr</li>
</ul>


</section>

 ]]></description>
  <category>PFRH 712</category>
  <guid>https://erhchihsu.github.io/posts/2025-06-26-design-based-regression-analysis/</guid>
  <pubDate>Thu, 26 Jun 2025 04:00:00 GMT</pubDate>
</item>
<item>
  <title></title>
  <dc:creator>Erh-Chi Hsu</dc:creator>
  <link>https://erhchihsu.github.io/posts/2025-06-04-variance_estimation/</link>
  <description><![CDATA[ undefined ]]></description>
  <category>PFRH 712</category>
  <guid>https://erhchihsu.github.io/posts/2025-06-04-variance_estimation/</guid>
  <pubDate>Wed, 04 Jun 2025 04:00:00 GMT</pubDate>
</item>
<item>
  <title>Lecture 1: Overview of Multilevel and Longitudinal Data</title>
  <dc:creator>Erh-Chi Hsu</dc:creator>
  <link>https://erhchihsu.github.io/posts/2025-06-04-/</link>
  <description><![CDATA[ 





<section id="course-information" class="level2">
<h2 class="anchored" data-anchor-id="course-information">Course Information</h2>
<p><strong>Course:</strong> Biostat 655 – Analysis of Multilevel and Longitudinal Data<br>
<strong>Instructors:</strong> Dr.&nbsp;Zheyu Wang, Dr.&nbsp;Ji Soo Kim</p>
<hr>
</section>
<section id="overview" class="level2">
<h2 class="anchored" data-anchor-id="overview">Overview</h2>
<p>This course introduces statistical methods for analyzing:</p>
<ul>
<li><strong>Multilevel data</strong>: Observations nested within higher-level units (e.g., students within schools)</li>
<li><strong>Longitudinal data</strong>: Repeated measures over time from the same individuals</li>
</ul>
<hr>
</section>
<section id="statistical-foundations" class="level2">
<h2 class="anchored" data-anchor-id="statistical-foundations">Statistical Foundations</h2>
<section id="linear-regression-review" class="level3">
<h3 class="anchored" data-anchor-id="linear-regression-review">Linear Regression Review</h3>
<ul>
<li><p>Models the expected value of outcome ( Y ) given predictors ( X ):</p>
<p><img src="https://latex.codecogs.com/png.latex?%0AE(Y%7CX)%20=%20%5Cbeta_0%20+%20%5Cbeta_1X_1%20+%20%5Cdots%20+%20%5Cbeta_pX_p%0A"></p></li>
<li><p>Assumes independent errors (iid Gaussian)</p></li>
</ul>
</section>
<section id="extensions" class="level3">
<h3 class="anchored" data-anchor-id="extensions">Extensions:</h3>
<ul>
<li>ANOVA/ANCOVA</li>
<li>Linear &amp; cubic splines</li>
<li>Group interactions</li>
</ul>
<hr>
</section>
</section>
<section id="multilevel-model-basics" class="level2">
<h2 class="anchored" data-anchor-id="multilevel-model-basics">Multilevel Model Basics</h2>
<ul>
<li><strong>Names</strong>: Mixed models, hierarchical models, random-effects models</li>
<li><strong>Structure</strong>: Level-1 (individuals) nested within Level-2 (e.g., families), etc.</li>
<li><strong>Applications</strong>: Health outcomes vary across multiple levels (e.g., person, family, neighborhood)</li>
</ul>
<section id="example-alcohol-abuse" class="level3">
<h3 class="anchored" data-anchor-id="example-alcohol-abuse">Example: Alcohol Abuse</h3>
<table class="table">
<thead>
<tr class="header">
<th>Level</th>
<th>Example</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>Person</td>
<td>Genetic predisposition</td>
</tr>
<tr class="even">
<td>Family</td>
<td>Household alcohol use</td>
</tr>
<tr class="odd">
<td>Neighborhood</td>
<td>Access to bars</td>
</tr>
<tr class="even">
<td>Society</td>
<td>Legal/policy regulation</td>
</tr>
</tbody>
</table>
</section>
<section id="notation-for-multilevel-data" class="level3">
<h3 class="anchored" data-anchor-id="notation-for-multilevel-data">Notation for Multilevel Data</h3>
<ul>
<li><strong>Nested levels</strong>:
<ul>
<li>Level 1: Time</li>
<li>Level 2: Person</li>
<li>Level 3: Family</li>
<li>Level 4: Neighborhood</li>
<li>Level 5: State</li>
</ul></li>
</ul>
<p>E.g.,<br>
<img src="https://latex.codecogs.com/png.latex?Y_%7Bsijkt%7D"> = outcome at time ( t ) for person ( k ), family ( j ), neighborhood ( i ), state ( s )<br>
<img src="https://latex.codecogs.com/png.latex?X_%7Bsijkt%7D"> = predictor at same unit</p>
</section>
<section id="synonyms-for-multilevel-models" class="level3">
<h3 class="anchored" data-anchor-id="synonyms-for-multilevel-models">Synonyms for Multilevel Models</h3>
<ul>
<li>Mixed effects model</li>
<li>Random effects model</li>
<li>Hierarchical model</li>
<li>Growth model</li>
<li>Meta-analysis (special case)</li>
</ul>
<hr>
</section>
</section>
<section id="longitudinal-multilevel" class="level2">
<h2 class="anchored" data-anchor-id="longitudinal-multilevel">Longitudinal = Multilevel</h2>
<ul>
<li>Longitudinal data = <strong>Time nested within individuals</strong></li>
<li>Repeated measures are <strong>correlated</strong>, requiring special models</li>
</ul>
<hr>
</section>
<section id="core-questions" class="level2">
<h2 class="anchored" data-anchor-id="core-questions">Core Questions</h2>
<ol type="1">
<li>What questions do multilevel/longitudinal models answer?
<ul>
<li>Population averages</li>
<li>Individual differences</li>
<li>Relationships at multiple levels</li>
</ul></li>
<li>Why are repeated measures correlated?
<ul>
<li>Same person → similar outcomes across time</li>
</ul></li>
<li>What happens if we ignore that correlation?
<ul>
<li>Invalid inferences</li>
<li>Poor confidence intervals</li>
<li>Biased results (especially with missing data)</li>
</ul></li>
<li>How can we model it correctly?
<ul>
<li>Use appropriate models (marginal or mixed)</li>
</ul></li>
</ol>
<hr>
</section>
<section id="types-of-models" class="level2">
<h2 class="anchored" data-anchor-id="types-of-models">Types of Models</h2>
<section id="population-average-marginal-models" class="level3">
<h3 class="anchored" data-anchor-id="population-average-marginal-models">Population-Average (Marginal) Models</h3>
<ul>
<li>Focus on overall mean structure</li>
<li>Model mean, variance, and correlation separately</li>
</ul>
</section>
<section id="subject-specific-mixed-effects-models" class="level3">
<h3 class="anchored" data-anchor-id="subject-specific-mixed-effects-models">Subject-Specific (Mixed Effects) Models</h3>
<ul>
<li>Include random effects</li>
<li>Model individual-level variation</li>
<li>Capture within-subject correlation directly</li>
</ul>
<hr>
</section>
</section>
<section id="what-happens-if-we-ignore-correlation" class="level2">
<h2 class="anchored" data-anchor-id="what-happens-if-we-ignore-correlation">What Happens If We Ignore Correlation?</h2>
<ul>
<li>Biased or misleading inferences</li>
<li>Under/overestimated standard errors</li>
<li>Less statistical power</li>
<li>Invalid estimates, especially with missing data</li>
</ul>
<hr>
</section>
<section id="key-takeaways" class="level2">
<h2 class="anchored" data-anchor-id="key-takeaways">Key Takeaways</h2>
<ul>
<li>Multilevel models account for:
<ul>
<li>Nested structure in real-world data</li>
<li>Correlation within clusters or individuals</li>
</ul></li>
<li>Essential for accurate inference in public health and longitudinal research</li>
<li>Ignoring structure leads to:
<ul>
<li>Bias</li>
<li>Invalid inference</li>
<li>Poor policy or scientific decisions</li>
</ul></li>
</ul>
<hr>
</section>
<section id="references" class="level2">
<h2 class="anchored" data-anchor-id="references">References</h2>
<ul>
<li>Diggle et al.&nbsp;(2002). <em>Analysis of Longitudinal Data</em><br>
</li>
<li>Rabe-Hesketh &amp; Skrondal. <em>Multilevel and Longitudinal Modeling Using Stata</em><br>
</li>
<li>Lecture slides by Dr.&nbsp;Zheyu Wang and Dr.&nbsp;Ji Soo Kim</li>
</ul>


</section>

 ]]></description>
  <category>Biostat 655</category>
  <guid>https://erhchihsu.github.io/posts/2025-06-04-/</guid>
  <pubDate>Wed, 04 Jun 2025 04:00:00 GMT</pubDate>
</item>
<item>
  <title></title>
  <dc:creator>Erh-Chi Hsu</dc:creator>
  <link>https://erhchihsu.github.io/posts/2025-05-29-complex_survey_101/</link>
  <description><![CDATA[ undefined ]]></description>
  <category>PFRH 712</category>
  <guid>https://erhchihsu.github.io/posts/2025-05-29-complex_survey_101/</guid>
  <pubDate>Thu, 29 May 2025 04:00:00 GMT</pubDate>
</item>
</channel>
</rss>
