Linear Mixed Models I: LMMs with pymer4#

Important: Before running this notebook, open a terminal, activate the 201b environment, and update the pymer4 package:

  1. Open terminal

  2. conda activate 201b

  3. pip install git+https://github.com/ejolly/pymer4.git --no-deps

Learning Goals#

Welcome to the last week of class and one of Eshin’s favorite topics: mixed models.
In the lab notebooks for week 6 we spent a lot of time understanding univariate and multiple regression via ols() from the statsmodels library. In this notebook we’re going to meet pymer4, a new Python library (written by Eshin) that allows us to estimate linear mixed effects models by interfacing with the gold-standard R library lme4.

We won’t be covering the full functionality of pymer4 as it’s beyond the scope of this course.
Instead we’ll cover the fundamentals of LMMs by exploring Simpson’s Paradox and the sleepstudy data we discussed in class.
We’ll focus on:

  • Understanding why we use LMMs vs regular regression

  • Exploring Simpson’s Paradox

  • Comparing pooled, unpooled, and partially pooled data aggregation approaches

  • Comparing different random-effects specifications for LMMs

  • Interpreting model output

Slides for reference#

Linear Mixed Models I (slides)

Instructor’s Note#

Linear mixed models are a vast topic and we can’t possibly cover everything you need to know in this course (especially with the limited time we have left). Instead, please take note of the following excellent tutorial resources, which will be very handy as you’re applying LMMs to your own (potentially more complicated) experimental designs and data. These links are also available on the course website, under the Overview page for Week 10.

Most of the materials you’ll find linked below (and generally when you’re googling online) will use the lme4 library in R. You can still follow these tutorials with a little patience converting the R code to the way that pymer4 does things in Python. We’ve included tables below covering the most notable differences you’re likely to encounter.

Excellent Tutorial Resources#

Resources:

R vs Python#

Note: while statsmodels, the library we’ve been using for ols(), does include some tools to estimate LMMs, we do not recommend using it for 2 reasons:

  1. It can’t handle all the flavors of random effects offered by lme4 in R

  2. It calculates p-values in an incredibly liberal way, resulting in much higher Type I error (false positive) rates

| Python | R | Description |
| --- | --- | --- |
| `Lmer()` | `lmer()` | create a model |
| `model.fit()` and `model.summary()` | `summary(model)` | estimate the model and get a summary |
| `model.anova()` | `anova(model)` | ANOVA-style F-tests; in R you have to set up contrasts properly; in pymer4 you can use the `force_orthogonal=True` argument to do this automatically |
| `model.fit(factors={'cond': {'A': 1, 'B': -1}})` | `contrasts()` | set up custom categorical variable contrasts |
| `lrt(model1, model2, model3, ...)` | `anova(model1, model2, model3, ...)` | compare models to see if fixed/random effects are worth it |

Key attributes/methods after fitting:

| Python | R | Description |
| --- | --- | --- |
| `model.coefs` | `summary(model)` | Level 2 fixed effects estimates & t-stats; pymer4 also includes CIs & p-values; you need the lmerTest library in R to get these |
| `model.ranef` | `ranef(model)` | Level 1 random effects (offsets from Level 2 fixed effects) per cluster variable (e.g. intercept & slope offsets per Subject) |
| `model.fixef` | `coef(model)` | Level 1 estimates = Level 2 fixed effects + Level 1 random effects (e.g. intercepts & slopes per Subject) |
| `model.post_hoc()` | emmeans library in R | post-hoc comparisons between levels of categorical variables |
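Putting these tables into practice, a typical pymer4 session looks like this (a minimal sketch; `data`, `y`, `x`, and `group` are placeholder names, not from this notebook):

```python
from pymer4.models import Lmer

# Create a model: fixed effect of x, random intercepts & slopes per group
model = Lmer('y ~ x + (x | group)', data=data)

# Estimate it; lme4's lmer() runs under the hood in R
model.fit()

# Inspect results
print(model.coefs)  # Level 2 fixed effects with CIs & p-values
print(model.ranef)  # Level 1 deviances per cluster
print(model.fixef)  # Level 1 estimates (fixed + random) per cluster
```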

# Import some libraries we typically use
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import polars as pl
from polars import col
sns.set_context('notebook')

Background#

GLM Assumptions#

Let’s remind ourselves of the core assumption of the General Linear Model: Independent and Identically Distributed errors (i.i.d.)

This overall idea captures a few critical pieces that we’ve encountered in different forms:

  • Additivity & Linearity: assumes that the outcome \(y\) is a linear function of separate predictors \(\beta_0 + \beta_1X_1 +...\)

  • Normality: assumes that the residuals (\(y - \hat{y}\)) are normally distributed. It’s actually okay if the predictors and the outcome are non-normal, so long as the residuals are normal.

  • Homoscedasticity: assumes that the variance of our residuals doesn’t change as a function of our predictors; we shouldn’t be getting more or less wrong (\(y - \hat{y}\)) depending upon what value our predictor \(X\) takes on; this matters a lot when we are using categorical predictors and calculating ANOVA statistics

  • No perfect multi-collinearity: assumes that our predictors are not just linear combinations of each other, otherwise we can’t figure out the “unique variance” each one contributes to explaining \(y\)!

| Assumption | How to Notice | Effect on Model | Example Data |
| --- | --- | --- | --- |
| Linearity | Curved relationships in data plot | Poor model fit | Perceptual measurements, learning curves |
| Homoscedasticity | Residuals fan out in residual plot | Incorrect standard errors | Data with a huge range, e.g. 1-100000 |
| Normality of Residuals | Skewed residuals | Invalid statistical tests | Reaction times, income |
| Multicollinearity | Highly correlated predictors | Inflated standard errors | Z-score standardization |
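One way to eyeball the linearity, homoscedasticity, and normality assumptions is to plot residuals against fitted values and look at the distribution of the residuals. A minimal sketch, assuming a fitted ols() results object named `results` (a placeholder name):

```python
import matplotlib.pyplot as plt

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Curves or fan shapes here suggest non-linearity or heteroscedasticity
ax1.scatter(results.fittedvalues, results.resid, alpha=0.5)
ax1.axhline(0, color='k', linestyle='--')
ax1.set(xlabel='Fitted values', ylabel='Residuals')

# Heavy skew here suggests non-normal residuals
ax2.hist(results.resid, bins=30)
ax2.set(xlabel='Residuals', ylabel='Count');
```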

The last assumption we haven’t yet explored in previous notebooks is independence:

  • Independence of errors: assumes our residuals don’t depend upon each other; this gets violated when you have repeated-measures, time-series, geospatial, or multi-level data

Violating Independence and Simpson’s Paradox#

Taking dependence in the data into account is extremely important. Simpson’s paradox is an instructive example of what can go wrong when we ignore the dependence in the data.

Let’s start by loading some simulated data to demonstrate the paradox.

df_simpson = pl.read_csv('data/simpson.csv')
df_simpson.head()
shape: (5, 4)

| participant (i64) | intercept (f64) | x (f64) | y (f64) |
| --- | --- | --- | --- |
| 1 | -9.103987 | 0.022867 | -9.45449 |
| 1 | -9.103987 | 0.025569 | -9.265862 |
| 1 | -9.103987 | 0.032425 | -9.687238 |
| 1 | -9.103987 | 0.032724 | -9.464708 |
| 1 | -9.103987 | 0.036451 | -9.37228 |

Let’s visualize the overall relationship between x and y with a simple linear model using sns.lmplot

sns.lmplot(data=df_simpson.to_pandas(), x='x', y='y', aspect=1.5, line_kws={'color': 'black'}, scatter_kws={'color': 'black'})
<seaborn.axisgrid.FacetGrid at 0x151bbbf50>

We can estimate the model using ols() and see that the overall relationship is positive and significant

from statsmodels.formula.api import ols

print(ols('y ~ x', data=df_simpson.to_pandas()).fit().summary(slim=True))
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.778
Model:                            OLS   Adj. R-squared:                  0.777
No. Observations:                 200   F-statistic:                     693.0
Covariance Type:            nonrobust   Prob (F-statistic):           1.37e-66
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     -8.9538      0.199    -44.949      0.000      -9.347      -8.561
x              9.0688      0.344     26.325      0.000       8.389       9.748
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

Let’s take another look at the data using different colors for different participants

grid = sns.FacetGrid(data=df_simpson.to_pandas(), height=5, aspect=1.5)
grid.map(sns.regplot, 'x', 'y', color='black')
grid.map_dataframe(sns.scatterplot, 'x', 'y', hue='participant', data=df_simpson.to_pandas(), palette='pastel', edgecolor='gray');

And fit a different regression line per participant:

grid = sns.lmplot(data=df_simpson.to_pandas(), x='x', y='y', hue='participant', aspect=1.5, scatter_kws={'edgecolor': 'gray'}, palette='pastel', line_kws={'color': 'black'}, legend=False)

What this plot shows is that for almost all individual participants, the relationship between x and y is negative!
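We can quantify this claim by fitting a separate ols() per participant and counting negative slopes, a quick sketch using tools we’ve already met:

```python
from statsmodels.formula.api import ols

# Fit a separate regression per participant and collect each slope
slopes = []
for pid in df_simpson['participant'].unique():
    sub = df_simpson.filter(col('participant') == pid).to_pandas()
    slopes.append(ols('y ~ x', data=sub).fit().params['x'])

n_negative = sum(s < 0 for s in slopes)
print(f'{n_negative} of {len(slopes)} participants have a negative slope')
```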

Let’s fit an LMM with random intercepts and slopes per participant using pymer4

We can do so by creating a model with a formula and some data using the Lmer Python class, similar to the ols() we’ve been using. We can then call the .fit() method on the model with summarize=True to get estimated parameters.

This is different from ols(), where .fit() returned a results object that we used for things like .summary() and .params.

In pymer4 everything is contained in the model you create with Lmer.

We’ll explore this more in the next section; for now let’s take a look at the estimated fixed effect of x

from pymer4.models import Lmer

# Create model
model = Lmer('y ~ x + (x | participant)', data=df_simpson.to_pandas())

# Call fit() to estimate parameters; doesn't return anything!
model.fit() 

# Fixed effects and inferential stats
model.coefs
boundary (singular) fit: see help('isSingular') 
| | Estimate | 2.5_ci | 97.5_ci | SE | DF | T-stat | P-val | Sig |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| (Intercept) | -0.760196 | -3.403647 | 1.883256 | 1.348724 | 13.66477 | -0.563640 | 5.821312e-01 | |
| x | -6.519442 | -8.773135 | -4.265749 | 1.149864 | 92.86580 | -5.669748 | 1.603694e-07 | *** |

As we can see, the fixed effect for x is now negative!
We can plot each participant’s random effect to see this:

# Scatter plot per subject
grid = sns.FacetGrid(data=model.data, height=5, aspect=1.5)
grid.map_dataframe(sns.scatterplot, 'x', 'y', hue='participant', palette='pastel', edgecolor='gray', alpha=0.5);
grid.map_dataframe(sns.lineplot, 'x', 'fits', hue='participant', estimator=None, linewidth=3, color='black');

Different ways to handle dependent data#

Simpson’s paradox shows us that if we don’t properly account for violations of independence, we run the risk of making incorrect inferences.

In class we discussed 3 approaches for handling dependent data by aggregation:

  • Complete pooling: just a single global regression ignoring repeated measures

  • No pooling: a separate regression per cluster of repeated measures

  • Partial pooling: a linear mixed effects model
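In equations, a sketch with \(i\) indexing observations and \(j\) indexing subjects:

$$
\begin{aligned}
\text{Complete pooling:}\quad & y_{ij} = \beta_0 + \beta_1 x_{ij} + \epsilon_{ij} \\
\text{No pooling:}\quad & y_{ij} = \beta_{0j} + \beta_{1j} x_{ij} + \epsilon_{ij} \quad \text{(each subject fit separately)} \\
\text{Partial pooling:}\quad & y_{ij} = \beta_{0j} + \beta_{1j} x_{ij} + \epsilon_{ij}, \quad \beta_{0j} \sim \mathcal{N}(\beta_0, \sigma^2_{\beta_0}),\; \beta_{1j} \sim \mathcal{N}(\beta_1, \sigma^2_{\beta_1})
\end{aligned}
$$

Partial pooling still gives every subject their own parameters, but shrinks them toward the group-level estimates in proportion to how noisy each subject’s data is.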

Note: If you want a more casual, accessible (10-15 min) read about this topic, feel free to check out this older blog post Eshin wrote about repeated measures

Let’s explore this using the Sleep Study data we looked at in class.

This dataset is included with lme4 in R with the following details:

The average reaction time per day (in milliseconds) for subjects in a sleep deprivation study.

Days 0-1 were adaptation and training (T1/T2), day 2 was baseline (B); sleep deprivation started after day 2.

These data are from the study described in Belenky et al. (2003), for the most sleep-deprived group (3 hours time-in-bed) and for the first 10 days of the study, up to the recovery period. The original study analyzed speed (1/(reaction time)) and treated day as a categorical rather than a continuous predictor.

The original paper if you’re interested is:

Gregory Belenky, Nancy J. Wesensten, David R. Thorne, Maria L. Thomas, Helen C. Sing, Daniel P. Redmond, Michael B. Russo and Thomas J. Balkin (2003) Patterns of performance degradation and restoration during sleep restriction and subsequent recovery: a sleep dose-response study. Journal of Sleep Research 12, 1-12.

Let’s load up the data and take a look at the columns it has:

| Variable | Description |
| --- | --- |
| Subject | Unique subject ID |
| Days | Number of days of sleep deprivation |
| Reaction | Subject’s reaction time in ms |

df = pl.read_csv('data/sleep.csv')
df.head()
shape: (5, 3)

| Subject (i64) | Days (i64) | Reaction (f64) |
| --- | --- | --- |
| 308 | 0 | 249.56 |
| 308 | 1 | 258.7047 |
| 308 | 2 | 250.8006 |
| 308 | 3 | 321.4398 |
| 308 | 4 | 356.8519 |

Mini-challenge: How many days of sleep deprivation are there per subject?

# Solution
df.group_by('Subject').len()['len'].unique()
shape: (1,)

| len (u32) |
| --- |
| 10 |

Mini-challenge: How many unique subjects are there?

# Solution
df['Subject'].n_unique()
18

Let’s start by visualizing the data. We’ll create a FacetGrid using seaborn and plot each participant’s data on a separate facet

Note: we’re not using the full dataset, which also included 2 participants with incomplete observations

# Scatter plot per subject
grid = sns.FacetGrid(data=df.to_pandas(), col='Subject', col_wrap=7, height=2)
grid.map(sns.scatterplot, 'Days', 'Reaction')

# Labels
grid.set_axis_labels('','')
grid.figure.supxlabel('Days of Sleep Deprivation', fontsize=20)
grid.figure.supylabel('Average Reaction Time (ms)', x=-.01, fontsize=20);

1) Complete Pooling: Good ‘ole OLS#

Complete pooling ignores the fact that we have multiple observations per Subject and just fits a regular ‘ole regression to all the data

Mini-challenge: Use seaborn to plot the global regression line across all data points

# Solution
grid = sns.lmplot(data=df.to_pandas(), x='Days', y='Reaction')
grid.set_axis_labels("Days of Sleep Deprivation", "Average Reaction Time (ms)");

Mini-challenge: Use ols to estimate a complete pooling model.
Save the output of .fit() to a variable called model_cp

# Solution
model_cp = ols('Reaction ~ Days', data=df.to_pandas()).fit()

If we examine the parameters of the model you fit, we see how just a single intercept and slope is estimated across all subjects.

This is as if all the data came from 1 subject!

# Note: if this gives you an error, you may not have saved the output of `.fit()` to a variable called `model_cp`
model_cp.params
Intercept    251.405105
Days          10.467286
dtype: float64

Let’s visualize the regression line for each subject. We’ll define a helper function below and add it to our FacetGrid

# Helper function
def plot_cp(x, **kwargs):
    """Plot a line from slope and intercept of model_cp"""
    intercept = model_cp.params['Intercept']
    slope = model_cp.params['Days']
    y_vals = intercept + slope * x
    plt.plot(x, y_vals, '-', color='k')
# Scatter plot per subject
grid = sns.FacetGrid(data=df.to_pandas(), col='Subject', col_wrap=7, height=2)
grid.map(sns.scatterplot, 'Days', 'Reaction')

# Add regression line to each subplot
# It's the same for all subjects
grid.map(plot_cp,'Days') 

# Labels
grid.set_axis_labels('','')
grid.figure.supxlabel('Days of Sleep Deprivation', fontsize=20)
grid.figure.supylabel('Average Reaction Time (ms)', x=-.01, fontsize=20);

Notice how each Subject has the same regression line

2) No Pooling: OLS per Subject#

If we don’t pool our data at all, it’s like estimating a separate ols per Subject.

pymer4 includes a special model called Lm2 that does exactly this. It works just like ols() but also takes a group argument specifying the variable to split the data by before running a separate regression per group:

# Import it
from pymer4.models import Lm2

# Create model
model_np = Lm2('Reaction ~ Days', group='Subject', data=df.to_pandas())

# Fit it
model_np.fit()

Every model that pymer4 estimates includes a .data attribute with the original data and the fits and residuals from the model:

Note: data are stored as a DataFrame using the pandas library, NOT the polars library we’ve been using. Remember you can always convert it into a polars DataFrame by wrapping it in pl.DataFrame(). We don’t need to do that at the moment as both libraries have a .head() method
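For example, if you did want polars methods back (a one-liner sketch):

```python
# Convert pymer4's pandas DataFrame into a polars DataFrame
df_np_pl = pl.DataFrame(model_np.data)
```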

model_np.data.head()
Subject Days Reaction residuals fits
0 308 0 249.5600 5.367331 244.192669
1 308 1 258.7047 -7.252672 265.957372
2 308 2 250.8006 -36.921474 287.722074
3 308 3 321.4398 11.953024 309.486776
4 308 4 356.8519 25.600421 331.251479

We can use this DataFrame to add each Subject’s fit to their scatterplot

# Scatter plot per subject
grid = sns.FacetGrid(data=model_np.data, col='Subject', col_wrap=7, height=2)
grid.map(sns.scatterplot, 'Days', 'Reaction')

# Add regression line to each subplot
# Now each subject has their own regression line
grid.map(sns.lineplot, 'Days', 'fits')

# Labels
grid.set_axis_labels('','')
grid.figure.supxlabel('Days of Sleep Deprivation', fontsize=20)
grid.figure.supylabel('Average Reaction Time (ms)', x=-.01, fontsize=20);

Mini-challenge: Compare the figure we just made to the one from our complete pooling model. Write a sentence or two on how they’re the same or different

Your response here

We can also do the same thing by writing a for loop and using the ols() function that we’re familiar with, but it’s a bit more involved:

# To store each subject's parameters and fits/residuals
params = []
fits_resids = []

# Loop over each subject
for subjectID in df['Subject'].unique():
    # Filter data for current subject
    subject_data = df.filter(col('Subject') == subjectID).to_pandas()

    # Fit model
    subject_model = ols('Reaction ~ Days', data=subject_data).fit()

    # Grab subject estimates, add subject ID, and store it
    subject_params = subject_model.params.to_dict()
    subject_params['Subject'] = subjectID
    params.append(subject_params)

    # Grab model fits and residuals and store them
    subject_fits_resids = pl.DataFrame({
        'Subject': np.repeat(subjectID, len(subject_data)),
        'Days': subject_data['Days'].to_numpy(),
        'fits': subject_model.fittedvalues.to_numpy().astype(float), 
        'resid': subject_model.resid.to_numpy().astype(float)})

    fits_resids.append(subject_fits_resids)

# Combine into DataFrames
params = pl.DataFrame(params)[['Subject', 'Intercept', 'Days']]
fits_resids = pl.concat(fits_resids)

# Combine with original data
df_np = df.join(fits_resids, on=['Subject','Days'])
# Scatter plot per subject
grid = sns.FacetGrid(data=df_np.to_pandas(), col='Subject', col_wrap=7, height=2)
grid.map(sns.scatterplot, 'Days', 'Reaction')

# Add regression line to each subplot
# Each subject has their own regression line
grid.map(sns.lineplot, 'Days', 'fits')

# Labels
grid.set_axis_labels('','')
grid.figure.supxlabel('Days of Sleep Deprivation', fontsize=20)
grid.figure.supylabel('Average Reaction Time (ms)', x=-.01, fontsize=20);

3) Partial Pooling: Linear Mixed Models#

Partial pooling models are the best of both worlds - intuitively, you can think of them as half-way between our complete pooling and no pooling approaches.

Instead of ignoring or splitting up our dataset, we explicitly model the dependence using 2 components:

  • Fixed effects which capture the general effect we’re interested in: \(\text{Reaction} \sim \text{Days}\)

  • Random effects which capture the variance around the fixed effects; specifically how each Subject’s intercept and/or slope deviates from the fixed effects
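Written out as a two-level model for the sleep study (a sketch, with \(i\) indexing days and \(j\) indexing subjects):

$$
\begin{aligned}
\text{Level 1:}\quad & \text{Reaction}_{ij} = \beta_{0j} + \beta_{1j}\,\text{Days}_{ij} + \epsilon_{ij} \\
\text{Level 2:}\quad & \beta_{0j} = \gamma_{00} + u_{0j} \qquad \beta_{1j} = \gamma_{10} + u_{1j}
\end{aligned}
$$

Here \(\gamma_{00}\) and \(\gamma_{10}\) are the fixed effects (the Level 2 intercept and slope), and \(u_{0j}\) and \(u_{1j}\) are the random effects (each Subject’s deviation from them).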

Linear Mixed Models: Meet Lmer#

To estimate an LMM we can use the Lmer() class from pymer4. We provide it a formula and data like before, but this time we also specify the random effects we want to estimate using the following syntax

Random Effects Syntax#
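The gist, using Days and Subject as example names (a quick reference to the standard lme4-style formula conventions):

| Syntax | Random effects estimated |
| --- | --- |
| `(1 \| Subject)` | random intercepts per Subject |
| `(0 + Days \| Subject)` | random slopes for Days only (no random intercepts) |
| `(1 + Days \| Subject)` | correlated random intercepts & slopes per Subject |
| `(1 + Days \|\| Subject)` | uncorrelated random intercepts & slopes per Subject |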

Key Attributes#

Lmer models have the following attributes that make it easier to work with model results:

  • .coefs - summary output table; summary(model) in R

  • .fixef - cluster level estimates; coef(model) in R

  • .ranef - cluster level deviances; ranef(model) in R

  • .design_matrix - model design matrix

  • .data - original data augmented with columns of fits and residuals

Random Intercepts Only#

Let’s estimate an LMM with random intercepts per Subject

Providing summarize=True to .fit() will automatically print out an R-like summary:

from pymer4.models import Lmer

model_pp_i = Lmer('Reaction ~ Days + (1 | Subject)', data=df.to_pandas())
model_pp_i.fit(summarize=True)
Linear mixed model fit by REML ['lmerMod']
Formula: Reaction~Days+(1|Subject)

Family: gaussian	 Inference: parametric

Number of observations: 180	 Groups: {'Subject': 18.0}

Log-likelihood: -893.233 	 AIC: 1794.465

Random effects:

                 Name       Var     Std
Subject   (Intercept)  1378.179  37.124
Residual                960.457  30.991

No random effect correlations specified

Fixed effects:
| | Estimate | 2.5_ci | 97.5_ci | SE | DF | T-stat | P-val | Sig |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| (Intercept) | 251.405 | 232.302 | 270.508 | 9.747 | 22.81 | 25.794 | 0.0 | *** |
| Days | 10.467 | 8.891 | 12.044 | 0.804 | 161.00 | 13.015 | 0.0 | *** |

This outputs lots of information about the model, as well as a classic summary table with t-stats, CIs, and p-values for fixed effects:

  • Formula: model formula you used

  • Family: the type of distribution we’re assuming

  • Inference: how we’re calculating p-values

  • Log-likelihood/AIC: measures of model fit that we can use to compare to other models

  • Random effects: the variance of the random effects you specified (only intercepts in this case)

Using the .coefs attribute we can look at just the fixed effects summary.

You’ll notice that our degrees of freedom are no longer nice round numbers, but include decimal places. That’s because pymer4 uses both the lme4 and lmerTest R libraries, which calculate p-values using Satterthwaite-approximated degrees of freedom.

model_pp_i.coefs
| | Estimate | 2.5_ci | 97.5_ci | SE | DF | T-stat | P-val | Sig |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| (Intercept) | 251.405105 | 232.301892 | 270.508318 | 9.746716 | 22.810199 | 25.793826 | 2.241351e-18 | *** |
| Days | 10.467286 | 8.891041 | 12.043531 | 0.804221 | 161.000000 | 13.015428 | 6.412601e-27 | *** |

Comparing models#

We can also connect this to the model-comparison approach we’ve learned about before.

To do this in pymer4 we use the lrt() function, which is like the anova_lm() function we’re familiar with. Just pass it 2 or more models and it will test whether the difference in parameters is worth it.

from pymer4.stats import lrt

# Drop the Days term
model_pp_null = Lmer('Reaction ~ 1 + (1 | Subject)', data=df.to_pandas())
model_pp_null.fit()

# Compare models to see if including Days as a fixed effect is worth it
lrt(model_pp_null, model_pp_i)
refitting model(s) with ML (instead of REML)
| | model | npar | AIC | BIC | log-likelihood | deviance | Chisq | Df | P-val | Sig |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | `Reaction~1+(1\|Subject)` | 3 | 1916.541058 | 1926.119929 | -955.270529 | 1910.541058 | | | | |
| 1 | `Reaction~Days+(1\|Subject)` | 4 | 1802.078643 | 1814.850470 | -897.039322 | 1794.078643 | 116.462415 | 1.0 | 0.0 | *** |

Inspecting coefficients#

We can inspect the random effects using .ranef and .fixef

  • .ranef: Subject level deviances from .coefs

  • .fixef: Subject level estimates = .coefs + .ranef

Below we see the 18 deviances for the fixed effect intercept term: 1 per Subject

# intercept deviances per subject
model_pp_i.ranef
X.Intercept.
308 40.783710
309 -77.849554
310 -63.108567
330 4.406442
331 10.216189
332 8.221238
333 16.500494
334 -2.996981
335 -45.282127
337 72.182686
349 -21.196249
350 14.111363
351 -7.862221
352 36.378425
369 7.036381
370 -6.362703
371 -3.294273
372 18.115747

And if we combine these with our fixed effects we can calculate 18 different parameter estimates, 1 per Subject, similar to our no pooling approach.

Notice that because we only estimated random intercepts, all subjects have the same slope for Days!

# intercepts and slopes per subject
model_pp_i.fixef
(Intercept) Days
308 292.188815 10.467286
309 173.555551 10.467286
310 188.296537 10.467286
330 255.811547 10.467286
331 261.621294 10.467286
332 259.626343 10.467286
333 267.905599 10.467286
334 248.408124 10.467286
335 206.122978 10.467286
337 323.587791 10.467286
349 230.208856 10.467286
350 265.516468 10.467286
351 243.542884 10.467286
352 287.783530 10.467286
369 258.441486 10.467286
370 245.042402 10.467286
371 248.110832 10.467286
372 269.520852 10.467286
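As a sanity check, we can confirm numerically that .fixef is just .coefs plus .ranef. A minimal sketch, using the column names exactly as they appear in the outputs above:

```python
import numpy as np

# Level 2 fixed effect estimates (indexed by term)
fixed = model_pp_i.coefs['Estimate']

# Each subject's intercept should equal the fixed intercept
# plus that subject's random intercept deviance
reconstructed = fixed['(Intercept)'] + model_pp_i.ranef['X.Intercept.'].to_numpy()
print(np.allclose(model_pp_i.fixef['(Intercept)'].to_numpy(), reconstructed))  # expect True
```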

Challenge#

Using the examples you’ve seen so far, try to complete the following challenges

Random Slopes Only#

Reference the Random Effects Syntax table above to estimate a model that only includes random slopes per Subject

# Solution
model_s = Lmer('Reaction ~ Days + (0 + Days | Subject)', data=df.to_pandas())
model_s.fit()

Using the .data attribute of the model and the examples above plot each Subject’s data on a different facet along with their estimated regression lines

# Solution
# Scatter plot per subject
grid = sns.FacetGrid(data=model_s.data, col='Subject', col_wrap=7, height=2)
grid.map(sns.scatterplot, 'Days', 'Reaction')

# Add regression line to each subplot
grid.map(sns.lineplot, 'Days', 'fits')

# Labels
grid.set_axis_labels('','')
grid.figure.supxlabel('Days of Sleep Deprivation', fontsize=20)
grid.figure.supylabel('Average Reaction Time (ms)', x=-.01, fontsize=20);

Random Intercepts & Slopes#

Reference the Random Effects Syntax table above to estimate a model that includes correlated random intercepts & slopes per Subject

# Solution
model_is = Lmer('Reaction ~ Days + (1 + Days | Subject)', data=df.to_pandas())
model_is.fit()

Using the .data attribute of the model and the examples above plot each Subject’s data on a different facet along with their estimated regression lines

# Solution
# Scatter plot per subject
grid = sns.FacetGrid(data=model_is.data, col='Subject', col_wrap=7, height=2)
grid.map(sns.scatterplot, 'Days', 'Reaction')

# Add regression line to each subplot
grid.map(sns.lineplot, 'Days', 'fits')

# Labels
grid.set_axis_labels('','')
grid.figure.supxlabel('Days of Sleep Deprivation', fontsize=20)
grid.figure.supylabel('Average Reaction Time (ms)', x=-.01, fontsize=20);

Model Comparison#

Using the earlier example as a guide, use the lrt() function to compare all 3 mixed effects models. Which random-effects structure seems worth it?

# Your code here
lrt(model_s, model_is)
refitting model(s) with ML (instead of REML)
| | model | npar | AIC | BIC | log-likelihood | deviance | Chisq | Df | P-val | Sig |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | `Reaction~Days+(0+Days\|Subject)` | 4 | 1782.080315 | 1794.852143 | -887.040158 | 1774.080315 | | | | |
| 1 | `Reaction~Days+(1+Days\|Subject)` | 6 | 1763.939344 | 1783.097086 | -875.969672 | 1751.939344 | 22.140971 | 2.0 | 0.000016 | *** |

Your response here