Simulation on adjusting for confounders in a mixed model

A brief tidyverse-style simulation to figure out what happens if I don’t include varying slopes in a model.

Ruben C. Arslan https://rubenarslan.github.io
2024-05-24

I recently tried to write a very simple simulation script in the tidyverse to demonstrate simulation-based power analysis for students. Today, I had occasion to adapt the script myself, because a student asked me whether she needed to include varying slopes for adjusted confounders in a multilevel model. My first thought was “yes, that’s better”, but I was a little scared of the impact on model complexity, because it’s a large dataset.

So, I ran a little simulation and found out that I was wrong to expect bias. To be frank, this runs counter to my intuition, so I’m hoping that posting this will either prompt someone to point out an error in my simulation or help me develop the right intuition.

library(tidyverse)
library(lmerTest)
library(broom.mixed)

theme_set(theme_bw())
# Wrap lmer so that messages and warnings (e.g., about convergence) are
# silenced and failed fits return NULL instead of aborting the simulation
lmer <- function(...) tryCatch({ suppressMessages(suppressWarnings(lmerTest::lmer(...))) }, 
                              error = function(e) NULL)

alpha = 0.05 # false positive rate

Set up simulation scenarios

The expand_grid function makes this easy. For a power analysis, we could also calculate the budget and filter out scenarios we cannot afford (or scenarios that make no sense a priori).

Scenarios = expand_grid(
    # Number of countries
    N = c(20, 30, 40),
    # Number of observations per country
    obs_per_country = c(1000),
    # Effect of female parliamentarians
    b_fem_parl = c(-0.2),
    # SD of that effect across countries
    sd_fem_parl = c(0.2),
    # Effect of GDP
    b_GDP = c(1),
    # SD of that effect across countries
    sd_GDP = c(0, 1),
    # Effect of GDP on female parliamentarians
    b_fem_parl_GDP = c(1),
    # SD of that effect across countries
    sd_fem_parl_GDP = c(0, 1),
    # Simulate each scenario 50 times
    Simulation = 1:50)
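For instance, with made-up numbers (say, a cost of 5 per observation and a total budget of 150,000), such a budget filter could look like this sketch:

cost_per_obs <- 5  # hypothetical cost per observation
budget <- 150000   # hypothetical total budget
Scenarios %>%
  mutate(cost = N * obs_per_country * cost_per_obs) %>%
  filter(cost <= budget)

With these numbers, the N = 40 scenarios (cost 200,000) would be dropped.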
generate_data_and_fit_model <- function(N, obs_per_country, 
                                        b_fem_parl, sd_fem_parl, 
                                        b_GDP, sd_GDP, 
                                        b_fem_parl_GDP, sd_fem_parl_GDP, 
                                        Simulation) {
  
  # Store scenario's parameters
  Parameters <- environment() %>% as.list() %>% as_tibble()
  
  # Generate Dataset
  Days = tibble(
    # A number that identifies each country, repeated obs_per_country times
    country = 1:N %>% rep(each = obs_per_country),
    # Country intercepts: N draws from the normal distribution with mean = 0, SD = 1
    general_wellbeing = rnorm(N, mean = 0, sd = 1)[country],
    # The effect of GDP varies from country to country
    GDP_i = rnorm(N, mean = b_GDP, sd = sd_GDP)[country],
    # The effect of fem_parl also varies from country to country
    fem_parl_i = rnorm(N, mean = b_fem_parl, sd = sd_fem_parl)[country],
    # The effect of GDP on female parliamentarians also varies from country to country
    fem_parl_GDP_i = rnorm(N, mean = b_fem_parl_GDP, sd = sd_fem_parl_GDP)[country],
    # Within-country predictors; fem_parl depends on GDP, so GDP confounds its effect
    GDP = rnorm(N * obs_per_country),
    fem_parl = rnorm(N * obs_per_country) + fem_parl_GDP_i * GDP,
    # Well-being results from stable country differences,
    # the GDP effect, the female parliamentarians effect, and a residual term
    well_being = general_wellbeing + fem_parl_i * fem_parl
                    + GDP_i * GDP + rnorm(N * obs_per_country)
  )

  # Our three statistical models predict well-being;
  # they differ only in their random-effects structure
  models <- list(
    # varying slope for fem_parl only
    Model_varFP = lmer(well_being ~ fem_parl + GDP + (1 + fem_parl | country), data = Days),
    # varying slopes for fem_parl and GDP, uncorrelated
    Model_varFP_GDP = lmer(well_being ~ fem_parl + GDP + (1 + fem_parl + GDP || country), data = Days),
    # varying slopes for fem_parl and GDP, with correlations
    Model_varFP_GDP_cor = lmer(well_being ~ fem_parl + GDP + (1 + fem_parl + GDP | country), data = Days)
   )
  
  # store their output in tidy form, together with the scenario parameters
  models %>% map(~ broom.mixed::tidy(., conf.int = TRUE)) %>% 
    bind_rows(.id = "Model") %>% 
    bind_cols(Parameters, .)
}

# To try this out and debug the script
# debug(generate_data_and_fit_model)

# Scenarios %>% slice(1) %>% pmap(generate_data_and_fit_model) %>%
#   bind_rows() %>% filter(term == "fem_parl") %>% View()

Simulate

We loop over all scenarios, generate the data, fit the models, and store the results.

SimulationResults = Scenarios %>%
  pmap(generate_data_and_fit_model, .progress = TRUE) %>%
  # Combine everything into a data frame
  bind_rows()

The call above wasn’t parallelised, but it easily could be, using furrr instead of purrr:

library(furrr)
plan(multisession)
SimulationResults = Scenarios %>%
  furrr::future_pmap(generate_data_and_fit_model, .progress = TRUE,
                     .options = furrr::furrr_options(seed = 20191005)) %>%
  # Combine everything into a data frame
  bind_rows()

write_rds(SimulationResults, "SimulationResults3.rds")
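Saving the results means the simulation does not have to be rerun every time; in a later session, they can simply be reloaded (assuming the same working directory):

SimulationResults <- read_rds("SimulationResults3.rds")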

If our goal were a power analysis, we could check how often the effect is significant.

SimulationResults <- SimulationResults %>%
  mutate(significant_two_sided = p.value < alpha)

Here, my goal is to find out whether my estimate is likely to be biased away from the true simulated value.

SimulationResults <- SimulationResults %>%
  mutate(bias = if_else(term == "fem_parl", b_fem_parl - estimate, NA_real_))

We summarise across simulation runs, within each scenario and model, to get the power and the average estimate, standard error, and bias.

SummarisedResults <- SimulationResults %>%
  filter(term == "fem_parl") %>% 
  group_by(across(c(N:Model, -Simulation))) %>%
  summarise(
    power = mean(p.value < alpha),
    bias = mean(b_fem_parl - estimate),
    mean_se = mean(std.error),
    mean_estimate = mean(estimate),
    Simulations = n()
  ) %>%
  arrange(desc(abs(bias)))

I expected bias to be bigger when the model does not match the data-generating process (i.e., Model_varFP, but the effects of GDP on fem_parl and well-being varied across countries). That was not the case, though the standard errors increased. I am a little surprised by this, but I’ve learned not to trust my intuition on these topics.

SimulationResults %>% 
  filter(term == "fem_parl") %>% 
  filter(b_fem_parl != 0) %>% 
  group_by(sd_GDP, sd_fem_parl_GDP, Model, b_fem_parl) %>% 
  summarise(bias = mean(bias), 
            est = mean(estimate, na.rm = TRUE), 
            std.error = mean(std.error), 
            n = n()) %>% 
  kableExtra::kbl()
sd_GDP  sd_fem_parl_GDP  Model                b_fem_parl        bias         est  std.error    n
     0                0  Model_varFP                -0.2  -0.0071441  -0.1928559  0.0378787  150
     0                0  Model_varFP_GDP            -0.2  -0.0071438  -0.1928562  0.0378704  150
     0                0  Model_varFP_GDP_cor        -0.2  -0.0071425  -0.1928575  0.0379609  150
     0                1  Model_varFP                -0.2   0.0020667  -0.2020667  0.0372315  150
     0                1  Model_varFP_GDP            -0.2   0.0021055  -0.2021055  0.0372255  150
     0                1  Model_varFP_GDP_cor        -0.2   0.0020659  -0.2020659  0.0372329  150
     1                0  Model_varFP                -0.2   0.0016501  -0.2016501  0.1001414  150
     1                0  Model_varFP_GDP            -0.2   0.0025946  -0.2025946  0.0376310  150
     1                0  Model_varFP_GDP_cor        -0.2   0.0025944  -0.2025944  0.0376307  150
     1                1  Model_varFP                -0.2  -0.0016691  -0.1983309  0.0840834  150
     1                1  Model_varFP_GDP            -0.2  -0.0043495  -0.1956505  0.0378779  150
     1                1  Model_varFP_GDP_cor        -0.2  -0.0043470  -0.1956530  0.0378743  150
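To see the standard-error inflation at a glance, here is a minimal plotting sketch (it uses the mean_se column from SummarisedResults above):

SummarisedResults %>%
  ggplot(aes(x = Model, y = mean_se, colour = factor(N))) +
  geom_point() +
  facet_grid(sd_GDP ~ sd_fem_parl_GDP, labeller = label_both) +
  labs(y = "Mean standard error of the fem_parl estimate",
       colour = "Countries (N)") +
  coord_flip()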

Corrections

If you see mistakes or want to suggest changes, please create an issue on the source repository.

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. Source code is available at https://github.com/rubenarslan/rubenarslan.github.io, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Arslan (2024, May 24). One lives only to make blunders: Simulation on adjusting for confounders in a mixed model. Retrieved from https://rubenarslan.github.io/posts/2024-05-24-simulation-on-adjusting-for-confounders-in-a-mixed-model/

BibTeX citation

@misc{arslan2024simulation,
  author = {Arslan, Ruben C.},
  title = {One lives only to make blunders: Simulation on adjusting for confounders in a mixed model},
  url = {https://rubenarslan.github.io/posts/2024-05-24-simulation-on-adjusting-for-confounders-in-a-mixed-model/},
  year = {2024}
}