Simulating several models and their ability to recover within-subject and between-subject effects on scale
Recently, I got mad at a ceiling.
library(tweetrmd)
tweet_embed("https://twitter.com/rubenarslan/status/1632170569448685569")
My new personality? I'm going to lean into being mad about low ceilings in happiness measures and elsewhere. pic.twitter.com/6Bu19pn2rm
— Ruben C. Arslan (@rubenarslan) March 5, 2023
Let me explain. Kahneman & Deaton (2010) reported that happiness plateaued at an income of $75,000 or higher. Their measure of happiness was a binary question that 450,000 people answered in daily, random phone calls to ~1,000 people. Killingsworth 2021 used his experience sampling data from more than 35,000 individuals responding ~50 times to refute this. The Killingsworth study had a better happiness measure, a sliding scale from very unhappy to very happy. He found a loglinear relationship across the entire income range1. He attributed Kahneman & Deaton’s (2010) plateau finding to a ceiling effect in their measure.
Now, in a recent paper, Killingsworth, Kahneman, and Meller (2023) re-analyzed Killingsworth’s (2021) experience sampling data to determine the relationship between income and experienced well-being, happiness. In the reanalysis, KKM wanted to find out why the results differed in an “adversarial collaboration”.
To do so, they aggregated daily happiness in the Killingsworth data to derive the subject’s observed mean. Then they used a form of distributional regression, quantile regression, to predict not only the mean of happiness but also various quantiles by log(income)
. Specifically, they investigated whether the relationship to the mean plateaued above a yearly household income of $100,000. This was not the case, the relationship between income and mean happiness is loglinear above and below this threshold with near-identical slopes. It’s also a quite small relationship, log(income)
explains less than 1% of the variance in happiness.
However, they find that the relationship between log(income)
and the 15% quantile does indeed flatten after $100,000. In my view, they inappropriately interpret this as evidence that there is an unhappy group of people for which money cannot alleviate suffering (above 100k).
A better way to put it would have been: at higher log(income)
the variability in average happiness is higher. That is, some people have a lot of money and spend it on holidays, household help, and time with their loved ones. Others buy a social media platform and make themselves miserable. But even if this statistical relationship is real2 and reflects a causal relationship3, it does not imply that there is one group of people whose unhappiness cannot be alleviated with money.
Now, I’m not particularly well-acquainted with quantile regression. However, I’ve worked with multilevel location scale models a good deal in the past, a different form of distributional regression. In so doing, I’ve learned not to trust my intuition, or at least to calibrate with fake data simulation. So, I simulated to check a hunch.
My conclusion is that KKM make at least two4 assumptions that I think are unlikely to hold and which could bias their estimate:
I was fairly sure that #1 is false, because, you know, I’ve seen happiness data, and because I am a human being who knows other human beings. I did not see their happiness data, mind you, which they only shared in aggregated and rounded form upon publication of KKM 2023, so we cannot check within-subject variance homogeneity.5
I was less sure whether #2 has much impact with 51 days per subject on average.
x
and sd(y)
on the between-subject level. Especially at low levels of x, sampling error can still dominate the true sd(id)
in y
.knitr::include_graphics("multilevel-censored-location-scale-models_files/figure-html5/BS_0_8_WS_0__means-22-1.png")
knitr::include_graphics("multilevel-censored-location-scale-models_files/figure-html5/BS_0_8_WS_0__bias-24-1.png")
x
is mistaken for a weak tendency for an increase in between-subject variance at higher x
).knitr::include_graphics("multilevel-censored-location-scale-models_files/figure-html5/BS_0_WS_0_8__bias-1.png")
c(94, 95, 96, 95)
gets 95, just like a person with c(90, 100, 100, 90)
. But if we model censoring at the within-subject level, we would estimate a higher true value for the second person (my 2-part model correctly recovers the true value). Killingsworth 2021 reports only 5.5% of responses at ceiling and the relationship with the mean is quite weak, so this probably did not hit hard. Still, aggregating loses relevant information (after aggregation “less than 0.5% of people in any income group had experienced well-being equal to the response ceiling, on average”).knitr::include_graphics("multilevel-censored-location-scale-models_files/figure-html5/BS_0_8_WS_0_2_bias-24-1.png")
knitr::include_graphics("multilevel-censored-location-scale-models_files/figure-html5/BS_0_WS_0_8_2_bias-1.png")
Technical note: There are Mplus (McNeish, 2020) and R-Stan (Martin, 2022) models to estimate two-part mixed effects location-scale models that allow simultaneous estimation of scale effects at both the within- and between-subject level. However, they do not implement censored distributions. My favourite package brms
does, but does not natively permit scale effects on the between-subject level to be estimated. However, it allows us to estimate different sd(id)
for subgroups. I formed septiles on x and used these as subgroups. Doing so allowed me to get a sort of poor man’s version of a censored two-part mixed effects location-scale model. A better man would have coded it up in raw Stan. As far as I can tell, this model both gets me closer to the between-subject effect than the aggregated model and gets me an unbiased estimate of the within-subject effect.
knitr::include_graphics("multilevel-censored-location-scale-models_files/figure-html5/BS_0_8_WS_0_2_betweensds-25-1.png")
You can find my simulations (well, one round) below.
Killingsworth 2021 was a wonderful paper. I think this quantile analysis and adversarial collaboration actually muddled the matter. In German, we of course have a word for this: Verschlimmbesserung.6 How come?
I have an idea. Adversarial collaborations don’t just sound like an oxymoron. Especially with large status differences between adversaries, the interesting stuff happens behind the scenes. Without transparency, power, resources, and stubbornness can win over superior arguments.7 I think such disputes should be carried out in the open. And if you want to share data with someone you consider an adversary, may I suggest the following as insurance:
library(tweetrmd)
tweet_embed("https://twitter.com/rubenarslan/status/1631789472407859200")
If you truly treat the other researcher as an adversary, perhaps you should try medieval hostage giving. Each sends the other a random half of their data.
— Ruben C. Arslan (@rubenarslan) March 3, 2023
The holdout data and mutuality may help ensure good faith.
Here, mu(y) is a function of x, as is sigma(y) (only at the within-subject level).
fit_models <- function(b_mean, b_sd_bs, b_sd_ws, y_ceiling) {
name = paste("BS", b_sd_bs, "WS", b_sd_ws, if_else(y_ceiling<Inf,as.character(y_ceiling),""))
rmdpartials::partial("fit_models.Rmd",
name = name,
b_mean = b_mean,
b_sd_bs = b_sd_bs,
b_sd_ws = b_sd_ws,
y_ceiling = y_ceiling)
}
fit_models(
b_mean = 1, b_sd_bs = 0, b_sd_ws = 0.8, y_ceiling = Inf)
people <- people %>%
mutate(
mean_log_sd_y = -1 + b_sd_bs * x,
log_sd_y = 0 + b_sd_ws * x,
mean_y = rnorm(N, sd = exp(mean_log_sd_y)) + b_mean * x,
xQ = ntile(x, 6)
)
days <- people %>%
full_join(tibble(
id = rep(1:N, times = n_days_per_person)
), by = "id", multiple = "all") %>%
mutate(
latent_y = rnorm(n(),
mean = mean_y,
sd = exp(log_sd_y)),
y = case_when(
latent_y >= y_ceiling ~ y_ceiling,
# latent_y <= -1.5 ~ -1.5,
TRUE ~ latent_y
),
ycens = case_when(
latent_y >= y_ceiling ~ "right",
# latent_y <= -1.5 ~ "left",
TRUE ~ "none"
)
)
Percentage censored: 0.00
ggplot(days, aes(x, y)) +
geom_point(alpha = 0.3)
sel_ids <- c(43, 36, 8, 40, 88, 29, 11, 49, 84, 41, 98)
ggplot(days, aes(x, y)) +
geom_smooth(method = 'lm', color = "gray", se = F) +
geom_pointrange(stat = "summary",
fun = mean,
fun.min = function(x) { mean(x)-sd(x) },
fun.max = function(x) { mean(x)+sd(x) }, data = days %>% filter(id %in% sel_ids) ) +
geom_point(alpha = 0.3, data = days %>% filter(id %in% sel_ids) )
library(cowplot)
p1 <- days %>%
group_by(id) %>%
summarise(x = mean(x, na.rm = T),
mean_y = mean(y, na.rm = T)) %>%
ggplot(., aes(x, mean_y)) +
geom_smooth(method = 'lm', color = "gray", se = F) +
geom_point(alpha = 0.4)
p2 <- days %>%
group_by(id) %>%
summarise(x = mean(x, na.rm = T),
sd_y = sd(y, na.rm = T)) %>%
ggplot(., aes(x, sd_y)) +
geom_smooth(method = 'lm', color = "gray", se = F) +
geom_point(alpha = 0.4) +
scale_y_continuous(trans = "log", breaks = c(0.1, 0.25, 0.5, 1, 2, 4))
p3 <- people %>%
ggplot(., aes(x, mean_y)) +
geom_smooth(method = 'lm', color = "gray", se = F) +
geom_point(alpha = 0.4) +
ylab("latent mean y")
plot_grid(p1,p2,p3,ncol = 2)
m_mixed <- brm(bf(y | cens(ycens) ~ x + (1|id),
sigma ~ x), data = days, silent = 2, refresh = 0)
m_between <- brm(bf(y | cens(ycens) ~ x,
sigma ~ x), data = days %>% group_by(x, id) %>%
summarise(y = mean(y)) %>%
mutate(ycens = case_when(
y >= y_ceiling ~ "right",
TRUE ~ "none"
)), silent = 2, refresh = 0)
m_mixed_2part <- brm(bf(y | cens(ycens) ~ x + (1|gr(id, by = xQ)),
sigma ~ x), data = days, silent = 2, refresh = 0)
draws <- m_mixed_2part %>% gather_draws(`sd_.*x.*`, regex = T)
betweenhdis <- draws %>% mean_hdci(.width = .95) %>%
mutate(xQ = as.numeric(str_match(.variable, "xQ(\\d)")[,2])) %>%
left_join(people %>% group_by(xQ) %>% summarise(x = mean(x)))
m_2part_level2 <- brm(log(.value) | se(se, sigma = FALSE) ~ x, data = betweenhdis %>% mutate(se = (log(.value)-log(.lower))/2))
draws <- bind_rows(
m_mixed = m_mixed %>% gather_draws(`b_(sigma_)?x`, regex = T),
m_mixed_2part = m_mixed_2part %>% gather_draws(`b_(sigma_)?x`, regex = T),
m_2part_between = m_2part_level2 %>% gather_draws(`b_(sigma_)?x`, regex = T) %>% mutate(.variable = "b_sigma_x"),
m_between = m_between %>% gather_draws(`b_(sigma_)?x`, regex = T),
.id = "model") %>%
mutate(model = fct_inorder(factor(model)))
draws <- draws %>% group_by(model, .variable) %>%
mean_hdci(.width = c(.95, .99)) %>%
ungroup()
ggplot(draws, aes(y = .variable, x = .value, xmin = .lower, xmax = .upper)) +
geom_pointinterval(position = position_dodge(width = .4)) +
ggrepel::geom_text_repel(aes(label = if_else(.width == .95, sprintf("%.2f", .value), NA_character_)), nudge_y = .1) +
geom_vline(aes(xintercept = true_value), linetype = 'dashed',
data = tibble(model = c("m_between", "m_mixed", "m_mixed_2part", "m_between", "m_mixed", "m_mixed_2part", "m_2part_between"), .variable = c("b_x", "b_x", "b_x", "b_sigma_x", "b_sigma_x", "b_sigma_x", "b_sigma_x"), true_value = c(b_mean, b_mean, b_mean, b_sd_bs, b_sd_ws, b_sd_ws, b_sd_bs))) + scale_color_discrete(breaks = rev(levels(draws$model))) +
facet_grid(model ~ .variable) +
theme_bw() +
theme(legend.position = c(0.99,0.99),
legend.justification = c(1,1))
ggplot(betweenhdis, aes(x, .value,
ymin = .lower, ymax = .upper)) +
geom_pointrange() +
geom_line() +
ylab("sd(id)") +
scale_y_continuous(trans = "log", breaks = c(0.1, 0.25, 0.5, 1, 2, 4)) +
theme_bw()
Here, mu(y) is a function of x, as is sigma(y) at the between-subject level.
fit_models(
b_mean = 1, b_sd_bs = 0.8, b_sd_ws = 0, y_ceiling = Inf)
people <- people %>%
mutate(
mean_log_sd_y = -1 + b_sd_bs * x,
log_sd_y = 0 + b_sd_ws * x,
mean_y = rnorm(N, sd = exp(mean_log_sd_y)) + b_mean * x,
xQ = ntile(x, 6)
)
days <- people %>%
full_join(tibble(
id = rep(1:N, times = n_days_per_person)
), by = "id", multiple = "all") %>%
mutate(
latent_y = rnorm(n(),
mean = mean_y,
sd = exp(log_sd_y)),
y = case_when(
latent_y >= y_ceiling ~ y_ceiling,
# latent_y <= -1.5 ~ -1.5,
TRUE ~ latent_y
),
ycens = case_when(
latent_y >= y_ceiling ~ "right",
# latent_y <= -1.5 ~ "left",
TRUE ~ "none"
)
)
Percentage censored: 0.00
ggplot(days, aes(x, y)) +
geom_point(alpha = 0.3)
sel_ids <- c(43, 36, 8, 40, 88, 29, 11, 49, 84, 41, 98)
ggplot(days, aes(x, y)) +
geom_smooth(method = 'lm', color = "gray", se = F) +
geom_pointrange(stat = "summary",
fun = mean,
fun.min = function(x) { mean(x)-sd(x) },
fun.max = function(x) { mean(x)+sd(x) }, data = days %>% filter(id %in% sel_ids) ) +
geom_point(alpha = 0.3, data = days %>% filter(id %in% sel_ids) )
library(cowplot)
p1 <- days %>%
group_by(id) %>%
summarise(x = mean(x, na.rm = T),
mean_y = mean(y, na.rm = T)) %>%
ggplot(., aes(x, mean_y)) +
geom_smooth(method = 'lm', color = "gray", se = F) +
geom_point(alpha = 0.4)
p2 <- days %>%
group_by(id) %>%
summarise(x = mean(x, na.rm = T),
sd_y = sd(y, na.rm = T)) %>%
ggplot(., aes(x, sd_y)) +
geom_smooth(method = 'lm', color = "gray", se = F) +
geom_point(alpha = 0.4) +
scale_y_continuous(trans = "log", breaks = c(0.1, 0.25, 0.5, 1, 2, 4))
p3 <- people %>%
ggplot(., aes(x, mean_y)) +
geom_smooth(method = 'lm', color = "gray", se = F) +
geom_point(alpha = 0.4) +
ylab("latent mean y")
plot_grid(p1,p2,p3,ncol = 2)
m_mixed <- brm(bf(y | cens(ycens) ~ x + (1|id),
sigma ~ x), data = days, silent = 2, refresh = 0)
m_between <- brm(bf(y | cens(ycens) ~ x,
sigma ~ x), data = days %>% group_by(x, id) %>%
summarise(y = mean(y)) %>%
mutate(ycens = case_when(
y >= y_ceiling ~ "right",
TRUE ~ "none"
)), silent = 2, refresh = 0)
m_mixed_2part <- brm(bf(y | cens(ycens) ~ x + (1|gr(id, by = xQ)),
sigma ~ x), data = days, silent = 2, refresh = 0)
draws <- m_mixed_2part %>% gather_draws(`sd_.*x.*`, regex = T)
betweenhdis <- draws %>% mean_hdci(.width = .95) %>%
mutate(xQ = as.numeric(str_match(.variable, "xQ(\\d)")[,2])) %>%
left_join(people %>% group_by(xQ) %>% summarise(x = mean(x)))
m_2part_level2 <- brm(log(.value) | se(se, sigma = FALSE) ~ x, data = betweenhdis %>% mutate(se = (log(.value)-log(.lower))/2))
draws <- bind_rows(
m_mixed = m_mixed %>% gather_draws(`b_(sigma_)?x`, regex = T),
m_mixed_2part = m_mixed_2part %>% gather_draws(`b_(sigma_)?x`, regex = T),
m_2part_between = m_2part_level2 %>% gather_draws(`b_(sigma_)?x`, regex = T) %>% mutate(.variable = "b_sigma_x"),
m_between = m_between %>% gather_draws(`b_(sigma_)?x`, regex = T),
.id = "model") %>%
mutate(model = fct_inorder(factor(model)))
draws <- draws %>% group_by(model, .variable) %>%
mean_hdci(.width = c(.95, .99)) %>%
ungroup()
ggplot(draws, aes(y = .variable, x = .value, xmin = .lower, xmax = .upper)) +
geom_pointinterval(position = position_dodge(width = .4)) +
ggrepel::geom_text_repel(aes(label = if_else(.width == .95, sprintf("%.2f", .value), NA_character_)), nudge_y = .1) +
geom_vline(aes(xintercept = true_value), linetype = 'dashed',
data = tibble(model = c("m_between", "m_mixed", "m_mixed_2part", "m_between", "m_mixed", "m_mixed_2part", "m_2part_between"), .variable = c("b_x", "b_x", "b_x", "b_sigma_x", "b_sigma_x", "b_sigma_x", "b_sigma_x"), true_value = c(b_mean, b_mean, b_mean, b_sd_bs, b_sd_ws, b_sd_ws, b_sd_bs))) + scale_color_discrete(breaks = rev(levels(draws$model))) +
facet_grid(model ~ .variable) +
theme_bw() +
theme(legend.position = c(0.99,0.99),
legend.justification = c(1,1))
ggplot(betweenhdis, aes(x, .value,
ymin = .lower, ymax = .upper)) +
geom_pointrange() +
geom_line() +
ylab("sd(id)") +
scale_y_continuous(trans = "log", breaks = c(0.1, 0.25, 0.5, 1, 2, 4)) +
theme_bw()
# remotes::install_github("stephensrmmartin/LMMELSM")
# library(LMMELSM)
# m_melsm_2part <- lmmelsm(list(observed ~ y,
# location ~ x,
# scale ~ x,
# between ~ x),
# id, days)
# summary(m_melsm_2part)
Here, mu(y) is a function of x, as is sigma(y) (only at the within-subject level).
fit_models(b_mean = 1, b_sd_bs = 0, b_sd_ws = 0.8, y_ceiling = 2)
people <- people %>%
mutate(
mean_log_sd_y = -1 + b_sd_bs * x,
log_sd_y = 0 + b_sd_ws * x,
mean_y = rnorm(N, sd = exp(mean_log_sd_y)) + b_mean * x,
xQ = ntile(x, 6)
)
days <- people %>%
full_join(tibble(
id = rep(1:N, times = n_days_per_person)
), by = "id", multiple = "all") %>%
mutate(
latent_y = rnorm(n(),
mean = mean_y,
sd = exp(log_sd_y)),
y = case_when(
latent_y >= y_ceiling ~ y_ceiling,
# latent_y <= -1.5 ~ -1.5,
TRUE ~ latent_y
),
ycens = case_when(
latent_y >= y_ceiling ~ "right",
# latent_y <= -1.5 ~ "left",
TRUE ~ "none"
)
)
Percentage censored: 0.12
ggplot(days, aes(x, y)) +
geom_point(alpha = 0.3)
sel_ids <- c(43, 36, 8, 40, 88, 29, 11, 49, 84, 41, 98)
ggplot(days, aes(x, y)) +
geom_smooth(method = 'lm', color = "gray", se = F) +
geom_pointrange(stat = "summary",
fun = mean,
fun.min = function(x) { mean(x)-sd(x) },
fun.max = function(x) { mean(x)+sd(x) }, data = days %>% filter(id %in% sel_ids) ) +
geom_point(alpha = 0.3, data = days %>% filter(id %in% sel_ids) )
library(cowplot)
p1 <- days %>%
group_by(id) %>%
summarise(x = mean(x, na.rm = T),
mean_y = mean(y, na.rm = T)) %>%
ggplot(., aes(x, mean_y)) +
geom_smooth(method = 'lm', color = "gray", se = F) +
geom_point(alpha = 0.4)
p2 <- days %>%
group_by(id) %>%
summarise(x = mean(x, na.rm = T),
sd_y = sd(y, na.rm = T)) %>%
ggplot(., aes(x, sd_y)) +
geom_smooth(method = 'lm', color = "gray", se = F) +
geom_point(alpha = 0.4) +
scale_y_continuous(trans = "log", breaks = c(0.1, 0.25, 0.5, 1, 2, 4))
p3 <- people %>%
ggplot(., aes(x, mean_y)) +
geom_smooth(method = 'lm', color = "gray", se = F) +
geom_point(alpha = 0.4) +
ylab("latent mean y")
plot_grid(p1,p2,p3,ncol = 2)
m_mixed <- brm(bf(y | cens(ycens) ~ x + (1|id),
sigma ~ x), data = days, silent = 2, refresh = 0)
m_between <- brm(bf(y | cens(ycens) ~ x,
sigma ~ x), data = days %>% group_by(x, id) %>%
summarise(y = mean(y)) %>%
mutate(ycens = case_when(
y >= y_ceiling ~ "right",
TRUE ~ "none"
)), silent = 2, refresh = 0)
m_mixed_2part <- brm(bf(y | cens(ycens) ~ x + (1|gr(id, by = xQ)),
sigma ~ x), data = days, silent = 2, refresh = 0)
draws <- m_mixed_2part %>% gather_draws(`sd_.*x.*`, regex = T)
betweenhdis <- draws %>% mean_hdci(.width = .95) %>%
mutate(xQ = as.numeric(str_match(.variable, "xQ(\\d)")[,2])) %>%
left_join(people %>% group_by(xQ) %>% summarise(x = mean(x)))
m_2part_level2 <- brm(log(.value) | se(se, sigma = FALSE) ~ x, data = betweenhdis %>% mutate(se = (log(.value)-log(.lower))/2))
draws <- bind_rows(
m_mixed = m_mixed %>% gather_draws(`b_(sigma_)?x`, regex = T),
m_mixed_2part = m_mixed_2part %>% gather_draws(`b_(sigma_)?x`, regex = T),
m_2part_between = m_2part_level2 %>% gather_draws(`b_(sigma_)?x`, regex = T) %>% mutate(.variable = "b_sigma_x"),
m_between = m_between %>% gather_draws(`b_(sigma_)?x`, regex = T),
.id = "model") %>%
mutate(model = fct_inorder(factor(model)))
draws <- draws %>% group_by(model, .variable) %>%
mean_hdci(.width = c(.95, .99)) %>%
ungroup()
ggplot(draws, aes(y = .variable, x = .value, xmin = .lower, xmax = .upper)) +
geom_pointinterval(position = position_dodge(width = .4)) +
ggrepel::geom_text_repel(aes(label = if_else(.width == .95, sprintf("%.2f", .value), NA_character_)), nudge_y = .1) +
geom_vline(aes(xintercept = true_value), linetype = 'dashed',
data = tibble(model = c("m_between", "m_mixed", "m_mixed_2part", "m_between", "m_mixed", "m_mixed_2part", "m_2part_between"), .variable = c("b_x", "b_x", "b_x", "b_sigma_x", "b_sigma_x", "b_sigma_x", "b_sigma_x"), true_value = c(b_mean, b_mean, b_mean, b_sd_bs, b_sd_ws, b_sd_ws, b_sd_bs))) + scale_color_discrete(breaks = rev(levels(draws$model))) +
facet_grid(model ~ .variable) +
theme_bw() +
theme(legend.position = c(0.99,0.99),
legend.justification = c(1,1))
ggplot(betweenhdis, aes(x, .value,
ymin = .lower, ymax = .upper)) +
geom_pointrange() +
geom_line() +
ylab("sd(id)") +
scale_y_continuous(trans = "log", breaks = c(0.1, 0.25, 0.5, 1, 2, 4)) +
theme_bw()
Here, mu(y) is a function of x, as is sigma(y) only at the between-subject level.
fit_models(b_mean = 1, b_sd_bs = 0.8, b_sd_ws = 0, y_ceiling = 2)
people <- people %>%
mutate(
mean_log_sd_y = -1 + b_sd_bs * x,
log_sd_y = 0 + b_sd_ws * x,
mean_y = rnorm(N, sd = exp(mean_log_sd_y)) + b_mean * x,
xQ = ntile(x, 6)
)
days <- people %>%
full_join(tibble(
id = rep(1:N, times = n_days_per_person)
), by = "id", multiple = "all") %>%
mutate(
latent_y = rnorm(n(),
mean = mean_y,
sd = exp(log_sd_y)),
y = case_when(
latent_y >= y_ceiling ~ y_ceiling,
# latent_y <= -1.5 ~ -1.5,
TRUE ~ latent_y
),
ycens = case_when(
latent_y >= y_ceiling ~ "right",
# latent_y <= -1.5 ~ "left",
TRUE ~ "none"
)
)
Percentage censored: 0.10
ggplot(days, aes(x, y)) +
geom_point(alpha = 0.3)
sel_ids <- c(43, 36, 8, 40, 88, 29, 11, 49, 84, 41, 98)
ggplot(days, aes(x, y)) +
geom_smooth(method = 'lm', color = "gray", se = F) +
geom_pointrange(stat = "summary",
fun = mean,
fun.min = function(x) { mean(x)-sd(x) },
fun.max = function(x) { mean(x)+sd(x) }, data = days %>% filter(id %in% sel_ids) ) +
geom_point(alpha = 0.3, data = days %>% filter(id %in% sel_ids) )
library(cowplot)
p1 <- days %>%
group_by(id) %>%
summarise(x = mean(x, na.rm = T),
mean_y = mean(y, na.rm = T)) %>%
ggplot(., aes(x, mean_y)) +
geom_smooth(method = 'lm', color = "gray", se = F) +
geom_point(alpha = 0.4)
p2 <- days %>%
group_by(id) %>%
summarise(x = mean(x, na.rm = T),
sd_y = sd(y, na.rm = T)) %>%
ggplot(., aes(x, sd_y)) +
geom_smooth(method = 'lm', color = "gray", se = F) +
geom_point(alpha = 0.4) +
scale_y_continuous(trans = "log", breaks = c(0.1, 0.25, 0.5, 1, 2, 4))
p3 <- people %>%
ggplot(., aes(x, mean_y)) +
geom_smooth(method = 'lm', color = "gray", se = F) +
geom_point(alpha = 0.4) +
ylab("latent mean y")
plot_grid(p1,p2,p3,ncol = 2)
m_mixed <- brm(bf(y | cens(ycens) ~ x + (1|id),
sigma ~ x), data = days, silent = 2, refresh = 0)
m_between <- brm(bf(y | cens(ycens) ~ x,
sigma ~ x), data = days %>% group_by(x, id) %>%
summarise(y = mean(y)) %>%
mutate(ycens = case_when(
y >= y_ceiling ~ "right",
TRUE ~ "none"
)), silent = 2, refresh = 0)
m_mixed_2part <- brm(bf(y | cens(ycens) ~ x + (1|gr(id, by = xQ)),
sigma ~ x), data = days, silent = 2, refresh = 0)
draws <- m_mixed_2part %>% gather_draws(`sd_.*x.*`, regex = T)
betweenhdis <- draws %>% mean_hdci(.width = .95) %>%
mutate(xQ = as.numeric(str_match(.variable, "xQ(\\d)")[,2])) %>%
left_join(people %>% group_by(xQ) %>% summarise(x = mean(x)))
m_2part_level2 <- brm(log(.value) | se(se, sigma = FALSE) ~ x, data = betweenhdis %>% mutate(se = (log(.value)-log(.lower))/2))
draws <- bind_rows(
m_mixed = m_mixed %>% gather_draws(`b_(sigma_)?x`, regex = T),
m_mixed_2part = m_mixed_2part %>% gather_draws(`b_(sigma_)?x`, regex = T),
m_2part_between = m_2part_level2 %>% gather_draws(`b_(sigma_)?x`, regex = T) %>% mutate(.variable = "b_sigma_x"),
m_between = m_between %>% gather_draws(`b_(sigma_)?x`, regex = T),
.id = "model") %>%
mutate(model = fct_inorder(factor(model)))
draws <- draws %>% group_by(model, .variable) %>%
mean_hdci(.width = c(.95, .99)) %>%
ungroup()
ggplot(draws, aes(y = .variable, x = .value, xmin = .lower, xmax = .upper)) +
geom_pointinterval(position = position_dodge(width = .4)) +
ggrepel::geom_text_repel(aes(label = if_else(.width == .95, sprintf("%.2f", .value), NA_character_)), nudge_y = .1) +
geom_vline(aes(xintercept = true_value), linetype = 'dashed',
data = tibble(model = c("m_between", "m_mixed", "m_mixed_2part", "m_between", "m_mixed", "m_mixed_2part", "m_2part_between"), .variable = c("b_x", "b_x", "b_x", "b_sigma_x", "b_sigma_x", "b_sigma_x", "b_sigma_x"), true_value = c(b_mean, b_mean, b_mean, b_sd_bs, b_sd_ws, b_sd_ws, b_sd_bs))) + scale_color_discrete(breaks = rev(levels(draws$model))) +
facet_grid(model ~ .variable) +
theme_bw() +
theme(legend.position = c(0.99,0.99),
legend.justification = c(1,1))
ggplot(betweenhdis, aes(x, .value,
ymin = .lower, ymax = .upper)) +
geom_pointrange() +
geom_line() +
ylab("sd(id)") +
scale_y_continuous(trans = "log", breaks = c(0.1, 0.25, 0.5, 1, 2, 4)) +
theme_bw()
so, a doubling of income is associated with same happiness increment throughout the income range↩︎
Big if!↩︎
Even bigger if!↩︎
two assumptions that I’ll touch on here. They do not make their assumptions regarding causal inference explicit, but still very much imply causality. But, like, what else is new?↩︎
I requested the raw data on March 4 and received an answer that ping-level data could not be shared owing to promises made to participants on March 15.↩︎
My dictionary says “disimprovement”. Is that actually in use?↩︎
Am I projecting based on my own attempt at adversarial collaboration? Absolutely! Thanks for asking, I’m okay!↩︎
If you see mistakes or want to suggest changes, please create an issue on the source repository.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. Source code is available at https://github.com/rubenarslan/rubenarslan.github.io, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Arslan (2023, March 13). One lives only to make blunders: Multilevel censored location-scale models. Retrieved from https://rubenarslan.github.io/posts/2023-03-05-multilevel-censored-location-scale-models/
BibTeX citation
@misc{arslan2023multilevel, author = {Arslan, Ruben C.}, title = {One lives only to make blunders: Multilevel censored location-scale models}, url = {https://rubenarslan.github.io/posts/2023-03-05-multilevel-censored-location-scale-models/}, year = {2023} }