Thinking through my approach for modelling false positive button clicks in an app interface.
If you’ve seen me talk at a conference in recent years, you know that I have been working with data collected by the cycle tracking app Clue.
The cool thing about this dataset is that it a) is big, b) is highly repeated (daily tracking), and c) spans long time frames (for some users, years).
The not so cool thing is that all outcomes are tracked using a single on-off button for each outcome. No multiple items, no Likert scales, not even a way to tell a nonresponse from a no. Still, the amount of data, in my view, can more than compensate for this problem, and I find very similar results on e.g. sexual desire as in my other work, which hopefully does better psychometrically but falls way short of Clue’s sample size.
Another problem that I have been grappling with is that there seems to be a nonzero rate of misclicking in the app. Unfortunately, the Clue database does not record when a button is toggled off again, so even if users correct their misclick, it ends up in my data.
The substantive question I’m trying to answer is how common mittelschmerz is, i.e. how many people ever experience it and/or in how many cycles it is experienced. Mittelschmerz is a pain that occurs around ovulation, i.e. in the middle of the cycle. Mittelschmerz should not occur during (pre-)menstruation. Nor should people who take combined oral contraceptive pills be ovulating especially frequently. Yet, I observe both of these things. Why?
I suspect the interface. In Clue, symptom buttons are thematically arranged. As you can see, the “ovulation pain” button is next to the button for cramps and tender breasts. Guess which forms of pain do occur during (pre-)menstruation and for people on the pill. That’s right, cramps and breast tenderness.
So, I think that in some non-zero percentage of cases, users will accidentally turn on the “ovulation pain” button instead of or along with the “cramps” and “tender breasts” buttons.
Here, let me simulate that real quick:
set.seed(05102019)
library(tidyverse)

reps <- 10000          # number of simulated users
days <- 30             # days per cycle
logit <- boot::logit
miss_prob <- 0         # probability that an intended cramps click goes unrecorded

daily <- tibble(
  # users
  id = rep(1:reps, each = 30),
  # cycle days
  time = rep(1:days, times = reps),
  # the first 75% of users ovulate
  ovulating = c(rep(1, times = 30 * 0.75 * reps),
                rep(0, times = 30 * 0.25 * reps)),
  # the first 25% of users can feel mittelschmerz
  feel_ms = c(rep(1, times = 30 * 0.25 * reps),
              rep(0, times = 30 * 0.75 * reps)),
  # one ovulation day per user, around day 15
  ovulation_day = rep(round(rnorm(reps, 15, sd = 1.5)), each = 30),
  # true cramps: most likely on the first cycle days
  cs = 1 * rbinom(reps * days, 1,
                  plogis(-5 + 1 * if_else(time < 6, 5 - time, 0))),
  # true ovulation pain
  os = if_else(time == ovulation_day & ovulating & feel_ms, 1, 0),
  # recorded clicks on the cramps button
  c_clicks = rbinom(reps * days, 1, prob = (1 - miss_prob) * cs),
  # recorded clicks on the ovulation pain button, including misclicks
  o_clicks = rbinom(reps * days, 1, prob =
                      plogis(-4 + 3 * cs + 10 * os))) %>%
  mutate(time = factor(time))
# readr::write_csv(daily, "daily.csv")
# daily %>% group_by(time) %>% summarise_all(mean) %>% View
# daily %>% group_by(time) %>% summarise(sprintf("%.7f",mean(o_clicks))) %>% View
As you can see, I’ve simulated an effect of time on the variable cs. For a subset of ids, there is one ovulation per cycle, sometime in the middle, which causes os. The true rate is 1/4 * 1/30 = 0.00833, i.e. a fourth of people feel ovulation on one day per cycle.
library(marginaleffects)
glm1 <- glm(os ~ 1, data = daily, family = gaussian)
predictions(glm1)[1,]
Estimate Pr(>|z|) S 2.5 % 97.5 %
0.00833 <0.001 Inf 0.00801 0.00866
Columns: rowid, estimate, p.value, s.value, conf.low, conf.high, os
Type: invlink(link)
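In the next step, I’ve modelled clicks on these buttons. With miss_prob set to 0, people always hit the c button when they aim for it. But fat thumbs also hit the o button: the baseline chance of an accidental o click is plogis(-4), about 2% per day, and it rises to plogis(-1), about 27%, on days when people are aiming for c.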
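To make the click probabilities implied by the simulation explicit, here is a small check that plugs the coefficients from the o_clicks line above (-4, 3 and 10) into plogis(). The object name grid and the column p_o_click are mine, not part of the original code.

```r
# Implied probability of a recorded "o" click for each combination of
# true cramps (cs) and true ovulation pain (os), using the simulation's
# logit-scale coefficients: intercept -4, +3 for cs, +10 for os.
grid <- expand.grid(cs = c(0, 1), os = c(0, 1))
grid$p_o_click <- plogis(-4 + 3 * grid$cs + 10 * grid$os)
print(grid)
```

The rows with os = 0 are pure misclicks; the rows with os = 1 show that a truly felt ovulation pain is almost always recorded.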
So, while o truly peaks in the middle, we actually see an additional peak at the start of the cycle in the observed clicks.
ggplot(daily %>% pivot_longer(c(cs:os, c_clicks, o_clicks, -time)) %>%
         mutate(clicks = if_else(str_detect(name, "clicks"),
                                 "observed", "true"),
                name = str_sub(name, 1, 1)), aes(time)) +
  geom_point(aes(y = value, color = name), stat = 'summary') +
  facet_wrap(~ clicks) +
  scale_x_discrete(breaks = c(1, 5, 10, 15, 20, 25, 30)) +
  theme_bw()
Now, how do I get rid of this bias? Here’s an idea: I find a ~pure false positive subset to estimate the relationship of false positives to my predictors. In my real data, these would be cycle days during (pre-)menstruation for users of the combined oral contraceptive pill. In my simulations, I subset in a similar way and then fit a generalized linear model.
ggplot(daily %>% filter(ovulating == 0) %>%
         pivot_longer(c(cs:os, c_clicks, o_clicks, -time)) %>%
         mutate(clicks = if_else(str_detect(name, "clicks"),
                                 "observed", "true"),
                name = str_sub(name, 1, 1)), aes(time)) +
  geom_point(aes(y = value, color = name), stat = 'summary') +
  facet_wrap(~ clicks) +
  scale_x_discrete(breaks = c(1, 5, 10, 15, 20, 25, 30)) +
  theme_bw()
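The model object glm2 and the per-day false positive column fp used below are not defined in the code shown here. What follows is a minimal sketch of what such a step could look like, run on a smaller stand-in simulation and under my own assumptions: a logistic regression of o_clicks on c_clicks in the non-ovulating subset, followed by predictions for all rows. The names sim, fp_model and baseline_fp and the sample size are hypothetical, and unlike the simulation above, every ovulating user feels ovulation pain here.

```r
set.seed(1)
n_users <- 2000
n_days  <- 30

# Stand-in for `daily`: one row per user-day
sim <- data.frame(
  id   = rep(seq_len(n_users), each = n_days),
  time = rep(seq_len(n_days), times = n_users)
)
# The last quarter of users never ovulate (the ~pure false positive subset)
sim$ovulating <- as.integer(sim$id <= 0.75 * n_users)
# Cramps are most likely on the first cycle days
sim$cs <- rbinom(nrow(sim), 1,
                 plogis(-5 + ifelse(sim$time < 6, 5 - sim$time, 0)))
sim$c_clicks <- sim$cs
# One ovulation day per user; only ovulating users feel it
ov_day <- rep(round(rnorm(n_users, 15, sd = 1.5)), each = n_days)
sim$os <- as.integer(sim$time == ov_day & sim$ovulating == 1)
# Observed "o" clicks: true positives plus misclicks
sim$o_clicks <- rbinom(nrow(sim), 1, plogis(-4 + 3 * sim$cs + 10 * sim$os))

# Assumed false positive model: fit only where no true positives can occur
fp_model <- glm(o_clicks ~ c_clicks,
                data = subset(sim, ovulating == 0),
                family = binomial)

# Extrapolate the predicted false positive probability to every user-day
sim$fp <- predict(fp_model, newdata = sim, type = "response")

# Baseline false positive rate on days without a cramps click
baseline_fp <- plogis(coef(fp_model)[["(Intercept)"]])
baseline_fp
```

The design choice here is to fit the model only where true positives are impossible and then predict for everyone, which mirrors the strategy described in the text.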
Now, I can compute the rate before and after setting my predictors to zero.
avg_predictions(glm2)
Estimate Pr(>|z|) S 2.5 % 97.5 %
0.0204 <0.001 Inf 0.0192 0.0218
Columns: estimate, p.value, s.value, conf.low, conf.high
Type: invlink(link)
avg_predictions(glm2, newdata = tibble(c_clicks = 0))
Estimate Pr(>|z|) S 2.5 % 97.5 %
0.0188 <0.001 Inf 0.0176 0.0201
Columns: rowid, estimate, p.value, s.value, conf.low, conf.high, c_clicks, rowid_dedup
Type: invlink(link)
This is very close to plogis(-4) ~= 0.018, i.e. the false positive intercept I added in my simulation above.
I can estimate the rate of false positives per day in all of the data, including data where there should be true positives.
To estimate the true rate, I subtract the false positive rate and divide by (1-FP) to account for clicks which were both intended and accidental. Normally, two clicks on the same button would turn the button off again. However, in Clue, they unfortunately had a problem with their database, so off-toggles went unrecorded. This probably contributes to the fairly high false positive rate.
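The correction can be derived by treating intended and accidental clicks as independent events, which is my reading of the formula used below rather than something the simulation guarantees exactly. Writing t for the true rate, f for the false positive rate and o for the observed rate (symbols introduced here), the button ends up on if either kind of click happened:

```latex
\begin{aligned}
o &= 1 - (1 - t)(1 - f) = f + t\,(1 - f) \\
t &= \frac{o - f}{1 - f}
\end{aligned}
```

Dividing by 1 - f accounts for the days on which an intended click and an accidental click coincide and are only counted once.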
glm5 <- glm(o_clicks ~ 1, data = daily, family = binomial)
daily$obs_rate <- predict(glm5, type = "response")
daily %>%
  summarise(obs_rate = mean(obs_rate), fp_rate = mean(fp)) %>%
  mutate(
    est_true_rate = (obs_rate - fp_rate) / (1 - fp_rate)
  )
# A tibble: 1 × 3
obs_rate fp_rate est_true_rate
<dbl> <dbl> <dbl>
1 0.0318 0.0237 0.00823
Let’s bring the time variable back in. The black line is the true curve.
glm7 <- glm(o_clicks ~ time, data = daily, family = binomial)
daily$obs_rate <- predict(glm7, type = "response")
daily %>%
  group_by(time) %>%
  summarise(obs_rate = mean(obs_rate), fp_rate = mean(fp)) %>%
  mutate(
    est_true_rate = (obs_rate - fp_rate) / (1 - fp_rate)
  ) %>%
  # negative estimates are truncated at zero
  mutate(est_true_rate = if_else(est_true_rate < 0, 0, est_true_rate)) %>%
  pivot_longer(c(-time)) %>%
  ggplot(aes(time, value, color = name)) +
  geom_line(aes(time, os, group = 1), color = "black", stat = "summary", data = daily) +
  geom_point(position = position_dodge(width = 0.4)) +
  theme_bw() +
  scale_color_brewer(type = 'qual', palette = 2)
In the real example, I’m interested in estimating how many users ever experience mittelschmerz. So, I also wanted to estimate false positives at the cycle and user level. For simplification, I just consider user/cycle as one thing here. The strategy remains the same: focus on the (pre-)menstruation period for users who are not ovulating to estimate the false positive rate. Extrapolate it to the rest of the users. Then compute the true rate from the observed rate.
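One reason the user/cycle level needs its own false positive estimate is that small daily rates accumulate over an observation period. As a rough illustration, assuming independent days with a constant daily rate (a simplification of mine; the models in this post let the rate depend on c_clicks instead), the chance of at least one accidental click in n days is 1 - (1 - f)^n. The function name any_fp is hypothetical.

```r
# Probability of at least one false positive within n_days,
# assuming independent days with a constant daily rate f_daily
any_fp <- function(f_daily, n_days) 1 - (1 - f_daily)^n_days

# Daily false positive rate of about 2%, as in the simulation output above
any_fp(f_daily = 0.0237, n_days = c(1, 7, 30))
```

Under this assumption, a daily rate of about 2% means that roughly half of all 30-day cycles contain at least one misclick.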
Complications:
the summed c_clicks as a predictor, instead of the mean (so that it would scale with the observation period)
# users who cannot feel mittelschmerz, outside the mid-cycle window
no_o_users <- daily %>%
  filter(!feel_ms, !time %in% 10:20) %>%
  group_by(id, feel_ms) %>%
  summarise(fp_daily = mean(fp),
            any_o_clicks = if_else(any(o_clicks == 1), 1, 0),
            days = n(),
            o_clicks = sum(o_clicks),
            c_clicks = sum(c_clicks))
# all users, all days
users <- daily %>% group_by(id, feel_ms) %>%
  summarise(fp_daily = mean(fp),
            any_o_clicks = if_else(any(o_clicks == 1), 1, 0),
            days = n(),
            o_clicks = sum(o_clicks),
            c_clicks = sum(c_clicks))
glm9 <- glm(any_o_clicks ~ 1 + c_clicks + offset(log(days)), data = no_o_users, family = binomial())
avg_predictions(glm9)
Estimate Pr(>|z|) S 2.5 % 97.5 %
0.386 <0.001 267.3 0.374 0.397
Columns: estimate, p.value, s.value, conf.low, conf.high
Type: invlink(link)
(fp_rate <- avg_predictions(glm9, newdata = tibble(days = 30, c_clicks = mean(users$c_clicks)))[,"estimate"])
[1] 0.5111248
users$fp_users <- predict(glm9, newdata = users, type = "response")
glm10 <- glm(any_o_clicks ~ 1 + offset(log(days)), data = users, family = binomial())
obs_rate <- avg_predictions(glm10, newdata = tibble(days = 30, c_clicks = mean(no_o_users$c_clicks)))[,"estimate"]
mean(users$fp_users,na.rm=T)
[1] 0.5078552
[1] 0.2611931
Looks good! Now I’ll try my approach on the real data, to be inevitably disappointed.
I first tried to use offsets in binomial regressions to fit my generative model. But I ran into trouble setting this up. Also, in both cases, I have to use log(fp) as the offset, which implies that setting fp=0 in my marginal effects yields -Inf. This doesn’t seem right and I gave up on this. If you’re interested, the code follows.
I can then use the estimated rate of false positives as an offset (a regression coefficient whose slope has been fixed to 1).
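As an illustration of what an offset does, and not a fix for the problem described here, the sketch below simulates whether a user has at least one event within a given number of days, assuming a constant daily event rate. Under a complementary log-log link, log(days) then enters with a slope of exactly 1, so it can be supplied as an offset; estimating the slope freely should return a value close to 1. The data and all names (lambda, any_event, fit_offset, fit_free) are made up for this example.

```r
set.seed(42)
n      <- 20000
lambda <- 0.02                       # assumed constant daily event rate
days   <- sample(10:60, n, replace = TRUE)

# P(at least one event in `days`) = 1 - exp(-lambda * days),
# i.e. cloglog(P) = log(lambda) + 1 * log(days)
any_event <- rbinom(n, 1, 1 - exp(-lambda * days))

# Slope of log(days) fixed to 1 via an offset
fit_offset <- glm(any_event ~ 1 + offset(log(days)),
                  family = binomial(link = "cloglog"))

# Slope of log(days) estimated freely
fit_free <- glm(any_event ~ 1 + log(days),
                family = binomial(link = "cloglog"))

exp(coef(fit_offset))   # should be close to lambda
coef(fit_free)          # slope should be close to 1
```

With a logit link, the same offset would instead shift the log odds, which is a different model, so the choice of link matters for how an offset should be interpreted.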
knitr::opts_chunk$set(eval = F)
This is still higher than the true value of .008, but closer.
Let’s bring the time variable back in.
glm7 <- glm(o_clicks ~ time, data = daily, family = binomial)
glm8 <- glm(o_clicks ~ time + offset(log(fp)), data = daily, family = binomial)
bind_rows(
  no_offset = predictions(glm7, newdata = tibble(time = factor(1:30))),
  offset = predictions(glm8, newdata = tibble(time = factor(1:30),
                                              fp = 0.01)),
  .id = 'model') %>%
  ggplot(aes(time, estimate, color = model)) +
  geom_line(aes(time, os, group = 1), color = "black", stat = "summary", data = daily) +
  geom_point(position = position_dodge(width = 0.4)) +
  theme_bw() +
  scale_color_brewer(type = 'qual')
After taking the offset into account, the peaks near the end disappear. I’m pretty satisfied with this.
In the real example, I’m interested in estimating how many users ever experience mittelschmerz. So, I want to a) implement a false positive model at the daily level and estimate rates of any mittelschmerz b) per cycle and c) per user. Especially for c) I’d like to specify a model that takes the number of cycles into account (since I’m more likely to observe a cycle with mittelschmerz, the more cycles I have for the user). I have this all set up, but I’m struggling with implementing a false positive model using lme4. It sort of works with brms, but that’s a poor fit for the huge sample size I’m working with. Also, in both cases, I have to use log(fp) as the offset, which implies that setting fp=0 in my marginal effects yields -Inf. This doesn’t seem right.
At this point, I’m wondering whether the approach via offset is silly, but I don’t really have an alternative idea.
Using lme4
library(lme4)
# https://stats.stackexchange.com/questions/88960/lme4-glmer-problems-with-offset
#
glm9 <- glmer(o_clicks ~ time + (1|id), data = daily, family = binomial())
glm10 <- glmer(o_clicks ~ time + offset(log(fp)) + (1|id), data = daily,
               family = binomial(link = "cloglog")) # does not work
glm10 <- glmer(o_clicks ~ time + log(fp) + (1|id), data = daily,
               family = binomial(link = "cloglog")) # works and estimates the slope as 1, so why does the other model not work?
bind_rows(
  no_offset = predictions(glm9, newdata = tibble(time = factor(1:30)),
                          re.form = NA),
  offset = predictions(glm10, newdata = tibble(time = factor(1:30),
                                               fp = 0.01), re.form = NA),
  .id = 'model') %>%
  ggplot(aes(time, estimate, color = model)) +
  geom_point(position = position_dodge(width = 0.4)) +
  theme_bw() +
  scale_color_brewer(type = 'qual')
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. Source code is available at https://github.com/rubenarslan/rubenarslan.github.io, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Arslan (2023, Sept. 2). One lives only to make blunders: Modeling misclicks in an app interface. Retrieved from https://rubenarslan.github.io/posts/2023-09-19-modeling-misclicks-in-an-app-interface/
BibTeX citation
@misc{arslan2023modeling,
  author = {Arslan, Ruben C.},
  title = {One lives only to make blunders: Modeling misclicks in an app interface},
  url = {https://rubenarslan.github.io/posts/2023-09-19-modeling-misclicks-in-an-app-interface/},
  year = {2023}
}