For me, teaching stats this semester has turned into a journey of discovering what the distributional and ggdist packages can do for me. The way I make illustrative figures will never be the same. So I thought I’d revisit my post about hierarchical variance priors, this time implementing the figures using these two packages.
Custom y theme and scale
theme_no_y <- function(){
out_theme <- theme(
axis.text.y = element_blank(),
axis.ticks.y = element_blank(),
axis.title.y = element_blank()
)
out_theme
}
scale_y_tight <- function(...) {
scale_y_continuous(expand = expansion(0), ...)
}
Random effect variance priors
When fitting a model with a random intercept, the group-level random effects (let’s say, \(\gamma_i\)) are sampled from a normal distribution
\[ \gamma_i \sim \mathcal{N}(0,\sigma) \]
If you look at the default prior brms uses for \(\sigma\), it’s a truncated student-t:
student_t(3, 0, 2.5)
We can make this distribution!
full_t <- dist_student_t(
df = 3,
mu = 0,
sigma = 2.5
)
half_t <- dist_truncated(
full_t,
lower = 0
)
We can get things like the mean and variance of this truncated distribution, and generate random samples from it
mean(half_t)
[1] 2.756644
variance(half_t)
[1] 11.15091
generate(half_t, 10)
[[1]]
[1] 5.1531979 1.2497857 1.0370256 3.0964036 4.6862006 1.1580219 0.2008173
[8] 8.3719751 2.6385236 0.2936411
And we can plot it
ggplot()+
stat_slab(
aes(
xdist = half_t
),
fill = "#EE6677"
)+
scale_y_tight()+
theme_no_y()+
scale_thickness_shared()
Let’s just use the expected value of this distribution as \(\sigma\) for now.
init_sigma <- mean(half_t)
Random effects in probability space
Now let’s make the normal distribution for the group-level random effects.
ranef_dist <- dist_normal(
mu = 0,
sigma = init_sigma
)
Ok, great! But what if the model I’m fitting is a logistic regression? This random effects distribution is in the logit space. But what does it look like if we transform it to the probability space?
ranef_prob_dist <- dist_transformed(
ranef_dist,
plogis,
qlogis
)
ggplot()+
stat_slab(
aes(
xdist = ranef_prob_dist
),
fill = "#EE6677"
)+
scale_y_tight()+
theme_no_y()
Rather than having a random effects distribution where groups are broadly distributed across the probability space, we actually have a random effects distribution where groups are pretty strongly bifurcated. And this is at the expected value for our prior over \(\sigma\).
Looking for a more neutral distribution
Let’s see what different \(\sigma\)s look like in the probability space.
possible_dists <- tibble(
sigma = seq(
1, 2.1, by = 0.1
),
dist = dist_normal(0, sigma),
p_dist = dist_transformed(
dist, plogis, qlogis
)
)
ggplot(
possible_dists,
aes(
xdist = p_dist,
fill = sigma
)
)+
stat_slab(
color = "black",
linewidth = 0.5
)+
scale_fill_scico(
palette = "devon",
guide = "none"
)+
scale_y_tight()+
facet_wrap(~sigma)+
theme_no_y()+
theme_no_x()
It looks like somewhere between 1.3 and 1.4 is the sweet spot for a maximally flat random effects distribution in the probability space.
Really honing in on it
I can try getting even more precise by looking at a vectorized version of these distributions, and finding the largest sigma what still has its density peak at 0.5.
library(purrr)
# a vector of sigmas
sigmas = seq(1.3, 1.5, length = 100)
# a vectorized normal
vec_dist <- dist_normal(
mu = 0,
sigma = sigmas
)
# a vectorized ilogit(normal)
vec_p_dist <- dist_transformed(
vec_dist,
plogis,
qlogis
)
# the density function
# from 0 to 0.5
p_densities <- density(
vec_p_dist,
seq(0, 0.5, length = 100)
)
# The index of the max
# density
where_is_max <- p_densities |>
map_vec(
which.max
)
# if where_is_max == 100
# peak density was at 0.5
flat_idx <- (where_is_max == 100) |>
which() |>
max()
flattest_sigma <- sigmas[flat_idx]
flattest_sigma
[1] 1.413131
Let’s take a look at it:
flat_pdist <- dist_normal(0, flattest_sigma) |>
dist_transformed(plogis,qlogis)
ggplot()+
stat_slab(
aes(
xdist = flat_pdist
),
fill = "#EE6677"
)+
scale_y_tight()+
theme_no_y()
What about a prior?
In a logistic regression I think I would usually like a prior over \(\sigma\) for random effects to have its expected value right about here, with about as close as we can get to a uniform prior in probability space. Then, the data can pull it towards being more bifurcated, or more focused, depending.
I’m not sure how to solve that, but I can do a grid search!
tibble(
prior_sigma = seq(1,1.5,length = 100),
half_student = dist_student_t(3,0,prior_sigma) |>
dist_truncated(lower = 0),
expected = mean(half_student)
) |>
slice(
which.min(abs(expected - flattest_sigma))
)
# A tibble: 1 × 3
prior_sigma half_student expected
<dbl> <dist> <dbl>
1 1.28 t(3, 0, 1.3)[0,Inf] 1.41
Let’s call it 1.28.
ranef_sigma_prior <- dist_student_t(3, 0, 1.28) |>
dist_truncated(lower = 0)
set.seed(2024)
ranef_sigmas <- generate(ranef_sigma_prior, 9)[[1]]
tibble(
sigma = ranef_sigmas,
rounded_sigma = round(sigma, digits = 2),
dist = dist_normal(0, sigma),
p_dist = dist_transformed(
dist, plogis, qlogis
)
) |>
ggplot(
aes(
xdist = p_dist
)
)+
stat_slab(
aes(
fill = sigma
),
normalize = "panels",
color = "black",
linewidth = 0.5
)+
scale_y_tight()+
scale_fill_scico(
palette = "devon",
guide = "none"
)+
scale_x_continuous(
breaks = c(0,1)
)+
facet_wrap(~rounded_sigma)+
theme_no_y()
Looks good to me!
Reuse
Citation
@online{fruehwald2024,
author = {Fruehwald, Josef},
title = {Random Effect Priors, Redo},
series = {Væl Space},
date = {2024-11-19},
url = {https://jofrhwld.github.io/blog/posts/2024/11/2024-11-19_logit-priors-again/},
langid = {en}
}