library(tidyverse)
library(ggdist)
library(here)
source(here("_defaults.R"))
Loading
Simulating a Galton Board
“Suppose you and a thousand of your closest friends line up in the halfway line of a soccer field.”
Ok, so the N is 1+1,000 (“you and 1000 of your closest friends”). Apparently a soccer field is 360 feet long, and an average stride length is something like 2.3 feet.
360/2)/2.3 (
[1] 78.26087
We can get in 78 steps from the halfway line to the end of the field.
set.seed(500)
expand_grid(
person = 1:1001,
step = 1:78
|>
) mutate(
flip = sample(
c(-1, 1),
size = n(),
replace = T
)|>
) mutate(
.by = person,
position = cumsum(flip)
->
) galton_board
|>
galton_board mutate(
.by = c(step, position),
n = n()
->
) galton_board
|>
galton_board ggplot(
aes(step, position)
+
)geom_line(
aes(group = person, color = n)
+
) scale_x_reverse()+
::scale_color_bilbao(
khromaguide = "none"
+
)coord_flip()
It’s hard to visualize well with the completely overlapping points. I’ll plot histograms for very 10th step.
|>
galton_board filter(step %in% seq(10, 70, by = 10)) |>
ggplot(aes(position, factor(step)))+
stat_histinterval(
breaks = breaks_fixed(width = 2),
aes(fill = after_stat(pdf))
+
)::scale_fill_bilbao(
khromaguide = "none"
+
)scale_y_discrete(
limits = factor(seq(70, 10, by = -10))
)
Infinitesimal Galton Board
Same as before, but now instead of flipping a coin for -1 and 1, values are sampled from
expand_grid(
person = 1:1001,
step = 1:78
|>
) mutate(
flip = runif(
n(),
-1,
1
)|>
) mutate(
.by = person,
position = cumsum(flip)
->
) inf_galton_board
|>
inf_galton_board ggplot(aes(step, position))+
geom_line(
aes(group = person),
alpha = 0.05
+
)scale_x_reverse()+
coord_flip()
|>
inf_galton_board filter(step %in% seq(10, 70, by = 10)) |>
ggplot(aes(position, factor(step)))+
stat_slabinterval(
aes(fill = after_stat(pdf)),
fill_type = "gradient"
+
)::scale_fill_bilbao(
khromaguide = "none"
+
)scale_y_discrete(
limits = factor(seq(70, 10, by = -10))
)
Nice.
I’m not 100% sure how to get a normal density estimate superimposed in that same plot. So I’ll fake it instead.
|>
inf_galton_board filter(step %in% seq(10, 70, by = 10)) ->
ten_steps
|>
ten_steps summarise(
.by = step,
mean = mean(position),
sd = sd(position)
|>
) nest(
.by = step
|>
) mutate(
dist = map(
data,~tibble(
position = seq(-20, 20, length = 100),
dens = dnorm(
position, mean = .x$mean,
sd = .x$sd
)
)
)|>
) unnest(dist) |>
mutate(
dens_norm = dens/max(dens)
->
) distributions
- 1
- Calculating the distribution parameters for each step grouping.
- 2
- Mapping over the distribution parameters to get density values in a tibble.
- 3
-
For plotting over the
stat_slab()
output, normalizing the density to max out at 1.
|>
ten_steps ggplot(aes(position))+
stat_slabinterval(
aes(fill = after_stat(pdf)),
fill_type = "gradient"
+
)geom_line(
data = distributions,
aes(y = dens_norm)
+
)::scale_fill_bilbao(
khromaguide = "none"
+
)facet_wrap(
~step, labeller = label_both
+
)theme_no_y()
Comparing parameters
For my own interest, I wonder how much discrete sampling from -1, 1 vs the uniform distribution affects the
|>
galton_board summarise(
.by = step,
pos_sd = sd(position)
|>
) mutate(
sampling = "discrete"
->
)
galton_sd
|>
inf_galton_board summarise(
.by = step,
pos_sd = sd(position)
|>
) mutate(
sampling = "uniform"
->
) inf_galton_sd
bind_rows(
galton_sd,
inf_galton_sd|>
) ggplot(aes(step, pos_sd))+
geom_line(
aes(color = sampling),
linewidth = 1
+
)expand_limits(y = 0)+
labs(
y = expression(sigma)
)
Messing around with a few obvious values of
Model Diagrams
Here’s the model described in the text.
He also defines a sampling distribution over
It’s ok. No quite a Kruschke diagram.
Another example.
Let me try to write out the diagram for something like y ~ x + (1|z)
.
Geeze, idk. That double subscript feels rough, and I don’t know the convention for describing the random effects.
Yeah, this is too tall. Will have to think about this. The Krushke style diagram is the most compressed version imo.
Footnotes
Not literally
though, cause that’s too small at 1.156↩︎