Just focusing on the intercept and newsworthiness effects: they went from (correctly) showing no relationship in the model without the collider to showing pretty reliable effects, just from including the selected variable.
Here’s the scary part. The illustration from the book is about inter-generational effects on education. Grandparents will have an effect on their children (the parents), and parents will have an effect on their own children. The question is: is there any direct effect of grandparents on their grandchildren?
graph LR
g[Grandparent]
p[Parent]
c[Child]
g --> p
g -.->|?| c
p --> c
finding the direct effect
dagify(p ~ g, c ~ p, c ~ g) |>
  adjustmentSets(exposure = "g", outcome = "c", effect = "direct")
{ p }
Ok, but the spooky thing is: what if there’s a variable (like neighborhood) that’s shared by the parent and child, but not the grandparent, and that we didn’t record?
graph LR
g[Grandparent]
p[Parent]
c[Child]
n[Neighborhood]
g -.->|?| c
g --> p
p --> c
n --> p
n --> c
style n stroke-dasharray: 5 5
Parent has apparently become a collider, but I’m still trying to noodle through why.
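One way to poke at this is to ask {dagitty} for the adjustment set again, this time with Neighborhood in the DAG. This is only a sketch: it assumes the unrecorded variable can be represented by marking `n` as latent, and the object names (`haunted_dag`, `recorded_dag`) are made up for illustration.

```r
library(dagitty)

# The haunted DAG: n (neighborhood) affects p and c, but we never recorded it,
# so it is declared latent.
haunted_dag <- dagitty("dag {
  n [latent]
  g -> p
  g -> c
  p -> c
  n -> p
  n -> c
}")

# Hypothetical counterpart in which neighborhood was recorded after all.
recorded_dag <- dagitty("dag {
  g -> p
  g -> c
  p -> c
  n -> p
  n -> c
}")

# Adjustment sets for the direct effect of g on c under each assumption.
sets_haunted <- adjustmentSets(haunted_dag, exposure = "g", outcome = "c",
                               effect = "direct")
sets_recorded <- adjustmentSets(recorded_dag, exposure = "g", outcome = "c",
                                effect = "direct")

sets_haunted
sets_recorded
```

The logic to look for: Parent has to be in the model to block the mediated path, but once Parent is in, Neighborhood has to be in too. If Neighborhood is unobserved, there is no set of recorded variables that does both jobs.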
Ok, having stepped away for a bit, I think my problem was some confusion about how the “paths” work in DAGs.
I realized:
The connections from one node to another are directed.
But when charting a path from a variable to the outcome, you ignore the directedness.
Then, you add back in the directedness to diagnose confounder, mediator, collider etc.
So, ignoring the directedness, we have the following paths from Grandparent to Child.
Undirected Paths
Grandparent — Child
Grandparent — Parent — Child
Grandparent — Parent — Neighborhood — Child
Then, we can add in the directedness
Directed Paths
1. Grandparent → Child
2. Grandparent → Parent → Child
3. Grandparent → Parent ← Neighborhood → Child
Because of path 2 (Grandparent → Parent → Child), we need to include Parent in the model to get the “direct effect” of Grandparent. But because of path 3 (Grandparent → Parent ← Neighborhood → Child), Parent is also a collider, so conditioning on it opens that path. If we don’t include Neighborhood in the model (maybe because we didn’t measure it!), the estimate for Grandparent is going to get all screwy!
I’m still developing my intuitions for why and how the estimate will get screwy.
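A quick simulation can help with the "how". The sketch below is not the brms workflow used elsewhere in this post: it uses plain `lm()` for speed, and the effect sizes, sample size, seed, and object names (`sketch_sim`, `haunted_fit`, `exorcised_fit`) are all assumptions picked for illustration. The one thing built in on purpose is that Grandparent has zero direct effect on Child.

```r
set.seed(2023)  # arbitrary seed, only so the sketch is reproducible

n_families <- 1000

# Assumed effect sizes, for illustration only
b_gp <- 1  # grandparent -> parent
b_gc <- 0  # grandparent -> child: no direct effect, by construction
b_pc <- 1  # parent -> child
b_n  <- 2  # neighborhood -> parent, and neighborhood -> child

# Neighborhood is coded -1 (bad) or +1 (good)
neighborhood <- sample(c(-1, 1), n_families, replace = TRUE)
grandparent  <- rnorm(n_families)
parent       <- rnorm(n_families, b_gp * grandparent + b_n * neighborhood)
child        <- rnorm(n_families,
                      b_pc * parent + b_gc * grandparent + b_n * neighborhood)

sketch_sim <- data.frame(neighborhood, grandparent, parent, child)

# Haunted: neighborhood left out. Exorcised: neighborhood included.
haunted_fit   <- lm(child ~ parent + grandparent, data = sketch_sim)
exorcised_fit <- lm(child ~ parent + grandparent + neighborhood,
                    data = sketch_sim)

# Compare the parent and grandparent slopes across the two fits
round(rbind(
  haunted   = coef(haunted_fit)[c("parent", "grandparent")],
  exorcised = coef(exorcised_fit)[c("parent", "grandparent")]
), 2)
```

Under these assumed values, the grandparent slope in the haunted fit should come out clearly negative even though the true direct effect is zero, and it should move back toward zero once neighborhood is included. The intuition: among parents with the same education, the ones with highly educated grandparents are more likely to have gotten there despite a bad neighborhood, and that bad neighborhood also drags the child down.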
Getting the paths
With {dagitty} and {ggdag} you’re supposed to be able to get the paths automatically, but I can’t get the {ggdag} one to give me the collider path.
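For the {dagitty} side, here is a minimal sketch using `dagitty::paths()`, which lists every path between two nodes and reports whether each one is open. Passing `Z` asks the same question conditional on a set of variables. The DAG is retyped from the diagram above, and the object names are just for illustration.

```r
library(dagitty)

# Haunted DAG, with neighborhood (n) included as a node
haunted_dag <- dagitty("dag {
  g -> p
  g -> c
  p -> c
  n -> p
  n -> c
}")

# All paths from grandparent to child, with nothing conditioned on
all_paths <- paths(haunted_dag, from = "g", to = "c")

# The same paths, conditional on parent
given_parent <- paths(haunted_dag, from = "g", to = "c", Z = "p")

# Line the two results up by path so they can be read side by side
data.frame(
  path              = all_paths$paths,
  open_unadjusted   = all_paths$open,
  open_given_parent = given_parent$open[match(all_paths$paths,
                                              given_parent$paths)]
)
```

The row to watch is the one that passes through `p <- n`: it should be closed when nothing is conditioned on and open once Parent is in the conditioning set, which is the collider behavior described above.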
As far as prediction goes, the model will do fine, because the statistical associations are correct, but the causal interpretation (“Grandparents have a negative effect”) is wrong.
posterior predictive check
pp_check(haunted1_mod)
Including the variable haunting the DAG ought to improve things, but in reality that assumes we know what it is, and have some measure of it.
Exorcised model
brm(
  child ~ parent + grandparent + neighborhood,
  data = haunted_sim,
  prior = c(prior(normal(0, 3), class = b)),
  backend = "cmdstanr",
  cores = 4,
  file = "exorcised1"
) -> exorcised1_mod