Mathematical Derivation and Computational Simulation of the semi-parametric class of G-methods
Causal Inference is a field that touches several domains and is of interest to a wide range of practitioners including Statisticians, Data Scientists, Machine Learning Scientists, and other Computational Researchers. Recovery of unbiased estimates of Causal Effects is at times a tough task, even in randomized settings. This task can be particularly challenging in non-randomized settings, requiring an array of often empirically untestable assumptions to hold that have both mathematical and philosophical implications.
In the interest of recovering unbiased estimates of Mean Causal Effects of an Intervention-Outcome relationship, there are several tools we can leverage. Suppose the Causal DAG or SWIG is correctly specified, and the measured variables in it form a sufficient set for confounding adjustment of the Intervention-Outcome relationship of interest (and hence provide conditional exchangeability within levels of the variables in that set). Under these assumptions, we can leverage the set of G-Methods to recover unbiased estimates of Causal Contrasts of interest.
The class of G-Methods can be grouped into three categories:
- Standardization via the g-formula (parametric or non-parametric)
- Inverse Probability Weighting (IPW) via Marginal Structural Models (MSM)
- G-Estimation of Structural Nested Models (SNM)
In my previous piece on Doubly Robust Estimation techniques, we covered Standardization and IPW, and examples of how these two techniques can sometimes be combined into a single Doubly Robust sampling estimator.
In this piece, we cover the last category of G-Methods: G-Estimation of Structural Nested Models. Structural Nested Models are a class of semi-parametric models, and generally require fewer parametric assumptions than the g-formula or Marginal Structural Models (and therefore are less prone to parametric misspecification).
The contents of this piece are as follows:
I would like to preface that this article is quite mathematically involved. I truly believe G-Estimation of Structural Nested Models is a valuable and impactful method, and should be in the toolbox of any practitioner who works in the Causal Inference space. However, the derivations and justifications for this technique can be confusing and are not for the faint of heart. So please don’t be discouraged if a first read through this piece leaves you feeling lost. It took me quite a bit of time to fully grasp G-estimation when learning this material in graduate school. But I guarantee you, if you take the time to understand the approach, it will have been well worth the effort.
With that said, let’s jump in.
Both Standardization and IPW can be leveraged to recover unbiased estimates of Mean Causal Effects of versions of an Intervention-Outcome relationship; with these individual estimates recovered, contrasts of these estimates can be calculated. This is also possible in the presence of Effect Modification by a third explanatory variable, where we would like to recover valid Causal estimates within levels of said variable.
For example, say we have binary Intervention A with support {0,1}, continuous Outcome Y, binary confounder C, and continuous Effect Modifier L. We can leverage Standardization or IPW to estimate the Mean Causal Effect of Intervention A=1 on Outcome Y within L=l, the Mean Causal Effect of Intervention A=0 on Outcome Y within L=l, and take the difference of these two estimates to recover an estimate of the Mean Causal Effect Difference of A=1 and A=0 on Outcome Y within level L=l:
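In counterfactual notation (an assumed convention here, with Y^a denoting the outcome that would be observed under intervention a), these three quantities can be written as:

```latex
E[Y^{a=1} \mid L = l], \qquad
E[Y^{a=0} \mid L = l], \qquad
E[Y^{a=1} \mid L = l] - E[Y^{a=0} \mid L = l]
```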
In general (except in the case of fully saturated models), both Standardization and IPW require several parametric assumptions to hold:
- Standardization via the g-formula requires correct specification of the Outcome Model
- IPW via MSMs requires correct specification of the Intervention Model and Marginal Structural Model with Effect Modification
In my previous piece on Doubly Robust Estimation techniques, we examined how the above conditions can be somewhat “relaxed” by combining Standardization and IPW into a single sampling estimator. But depending on the purposes of our analysis and what parameters we have specific interest in estimating, there is another possible solution to relaxing the above assumptions.
Of the three Causal Parameters that we can recover with Standardization and IPW, what if we only care about the third: the Mean Causal Effect Difference of Intervention a=1 and a=0 on Outcome Y within L=l? Suppose we are in fact not interested in recovering estimates of the first two mean parameters:
If we are only interested in recovering an estimate of the Difference of the Mean Causal Effects of two Intervention-Outcome relationships, and not the individual Intervention-Outcome relationships themselves, is there another methodological approach we can leverage other than Standardization or IPW? Perhaps a method that requires fewer parametric assumptions, and hence is less prone to parametric misspecification?
This brings us to our third and last category of G-Methods: G-Estimation of Structural Nested Models.
Structural Nested Models are a class of semi-parametric models in the presence of Effect Modification; they leave part of the Outcome Model unspecified. Let’s construct a well-specified Toy Example, and examine how we can leverage Structural Nested Models.
Let us postulate a simple toy example:
We have binary Intervention A with support {0,1}, continuous Outcome Y, continuous variable C1, and binary variable C2. Figure 1 shows the Causal DAG (under the null) for this system of variables.
Note that:
- In the marginal Causal DAG above, Intervention A and Outcome Y are not marginally d-separated; there is confounding by binary variable C2 on the Marginal DAG.
- Note continuous variable C1; C1 is a direct cause of Outcome Y, but is not a cause of Intervention A (and therefore is not inducing confounding of the Intervention-Outcome relationship on the marginal DAG). However, given C1 is a direct cause of Y we suspect there is Effect Modification of the Intervention-Outcome relationship by C1.
Figure 2a-c shows the Causal Single World Intervention Graphs (SWIGs) (under the null) for the Causal DAG in Figure 1 under intervention a, a=1, and a=0.
What Parameter are we interested in recovering in this problem?
We are interested in recovering the Difference of the Mean Causal Effects of intervention a=1 and a=0 on outcome Y. However, given we believe there is Effect Modification by C1 (i.e., the Difference of Mean Causal Effects is different for differing values of C1), we would like to recover an unbiased estimate of the Mean Causal Effect Difference conditional within levels of C1:
Given we have established we would like to recover an unbiased estimate of the Difference of Mean Causal Effects of intervention a=1 and a=0 on outcome Y within levels of C1, let’s see how we can solve this with Standardization via the parametric g-formula, and the challenges in doing so.
Given the Causal DAG in Figure 1, under the assumptions of consistency and positivity, via the Causal Markov Assumption the d-separation of A and Y conditional on C2 directly implies independence in distribution of A and Y conditional on C2, the distribution of which our observed data is a function. Additionally, conditioning on C1 also leaves A and Y d-separated. Given we are interested in recovering a causal contrast within levels of C1, we will condition on C1 as well:
Note that, given this is a toy example and we will be simulating a dataset from scratch to investigate it later, we can choose the parameterization for the “true” Outcome Model ourselves. For learning purposes, I have purposely chosen a complex Outcome Model parameterization (as shown below). The reasons why I chose something complex will become clear later. In practice, when we’re doing “real” analyses, we do not have the luxury of simply “knowing” the true parameterization. Rather, we make our best estimate of the true parameterization, using empirical analysis and our domain knowledge to guide us.
What are the challenges with this method?
In order to fit the above model, we need to know:
If we get either of these specifications incorrect, our model will be misspecified, and our recovered causal estimates will likely be incorrect.
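The standardization steps can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the article's own simulation: the data-generating process, the coefficient values, and the helper names `design` and `effect_difference` are all hypothetical. It fits a correctly specified outcome model, then predicts every subject's outcome under a=1 and a=0 at a fixed value of C1 and averages the difference over the observed C2 values.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical data-generating process (an assumption for illustration only)
C1 = rng.normal(size=n)                       # continuous effect modifier
C2 = rng.binomial(1, 0.5, size=n)             # binary confounder
A = rng.binomial(1, 0.3 + 0.4 * C2)           # intervention depends on C2 only
Y = 1.5 * A + 0.5 * A * C1 + np.sin(C1) + C1**2 + C2 + rng.normal(size=n)


def design(a, c1, c2):
    """Design matrix of the outcome model E[Y | A, C1, C2].

    Every column, including the functional form of C1, has to match
    the truth for the parametric g-formula to be correctly specified.
    """
    return np.column_stack([np.ones_like(c1), a, a * c1, np.sin(c1), c1**2, c2])


# Fit the outcome model by ordinary least squares
beta, *_ = np.linalg.lstsq(design(A, C1, C2), Y, rcond=None)


def effect_difference(c1_value):
    """Standardized mean difference of a=1 vs a=0 at C1 = c1_value."""
    c1 = np.full(n, float(c1_value))
    y1 = design(np.ones(n), c1, C2) @ beta    # everyone set to a=1
    y0 = design(np.zeros(n), c1, C2) @ beta   # everyone set to a=0
    return (y1 - y0).mean()                   # average over observed C2


print(effect_difference(0.0), effect_difference(1.0))
```

Under this hypothetical setup the true difference at C1=c1 is 1.5 + 0.5·c1, so the two printed values should land close to 1.5 and 2.0. The sketch only works this cleanly because `design` contains the exact terms used to generate Y.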
Given the Causal DAG in Figure 1, under the assumptions of consistency and positivity, via the Causal Markov Assumption the d-separation of A and Y conditional on C2 directly implies independence in distribution of A and Y conditional on C2, the distribution of which our observed data is a function.
We then reweight each observation in our observed dataset by their appropriate IPW. In this reweighted pseudo-population, we can fit the following Marginal Structural Model to recover the Mean Causal Effect of Outcome Y under Intervention a:
What are the challenges with this method?
In order to fit the above model, we need to know:
If we get either of these specifications incorrect, our model will be misspecified, and our recovered causal estimates will likely be incorrect.
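The reweighting step can be illustrated with a small sketch. It assumes a hypothetical simulated dataset in which only the binary variable C2 affects the intervention, and it estimates the Intervention Model with stratum means (a saturated model for a binary covariate) rather than a fitted logistic regression. The helper name `weighted_mean_c2` is illustrative. The point is that, in the weighted pseudo-population, C2 has the same distribution in both intervention arms.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# Hypothetical data: C2 raises the probability of receiving A=1
C2 = rng.binomial(1, 0.5, size=n)
A = rng.binomial(1, 0.3 + 0.4 * C2)

# Saturated intervention model: P(A=1 | C2=c2) estimated by stratum means
p_by_c2 = np.array([A[C2 == 0].mean(), A[C2 == 1].mean()])
p = p_by_c2[C2]

# Unstabilized inverse probability weights: 1 / P(A = observed a | C2)
w = np.where(A == 1, 1.0 / p, 1.0 / (1.0 - p))


def weighted_mean_c2(a):
    """Weighted mean of C2 among subjects with A == a."""
    mask = A == a
    return np.average(C2[mask], weights=w[mask])


# Before weighting, C2 is imbalanced across arms; after weighting it is not
print("unweighted:", C2[A == 1].mean(), C2[A == 0].mean())
print("weighted:  ", weighted_mean_c2(1), weighted_mean_c2(0))
```

A Marginal Structural Model would then be fit to the outcome using `w` as observation weights. That fit still requires choosing a functional form for C1, which is the difficulty discussed above.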
Let’s recap where we currently are:
We are interested in recovering the Difference of the Mean Causal Effects of Intervention a=1 and a=0 on outcome Y within levels of C1=c1, i.e. we would like to recover:
Note, in both cases we needed to know:
A logical question is, what if we don’t know the true functional form for variable C1? What if we are not confident in our ability to model or specify it? This is where Structural Nested Models begin to take shape.
Let’s start by re-writing the Marginal Structural Model:
6.1: Structural Nested Model specification
Again, starting with the Marginal Structural Model:
Therefore, the Mean Causal Effect Difference of Intervention a=1 and a=0 within levels of C1 is:
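As an illustration of this step, assume a Marginal Structural Model of the following form. The symbols β0, β1, β2 and the function f are hypothetical choices made for this sketch, with f standing for an arbitrary, possibly unknown, function of C1:

```latex
\begin{aligned}
E[Y^{a} \mid C_1 = c_1] &= \beta_0 + \beta_1 a + \beta_2 a c_1 + f(c_1) \\
E[Y^{a=1} \mid C_1 = c_1] - E[Y^{a=0} \mid C_1 = c_1]
  &= \bigl(\beta_0 + \beta_1 + \beta_2 c_1 + f(c_1)\bigr) - \bigl(\beta_0 + f(c_1)\bigr) \\
  &= \beta_1 + \beta_2 c_1
\end{aligned}
```

Every term that does not involve the intervention a, including f(c1), cancels in the difference. Under this assumed form, the contrast of interest depends only on β1 and β2, so the functional form of C1 never has to be specified.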
6.2: Outcome Model specification
Here’s where the magic starts to take shape. Based on the above independence in distribution statements, we could specify the following Logistic Regression model:
There are two things of note about the above logistic regression model:
We are left with:
Let’s move to the final step; the fitting procedure for G-estimation.
6.3: Fitting Procedure with G-estimation
Let’s take stock of where we currently are:
With these pieces of the puzzle finally in-place, here is the G-estimation fitting procedure:
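A compact numerical sketch of G-estimation follows. It rests on several assumptions that are not taken from this article: a hypothetical simulated dataset, a linear Structural Nested Mean Model in which the effect difference at C1=c1 is ψ0 + ψ1·c1, and an Intervention Model estimated by stratum means of the binary C2. Rather than searching over candidate values of ψ with a logistic regression, the sketch swaps in the closed-form solution of the linear estimating equation Σ (A − P(A=1|C2)) · (1, C1) · H(ψ) = 0, where H(ψ) = Y − A·(ψ0 + ψ1·C1). Both approaches look for the ψ at which H(ψ) carries no information about A given C2.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
psi_true = np.array([1.5, 0.5])               # hypothetical true (psi0, psi1)

# Hypothetical data-generating process (an assumption for illustration only)
C1 = rng.normal(size=n)                       # effect modifier
C2 = rng.binomial(1, 0.5, size=n)             # confounder
A = rng.binomial(1, 0.3 + 0.4 * C2)           # intervention depends on C2 only
Y0 = np.sin(C1) + C1**2 + C2 + rng.normal(size=n)   # outcome under a=0
Y = Y0 + A * (psi_true[0] + psi_true[1] * C1)       # observed outcome

# Intervention model P(A=1 | C2), saturated because C2 is binary
p = np.array([A[C2 == 0].mean(), A[C2 == 1].mean()])[C2]
resid = A - p

# Effect-modification design (1, C1); the main effect of C1 on Y is never modelled
S = np.column_stack([np.ones(n), C1])

# Solve sum_i resid_i * S_i * (Y_i - A_i * S_i @ psi) = 0 for psi
lhs = (S * (resid * A)[:, None]).T @ S
rhs = S.T @ (resid * Y)
psi_hat = np.linalg.solve(lhs, rhs)

# At psi_hat, H(psi) is uncorrelated with the intervention residual
H = Y - A * (S @ psi_hat)
print("psi_hat:", psi_hat)
print("estimating equation at psi_hat:", S.T @ (resid * H) / n)
```

Note that the complicated part of the outcome, sin(C1) + C1², appears only in the simulation and never in the estimation step. Only the Intervention Model and the form of the effect difference are specified.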
We’re going to conduct a computational simulation in Python to investigate the methods described in this piece, with particular attention paid to G-Estimation. We will:
- Create a simulated dataset with the true Causal DAG as shown below in Figure 3. Note this DAG is identical to Figure 1 but with a true Causal Effect of A on Y preset (i.e. a directed arrow from A to Y).
- Given we simulated the dataset and are “all knowing” of the true model specification and the true value of the Mean Causal Effect Difference of A on Y within levels of C1, we will attempt to recover an unbiased estimate of the Causal Effect Difference of A on Y within levels of C1 using Standardization via the parametric g-formula, Inverse Probability Weighting via Marginal Structural Models, and G-estimation of Structural Nested Models. We will conduct these analyses under both correct and incorrect model specifications and review the results.
Let’s import our needed libraries: