causalinference.utils package

causalinference.utils.tools module

causalinference.utils.tools.random_data(N=5000, K=3, unobservables=False, **kwargs)

Generates data according to one of two simple models that satisfy the unconfoundedness assumption.

The covariates and error terms are generated according to
X ~ N(mu, Sigma), epsilon ~ N(0, Gamma).
The counterfactual outcomes are generated by
Y0 = X*beta + epsilon_0, Y1 = delta + X*(beta+theta) + epsilon_1.
Selection is done according to the following propensity score function:
P(D=1|X) = Lambda(X*beta).

Here Lambda is the standard logistic CDF.
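
The data generating process above can be sketched directly in NumPy. This is a minimal illustration of the equations, not the library's own implementation: the function name simulate_unconfounded, its signature, the seed argument, and the parameter values in the usage lines are all hypothetical and chosen only for the example.

```python
import numpy as np


def simulate_unconfounded(N, mu, Sigma, Gamma, beta, delta, theta, seed=0):
    """Draw one sample from the model described above (illustrative sketch)."""
    rng = np.random.default_rng(seed)

    # Covariates: X ~ N(mu, Sigma), shape (N, K).
    X = rng.multivariate_normal(mean=mu, cov=Sigma, size=N)

    # Error terms: (epsilon_0, epsilon_1) ~ N(0, Gamma), shape (N, 2).
    eps = rng.multivariate_normal(mean=np.zeros(2), cov=Gamma, size=N)

    # Potential outcomes: Y0 = X*beta + epsilon_0,
    # Y1 = delta + X*(beta + theta) + epsilon_1.
    Y0 = X @ beta + eps[:, 0]
    Y1 = delta + X @ (beta + theta) + eps[:, 1]

    # Propensity score: standard logistic CDF evaluated at X*beta.
    pscore = 1.0 / (1.0 + np.exp(-(X @ beta)))

    # Treatment is drawn from X alone, independently of the error terms,
    # so unconfoundedness holds by construction.
    D = rng.binomial(1, pscore)

    # Observed outcome: Y1 for treated units, Y0 for controls.
    Y = np.where(D == 1, Y1, Y0)
    return Y, D, X, Y0, Y1, pscore


# Example parameter values (arbitrary, for illustration only).
K = 3
Y, D, X, Y0, Y1, pscore = simulate_unconfounded(
    N=1000,
    mu=np.zeros(K),
    Sigma=np.eye(K),
    Gamma=np.eye(2),
    beta=np.ones(K),
    delta=3.0,
    theta=np.ones(K),
)
```

Because D depends on the covariates only through the propensity score, the potential outcomes are independent of treatment conditional on X, which is the unconfoundedness property the function description refers to.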

Parameters:

N: int

Number of units to draw. Defaults to 5000.

K: int

Number of covariates. Defaults to 3.

unobservables: bool

If True, returns the potential outcomes and the true propensity score in addition to the observed outcomes, treatment indicators, and covariates. Defaults to False.

mu, Sigma, Gamma, beta, delta, theta: NumPy ndarrays, optional

Parameter values appearing in the data generating process described above.
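
The shapes of these parameters follow from the model equations: mu, beta, and theta have one entry per covariate, Sigma is K-by-K, Gamma is 2-by-2 (one row per potential-outcome error term), and delta enters the outcome equation as a scalar shift. The sketch below builds one consistent set of values and checks them. The helper validate_params and the specific numbers are hypothetical and not part of the package, and passing delta as a plain float is an assumption.

```python
import numpy as np

K = 3  # number of covariates; should match the K passed to random_data

params = {
    "mu": np.zeros(K),                         # mean of X, shape (K,)
    "Sigma": 0.5 * np.eye(K) + 0.5,            # covariance of X, shape (K, K)
    "Gamma": np.array([[1.0, 0.3],
                       [0.3, 1.0]]),           # covariance of (epsilon_0, epsilon_1)
    "beta": np.array([0.5, -0.5, 1.0]),        # outcome and selection coefficients
    "delta": 2.0,                              # scalar shift in Y1 (assumed float)
    "theta": np.zeros(K),                      # extra slope in Y1, shape (K,)
}


def validate_params(p, K):
    """Check shapes and covariance validity; raise ValueError on a mismatch."""
    expected = {"mu": (K,), "beta": (K,), "theta": (K,),
                "Sigma": (K, K), "Gamma": (2, 2)}
    for name, shape in expected.items():
        if np.shape(p[name]) != shape:
            raise ValueError(f"{name} should have shape {shape}")
    for name in ("Sigma", "Gamma"):
        M = np.asarray(p[name])
        # A covariance matrix must be symmetric and positive semi-definite.
        if not np.allclose(M, M.T) or np.linalg.eigvalsh(M).min() < 0:
            raise ValueError(f"{name} is not a valid covariance matrix")
    return True


validate_params(params, K)

# With the package installed, the values would be passed as keyword arguments:
# Y, D, X = causalinference.utils.tools.random_data(N=1000, K=K, **params)
```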

Returns:

tuple

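A tuple of the form (Y, D, X) or (Y, D, X, Y0, Y1), containing the observed outcomes, treatment indicators, covariate matrix, and, when unobservables is True, the potential outcomes. Per the description of unobservables, the true propensity score is also returned in that case.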