causalinference.utils package

causalinference.utils.tools module

causalinference.utils.tools.random_data(N=5000, K=3, unobservables=False, **kwargs)

Generates random data according to one of two simple models that satisfy the unconfoundedness assumption.

The covariates and error terms are generated according to: X ~ N(mu, Sigma), epsilon ~ N(0, Gamma).
The counterfactual outcomes are generated by: Y0 = X*beta + epsilon_0, Y1 = delta + X*(beta+theta) + epsilon_1.
Selection is done according to the following propensity score function: P(D=1|X) = Lambda(X*beta).

Here Lambda is the standard logistic CDF.

Parameters:

N: int: Number of units to draw. Defaults to 5000.
K: int: Number of covariates. Defaults to 3.
unobservables: bool: If True, returns the potential outcomes and the true propensity score in addition to the observed outcomes and covariates. Defaults to False.
mu, Sigma, Gamma, beta, delta, theta: NumPy ndarrays, optional: Parameter values appearing in the data-generating process.

Returns:

tuple: A tuple of the form (Y, D, X) or (Y, D, X, Y0, Y1), containing observed outcomes, treatment indicators, the covariate matrix, and potential outcomes.
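
Example:

The data-generating process described above can be sketched directly in NumPy. This is an illustrative re-implementation, not the library's own code: the function name simulate_unconfounded, the seed argument, and the parameter values (zero mean, identity covariances, unit coefficients, delta = 3) are all assumptions made for the example, and the sketch always returns the unobservables, including the true propensity score.

```python
import numpy as np


def simulate_unconfounded(N=5000, K=3, seed=0):
    """Hypothetical sketch of the model above; not the library's implementation."""
    rng = np.random.default_rng(seed)

    # Assumed parameter values, chosen only for illustration.
    mu = np.zeros(K)          # mean of the covariates
    Sigma = np.identity(K)    # covariance of the covariates
    Gamma = np.identity(2)    # covariance of (epsilon_0, epsilon_1)
    beta = np.ones(K)
    theta = np.ones(K)
    delta = 3.0

    # X ~ N(mu, Sigma), epsilon ~ N(0, Gamma)
    X = rng.multivariate_normal(mu, Sigma, size=N)              # shape (N, K)
    eps = rng.multivariate_normal(np.zeros(2), Gamma, size=N)   # shape (N, 2)

    # Counterfactual outcomes
    Y0 = X @ beta + eps[:, 0]
    Y1 = delta + X @ (beta + theta) + eps[:, 1]

    # Selection: P(D=1|X) = Lambda(X*beta), with Lambda the standard logistic CDF
    pscore = 1.0 / (1.0 + np.exp(-(X @ beta)))
    D = rng.binomial(1, pscore)

    # Only the potential outcome matching the treatment status is observed
    Y = (1 - D) * Y0 + D * Y1
    return Y, D, X, Y0, Y1, pscore


Y, D, X, Y0, Y1, pscore = simulate_unconfounded(N=1000, K=3)
```

Because treatment depends on the covariates only through X*beta, the error terms are independent of D given X, which is what makes the unconfoundedness assumption hold in this sketch.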