FCIT¶
- class hyppo.conditional.FCIT(model=DecisionTreeRegressor(), cv_grid={'min_samples_split': [2, 8, 64, 512, 0.01, 0.2, 0.4]}, num_perm=8, prop_test=0.1, discrete=(False, False))¶
Fast Conditional Independence test statistic and p-value
The Fast Conditional Independence Test is a non-parametric conditional independence test [1].
- Parameters
  - model (Sklearn regressor) -- Regressor used to predict the input data.
  - cv_grid (dict) -- Dictionary of parameters to cross-validate over when training the regressor.
  - num_perm (int) -- Number of data permutations used to estimate the p-value from marginal statistics.
  - prop_test (float) -- Proportion of data on which the test statistic is evaluated.
  - discrete (tuple of bool) -- Whether x or y is discrete.
Notes
Note: This algorithm is currently a pre-print on arXiv.
The motivation for the test rests on the assumption that if x and y are conditionally dependent given z, then y should be more accurately predicted by using both x and z as covariates as opposed to only using z as a covariate. Likewise, if x and y are conditionally independent given z, then y should be predicted just as accurately using both x and z as it is using z alone [1]. Thus, the test works by using a regressor (the default is a decision tree) to predict the input y, once using both x and z and once using only z [1]. The accuracy of both predictions is then measured via the mean-squared error (MSE). x and y are conditionally independent given z if and only if the MSE of the regressor trained using both x and z is not smaller than the MSE of the regressor trained using only z [1].
References
1. Krzysztof Chalupka, Pietro Perona, and Frederick Eberhardt. Fast conditional independence test for vector variables with large sample sizes. arXiv preprint arXiv:1804.02747, 2018.
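The MSE comparison described in the Notes can be sketched as follows. This is a simplified illustration, not hyppo's implementation: the data-generating process, the model settings, and the `holdout_mse` helper are invented for this example.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
n = 2000
z = rng.normal(size=(n, 1))
x = z + 0.1 * rng.normal(size=(n, 1))  # x depends only on z
y = z + 0.1 * rng.normal(size=(n, 1))  # y depends only on z, so x and y are
                                       # conditionally independent given z

def holdout_mse(features, target):
    """Train a decision tree and return its held-out MSE for predicting target."""
    f_tr, f_te, t_tr, t_te = train_test_split(
        features, target.ravel(), random_state=0
    )
    model = DecisionTreeRegressor(min_samples_split=64).fit(f_tr, t_tr)
    return mean_squared_error(t_te, model.predict(f_te))

mse_xz = holdout_mse(np.hstack([x, z]), y)  # predict y from both x and z
mse_z = holdout_mse(z, y)                   # predict y from z alone
# Under conditional independence, adding x should not lower the MSE noticeably.
print(f"MSE with (x, z): {mse_xz:.4f}, MSE with z alone: {mse_z:.4f}")
```

FCIT builds on this idea by turning the difference between the two MSEs into a test statistic and estimating a p-value from permuted data.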
Methods Summary
- FCIT.statistic(x, y, z=None)¶
Calculates the FCIT test statistic.
- FCIT.test(x, y, z=None)¶
Calculates the FCIT test statistic and p-value.
- Parameters
  - x, y, z (ndarray of float) -- Input data matrices.
- Returns
  - stat (float) -- The computed FCIT statistic.
  - pvalue (float) -- The computed FCIT p-value.
Examples
>>> import numpy as np
>>> from hyppo.conditional import FCIT
>>> from sklearn.tree import DecisionTreeRegressor
>>> np.random.seed(1234)
>>> dim = 2
>>> n = 100000
>>> z1 = np.random.multivariate_normal(mean=np.zeros(dim), cov=np.eye(dim), size=(n))
>>> A1 = np.random.normal(loc=0, scale=1, size=dim * dim).reshape(dim, dim)
>>> B1 = np.random.normal(loc=0, scale=1, size=dim * dim).reshape(dim, dim)
>>> x1 = (A1 @ z1.T + np.random.multivariate_normal(mean=np.zeros(dim), cov=np.eye(dim), size=(n)).T)
>>> y1 = (B1 @ z1.T + np.random.multivariate_normal(mean=np.zeros(dim), cov=np.eye(dim), size=(n)).T)
>>> model = DecisionTreeRegressor()
>>> cv_grid = {"min_samples_split": [2, 8, 64, 512, 1e-2, 0.2, 0.4]}
>>> stat, pvalue = FCIT(model=model, cv_grid=cv_grid).test(x1.T, y1.T, z1)
>>> '%.1f, %.2f' % (stat, pvalue)
'-3.6, 1.00'