FCIT

class hyppo.conditional.FCIT(model=DecisionTreeRegressor(), cv_grid={'min_samples_split': [2, 8, 64, 512, 0.01, 0.2, 0.4]}, num_perm=8, prop_test=0.1, discrete=(False, False))

Fast Conditional Independence Test (FCIT) statistic and p-value.

The Fast Conditional Independence Test (FCIT) is a non-parametric conditional independence test [1].

Parameters
  • model (sklearn regressor) -- Regressor used to predict input data Y.

  • cv_grid (dict) -- Dictionary of parameters to cross-validate over when training regressor.

  • num_perm (int) -- Number of data permutations to estimate the p-value from marginal stats.

  • prop_test (float) -- Proportion of data to evaluate test stat on.

  • discrete (tuple of bool) -- Whether X and Y, respectively, are discrete.
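
To make the shape of cv_grid concrete, the following is a minimal sketch of how such a grid is typically consumed by scikit-learn's GridSearchCV. It is an illustration under assumptions, not hyppo's internal code: the toy data, the 3-fold setting, and the scoring choice are made up for the example. In scikit-learn, an integer min_samples_split is an absolute sample count, while a float is read as a fraction of the training samples.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

# Toy data (hypothetical): y is a noisy linear function of two covariates.
rng = np.random.default_rng(0)
z = rng.normal(size=(500, 2))
y = z @ np.array([1.0, -2.0]) + rng.normal(scale=0.5, size=500)

# Same grid as the FCIT default: ints are sample counts, floats are fractions.
cv_grid = {"min_samples_split": [2, 8, 64, 512, 0.01, 0.2, 0.4]}

# Assumption: 3-fold CV scored by negative MSE; hyppo may use other settings.
search = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid=cv_grid,
    cv=3,
    scoring="neg_mean_squared_error",
)
search.fit(z, y)

best_split = search.best_params_["min_samples_split"]
print("selected min_samples_split:", best_split)
```

The selected value is whichever grid entry gives the lowest cross-validated MSE on the data at hand, so it will generally differ between datasets.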

Notes

Note

This algorithm is currently a pre-print on arXiv.

The motivation for the test rests on the assumption that if $X \not\perp Y \mid Z$, then Y should be more accurately predicted by using both X and Z as covariates than by using only Z as a covariate. Likewise, if $X \perp Y \mid Z$, then Y should be predicted just as accurately by using only Z as by using both X and Z [1]. Thus, the test works by using a regressor (the default is a decision tree) to predict input Y twice: once using both X and Z, and once using only Z [1]. The accuracy of both predictions is then measured via mean-squared error (MSE). $X \perp Y \mid Z$ if and only if the MSE of the regressor trained using both X and Z is not smaller than the MSE of the regressor trained using only Z [1].
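
The comparison described above can be sketched in a few lines of scikit-learn. This is only an illustration of the idea, assuming a single train/test split and a fixed min_samples_split; the helper name mse_with_and_without_x and the toy data are hypothetical, and the code is not hyppo's implementation (it computes no test statistic or p-value).

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor


def mse_with_and_without_x(x, y, z, prop_test=0.1, seed=0):
    """Return (MSE using X and Z, MSE using Z only) on one held-out split."""
    xz = np.hstack([x, z])
    idx_train, idx_test = train_test_split(
        np.arange(len(y)), test_size=prop_test, random_state=seed
    )
    mses = []
    for covariates in (xz, z):
        # Assumption: a fixed min_samples_split instead of a cross-validated one.
        model = DecisionTreeRegressor(min_samples_split=64, random_state=seed)
        model.fit(covariates[idx_train], y[idx_train])
        pred = model.predict(covariates[idx_test])
        mses.append(float(np.mean((pred - y[idx_test]) ** 2)))
    return mses[0], mses[1]


rng = np.random.default_rng(0)
n = 5000
z = rng.normal(size=(n, 1))
x = z + rng.normal(size=(n, 1))

# Y depends on Z only, so X and Y are conditionally independent given Z.
y_indep = z[:, 0] + rng.normal(size=n)
# Y also depends on X directly, so X and Y are conditionally dependent given Z.
y_dep = z[:, 0] + 2.0 * x[:, 0] + rng.normal(size=n)

mse_xz_indep, mse_z_indep = mse_with_and_without_x(x, y_indep, z)
mse_xz_dep, mse_z_dep = mse_with_and_without_x(x, y_dep, z)

print("independent: MSE(X,Z) = %.2f, MSE(Z) = %.2f" % (mse_xz_indep, mse_z_indep))
print("dependent:   MSE(X,Z) = %.2f, MSE(Z) = %.2f" % (mse_xz_dep, mse_z_dep))
```

In the dependent case, adding X should cut the held-out MSE substantially, while in the independent case the two errors should be close. A single split gives only one noisy comparison, which is why a test needs repeated evaluations before drawing a conclusion.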

References

[1] Krzysztof Chalupka, Pietro Perona, and Frederick Eberhardt. Fast conditional independence test for vector variables with large sample sizes. arXiv:1804.02747 [math, stat], 2018.

Methods Summary


FCIT.statistic(x, y, z=None)

Calculates the FCIT test statistic.

Parameters

x,y,z (ndarray of float) -- Input data matrices.

Returns

  • stat (float) -- The computed FCIT test statistic.

  • two_sided (float) -- Two-sided p-value associated with the test statistic.

FCIT.test(x, y, z=None)

Calculates the FCIT test statistic and p-value.

Parameters

x,y,z (ndarray of float) -- Input data matrices.

Returns

  • stat (float) -- The computed FCIT statistic.

  • pvalue (float) -- The computed FCIT p-value.

Examples

>>> import numpy as np
>>> from hyppo.conditional import FCIT
>>> from sklearn.tree import DecisionTreeRegressor
>>> np.random.seed(1234)
>>> dim = 2
>>> n = 100000
>>> z1 = np.random.multivariate_normal(mean=np.zeros(dim), cov=np.eye(dim), size=(n))
>>> A1 = np.random.normal(loc=0, scale=1, size=dim * dim).reshape(dim, dim)
>>> B1 = np.random.normal(loc=0, scale=1, size=dim * dim).reshape(dim, dim)
>>> x1 = (A1 @ z1.T + np.random.multivariate_normal(mean=np.zeros(dim), cov=np.eye(dim), size=(n)).T)
>>> y1 = (B1 @ z1.T + np.random.multivariate_normal(mean=np.zeros(dim), cov=np.eye(dim), size=(n)).T)
>>> model = DecisionTreeRegressor()
>>> cv_grid = {"min_samples_split": [2, 8, 64, 512, 1e-2, 0.2, 0.4]}
>>> stat, pvalue = FCIT(model=model, cv_grid=cv_grid).test(x1.T, y1.T, z1)
>>> '%.1f, %.2f' % (stat, pvalue)
'-3.6, 1.00'
