mvpa2.clfs.stats.match_distribution
mvpa2.clfs.stats.match_distribution(data, nsamples=None, loc=None, scale=None, args=None, test='kstest', distributions=None, **kwargs)
Determine best matching distribution.
Can be used for ‘smelling’ the data, as well as to choose a parametric distribution for data obtained from non-parametric testing (e.g. MCNullDist).
WiP: use with caution, API might change.
Parameters:
data : np.ndarray
Array of the data for which to deduce the distribution. It has to be sufficiently large to allow a reliable conclusion.
nsamples : int or None
If None, use all samples in data to estimate the parametric distribution. Otherwise use only the specified number of samples, randomly selected from data.
loc : float or None
Loc for the distribution (if known)
scale : float or None
Scale for the distribution (if known)
test : str
What kind of testing to do. Choices:
- ‘p-roc’: detection power for a given ROC. Needs two parameters: p=0.05 and tail='both'.
- ‘kstest’: ‘full-body’ distribution comparison. The best choice is made by minimal reported distance after estimating parameters of the distribution. Parameter p=0.05 sets the threshold to reject the null hypothesis that the distribution is the same. WARNING: older versions (e.g. 0.5.2 in etch) of scipy have an incorrect kstest implementation and do not function properly.
distributions : None or list of str or tuple(str, dict)
Distributions to check. If None, all distributions known in scipy.stats are tested. If a distribution is specified as a tuple, it must contain the distribution name and a dictionary of additional parameters (name, loc, scale, args). The entry ‘scipy’ adds all distributions known in scipy.stats.
**kwargs
Additional arguments which are needed for each particular test (see above)
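To make the ‘kstest’ option more concrete, here is a minimal sketch of the idea it describes: estimate parameters for each candidate, run a Kolmogorov-Smirnov test, and rank by the reported distance. It uses scipy.stats directly and is not the PyMVPA implementation. The helper name rank_by_kstest, its return layout, the candidate list, and the choice to subsample without replacement are all assumptions made for illustration.

```python
import numpy as np
from scipy import stats


def rank_by_kstest(data, names=("norm", "expon", "uniform"),
                   nsamples=None, p=0.05, seed=0):
    """Rank candidate scipy.stats distributions by KS distance.

    Hypothetical helper for illustration only; not part of mvpa2.
    """
    rng = np.random.default_rng(seed)
    flat = np.asarray(data, dtype=float).ravel()
    if nsamples is not None:
        # Assumption: the random subset is drawn without replacement.
        flat = rng.choice(flat, size=nsamples, replace=False)

    results = []
    for name in names:
        dist = getattr(stats, name)
        # Estimate parameters: (shape parameters..., loc, scale).
        params = dist.fit(flat)
        res = stats.kstest(flat, name, args=params)
        # A fit counts as rejected when its p-value falls below p.
        rejected = bool(res.pvalue < p)
        results.append((res.statistic, name, params, rejected))

    # Smallest KS distance first.
    return sorted(results, key=lambda r: r[0])


rng = np.random.default_rng(1)
sample = rng.normal(loc=2.0, scale=1.5, size=(1000, 1))
ranking = rank_by_kstest(sample)
best_distance, best_name, best_params, best_rejected = ranking[0]
print(best_name, best_distance)
```

Because the parameters are fitted on the same data that is then tested, the p-values in this sketch are optimistic. Treat the distance ranking as a heuristic rather than a strict hypothesis test.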
Examples
>>> import numpy as np
>>> from mvpa2.clfs.stats import match_distribution
>>> data = np.random.normal(size=(1000,1))
>>> matches = match_distribution(
...     data,
...     distributions=['rdist',
...                    ('rdist', {'name':'rdist_fixed',
...                               'loc': 0.0,
...                               'args': (10,)})],
...     nsamples=30, test='p-roc', p=0.05)
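The distributions argument accepts both plain names and (name, dict) tuples, plus the special entry ‘scipy’. The sketch below shows one way such a list could be normalized into uniform (name, parameters) pairs. The helper expand_specs is hypothetical and is not taken from PyMVPA; listing instances of scipy.stats.rv_continuous is an assumption about what "all known in scipy.stats" covers.

```python
from scipy import stats


def expand_specs(distributions=None):
    """Normalize entries to (scipy_name, params_dict) pairs.

    Hypothetical helper for illustration only; not part of mvpa2.
    """
    # Assumption: "all known" means every continuous distribution
    # instance exposed in the scipy.stats namespace.
    all_names = sorted(name for name, obj in vars(stats).items()
                       if isinstance(obj, stats.rv_continuous))
    if distributions is None:
        distributions = ['scipy']

    specs = []
    for entry in distributions:
        if isinstance(entry, tuple):
            # Tuple form: (name, {'name': ..., 'loc': ..., 'scale': ..., 'args': ...})
            name, params = entry
            specs.append((name, dict(params)))
        elif entry == 'scipy':
            specs.extend((name, {}) for name in all_names)
        else:
            # Plain string: no fixed parameters.
            specs.append((entry, {}))
    return specs


specs = expand_specs(['rdist',
                      ('rdist', {'name': 'rdist_fixed',
                                 'loc': 0.0,
                                 'args': (10,)})])
for name, params in specs:
    print(name, params)
```

In this sketch a plain string leaves all parameters free, while the tuple form carries the values to hold fixed (here loc and the shape argument) together with a label for the result.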