pycea.tl.partition_test
- pycea.tl.partition_test(tdata, keys, comparison='siblings', test='permutation', aggregate='mean', metric='mean_difference', metric_kwds=None, n_permutations=100, random_state=None, equal_var=True, min_group_leaves=10, keys_added=None, tree=None, copy=True)
- Overloads:
tdata (td.TreeData), keys (str | Sequence[str]), comparison (Literal['siblings', 'rest']), test (Literal['permutation', 't-test'] | None), aggregate (_AggregatorFn | _Aggregator), metric (_MetricFn | _Metric | Literal['mean_difference']), metric_kwds (Mapping | None), n_permutations (int), random_state (int | None), equal_var (bool), min_group_leaves (int), keys_added (str | Sequence[str] | None), tree (str | Sequence[str] | None), copy (Literal[True, False]) → pd.DataFrame
tdata (td.TreeData), keys (str | Sequence[str]), comparison (Literal['siblings', 'rest']), test (Literal['permutation', 't-test'] | None), aggregate (_AggregatorFn | _Aggregator), metric (_MetricFn | _Metric | Literal['mean_difference']), metric_kwds (Mapping | None), n_permutations (int), random_state (int | None), equal_var (bool), min_group_leaves (int), keys_added (str | Sequence[str] | None), tree (str | Sequence[str] | None), copy (Literal[True, False]) → None
Test for differences between leaf partitions.
For each requested observation key, this function compares the set of leaves descended from each internal node (group1) to the set of leaves defined by the comparison parameter (group2):
comparison='siblings': Compare to the descendants of sibling nodes. When there is more than one sibling (i.e., a non-binary split), each child node is compared individually to the pooled set of all other siblings.
comparison='rest': Compare to all other leaves in the tree not descended from the given node.
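The two comparison modes can be illustrated on a toy tree. The following is a minimal sketch, not pycea's implementation: the `CHILDREN` mapping and the helper names `leaves_below` and `comparison_groups` are hypothetical and exist only for this illustration.

```python
# Toy rooted tree as a parent -> children mapping (hypothetical example data).
CHILDREN = {
    "root": ["A", "B"],
    "A": ["A1", "A2", "x5"],  # non-binary split: A1 has two siblings
    "A1": ["x1", "x2"],
    "A2": ["x3", "x4"],
    "B": ["y1", "y2"],
}


def leaves_below(node):
    """Return the set of leaves descended from `node` (a leaf returns itself)."""
    kids = CHILDREN.get(node, [])
    if not kids:
        return {node}
    found = set()
    for kid in kids:
        found |= leaves_below(kid)
    return found


def comparison_groups(parent, node, comparison="siblings", root="root"):
    """Return (group1, group2) leaf sets for `node`, a child of `parent`."""
    group1 = leaves_below(node)
    if comparison == "siblings":
        # Pool the leaves of every other child of the same parent.
        group2 = set()
        for sibling in CHILDREN[parent]:
            if sibling != node:
                group2 |= leaves_below(sibling)
    elif comparison == "rest":
        # Every leaf in the tree that is not below `node`.
        group2 = leaves_below(root) - group1
    else:
        raise ValueError(f"unknown comparison: {comparison!r}")
    return group1, group2
```

For node `A1`, the `'siblings'` mode pools the leaves of `A2` and the leaf `x5`, while the `'rest'` mode additionally includes the leaves under `B`.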
The test parameter defines how the two groups are compared:
test='permutation': A two-sided permutation test is performed by repeatedly shuffling the pooled rows (group1 + group2), applying the aggregate function, and then recomputing the split statistic using the metric function. The number of permutations executed is the minimum of the user-requested n_permutations and the theoretical maximum number of distinct labelings (comb(n_left + n_right, n_left)). The p-value is computed with standard +1 smoothing.
test='t-test': A two-sided t-test is performed comparing the two groups. Note that for small numbers of leaves the p-value of this t-test can be unreliable.
test=None: No statistical test is performed; only the partition statistic is computed.
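The permutation procedure described above can be sketched for scalar values as follows. This is an illustrative sketch under stated assumptions, not pycea's code: it assumes a mean aggregate with a mean-difference statistic, takes "+1 smoothing" to mean adding one to both the count of extreme permutations and the number of permutations, and uses a hypothetical function name `permutation_pvalue`.

```python
import math
import random
from statistics import mean


def permutation_pvalue(group1, group2, n_permutations=100, seed=None):
    """Two-sided permutation p-value for a difference in means.

    Returns (p_value, number_of_permutations_run).
    """
    rng = random.Random(seed)
    n_left, n_right = len(group1), len(group2)
    observed = mean(group1) - mean(group2)

    # Cap the number of shuffles at the number of distinct labelings.
    n_run = min(n_permutations, math.comb(n_left + n_right, n_left))

    pooled = list(group1) + list(group2)
    n_extreme = 0
    for _ in range(n_run):
        rng.shuffle(pooled)
        permuted = mean(pooled[:n_left]) - mean(pooled[n_left:])
        # Two-sided: count permutations at least as extreme in absolute value.
        if abs(permuted) >= abs(observed):
            n_extreme += 1

    # +1 smoothing (assumed form): the p-value can never be exactly zero.
    return (n_extreme + 1) / (n_run + 1), n_run
```

With this form of smoothing, the smallest attainable p-value is 1 / (n_run + 1), so the resolution of the test is limited by the number of permutations actually run.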
P-values are calculated as long as both groups have at least min_group_leaves leaves; otherwise, no test is performed for that partition and the p-value is set to NaN.
- Parameters:
tdata (TreeData) – TreeData object.
keys (str | Sequence[str]) – One or more obs.keys(), var_names, obsm.keys(), or obsp.keys() to test.
comparison (Literal['siblings', 'rest'] (default: 'siblings')) – Set of leaves to compare to:
'siblings': leaves descending from a given node are compared to leaves descending from its siblings.
'rest': leaves descending from a given node are compared to all other leaves of the tree.
test (Optional[Literal['permutation', 't-test']] (default: 'permutation')) – Type of test to perform to compare the two groups. "t-test" can only be used for scalar keys.
aggregate (Union[Callable[[ndarray], ndarray | float], Literal['mean', 'median', 'sum', 'min', 'max', 'var']] (default: 'mean')) – Function to reduce the data from all the leaves of a given group to a vector or scalar. Can be a known aggregator or a callable. Only used for test="permutation".
metric (Union[Callable[[ndarray, ndarray], float], Literal['braycurtis', 'canberra', 'chebyshev', 'cityblock', 'cosine', 'correlation', 'dice', 'euclidean', 'hamming', 'jaccard', 'kulsinski', 'l1', 'l2', 'mahalanobis', 'minkowski', 'manhattan', 'rogerstanimoto', 'russellrao', 'seuclidean', 'sokalmichener', 'sokalsneath', 'sqeuclidean', 'yule'], Literal['mean_difference']] (default: 'mean_difference')) – A metric to compare the children from both sides of the tree. Can be a known metric or a callable. Only used for test="permutation".
metric_kwds (Mapping | None (default: None)) – Options for the metric.
equal_var (bool (default: True)) – Whether the variance in the two groups should be assumed to be equal. Only used for test="t-test".
n_permutations (int (default: 100)) – Upper bound on the number of permutations to run. The number actually executed is min(n_permutations, comb(n_left + n_right, n_left)) per group.
random_state (int | None (default: None)) – Random seed to ensure reproducibility of the permutation test.
min_group_leaves (int (default: 10)) – Minimum number of leaves required in each group to perform a statistical test. The t-test may be particularly unreliable with small sample sizes.
keys_added (str | Sequence[str] | None (default: None)) – Attribute keys of tdata.obst[tree].nodes where group statistics will be stored. If None, keys are used.
tree (str | Sequence[str] | None (default: None)) – The obst key or keys of the trees to use. If None, all trees are used.
copy (Literal[True, False] (default: True)) – If True, returns a DataFrame with group statistics.
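To make the effect of equal_var concrete, the sketch below computes the two-sample t statistic and its degrees of freedom in both settings. It assumes the conventional meaning of the flag (pooled-variance Student's t-test when True, Welch's t-test when False); the function name `t_statistic` is hypothetical and this is not taken from pycea's source.

```python
import math
from statistics import mean, variance


def t_statistic(group1, group2, equal_var=True):
    """Return (t, degrees_of_freedom) for a two-sample t-test."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = mean(group1), mean(group2)
    v1, v2 = variance(group1), variance(group2)  # sample variances (n - 1 denominator)

    if equal_var:
        # Student's t-test: one pooled variance estimate for both groups.
        dof = n1 + n2 - 2
        pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / dof
        se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    else:
        # Welch's t-test: separate variances, Welch-Satterthwaite degrees of freedom.
        a, b = v1 / n1, v2 / n2
        se = math.sqrt(a + b)
        dof = (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))
    return (m1 - m2) / se, dof
```

When the groups differ in both size and spread, the Welch variant yields fewer effective degrees of freedom, which is one reason small groups can give unstable p-values.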
- Returns:
Returns None if copy=False, else returns a DataFrame with columns:
'tree' - Tree name.
'key' - Observation key.
'parent' - Parent of group1 node.
'group1' - Node defining group1 leaf set.
'group2' - Node(s) defining group2 leaf set or "rest".
'value1' - Aggregate leaf value for group1.
'value2' - Aggregate leaf value for group2.
'pval' - p-value from the statistical test (if performed).
Sets the following fields:
tdata.obst[tree].nodes[f"{key_added}_value"] (float/ndarray) – Aggregate value of leaves descended from that node.
tdata.obst[tree].edges[f"{key_added}_pval"] (float) – P-value for the partition test at that edge (if performed).
tdata.obst[tree].edges[f"{key_added}_metric"] (float) – Metric value for the partition at that edge (only if test="permutation").
Examples
Identify clades with the highest expression of “elt-2”:
>>> import pycea as py
>>> tdata = py.datasets.packer19()
>>> py.tl.partition_test(tdata, keys=["elt-2"], test="t-test", comparison="rest")