pycea.tl.tree_distance
- pycea.tl.tree_distance(tdata, depth_key='depth', obs=None, metric='path', sample_n=None, connect_key=None, random_state=None, key_added=None, update=True, tree=None, copy=False)
- Overloads:
tdata (td.TreeData), depth_key (str), obs (str | int | Sequence[Any] | None), metric (_TreeMetric), sample_n (int | None), connect_key (str | None), random_state (int | None), key_added (str | None), update (bool), tree (str | Sequence[Any] | None), copy (Literal[True, False]) → sp.sparse.csr_matrix | np.ndarray
tdata (td.TreeData), depth_key (str), obs (str | int | Sequence[Any] | None), metric (_TreeMetric), sample_n (int | None), connect_key (str | None), random_state (int | None), key_added (str | None), update (bool), tree (str | Sequence[Any] | None), copy (Literal[True, False]) → None
Computes tree distances between observations.
This function calculates distances between observations based on their positions and depths in the tree. For tdata.alignment == "leaves", this computes distances between leaf nodes. For tdata.alignment == "nodes" or "subset", distances are computed between all observed nodes (leaves and internal nodes in tdata.obs). It supports lowest common ancestor (lca) and path distances.

Given two nodes $i$ and $j$ in a rooted tree, with depths $d_i$ and $d_j$, and with their lowest common ancestor having depth $d_{\mathrm{lca}(i,j)}$:

$$D_{\mathrm{lca}}(i, j) = d_{\mathrm{lca}(i,j)}$$

$$D_{\mathrm{path}}(i, j) = \lvert d_i + d_j - 2\, d_{\mathrm{lca}(i,j)} \rvert$$

$D_{\mathrm{lca}}$ represents the depth of the nodes' shared ancestor (larger values indicate greater shared ancestry). In contrast, $D_{\mathrm{path}}$ measures the distance along the tree between two nodes (smaller values indicate closer proximity).
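To make the two metrics concrete, here is a minimal sketch on a hypothetical five-node toy tree. It is not pycea's implementation: the `parent` and `depth` dictionaries and the helper names are assumptions made for illustration, and only the standard library is used.

```python
# Toy rooted tree stored as child -> parent; the root maps to None.
# (Hypothetical data, chosen only to illustrate the two metrics.)
parent = {"root": None, "a": "root", "b": "root", "c": "a", "d": "a"}
# Depth of each node, here the number of edges from the root.
depth = {"root": 0.0, "a": 1.0, "b": 1.0, "c": 2.0, "d": 2.0}


def ancestors(node):
    """Return the node followed by all of its ancestors up to the root."""
    chain = []
    while node is not None:
        chain.append(node)
        node = parent[node]
    return chain


def lca(u, v):
    """Return the lowest common ancestor of u and v."""
    seen = set(ancestors(u))
    for node in ancestors(v):  # walks upward, so the first hit is the lowest
        if node in seen:
            return node
    raise ValueError("nodes are not in the same tree")


def lca_distance(u, v):
    """'lca' metric: depth of the lowest common ancestor."""
    return depth[lca(u, v)]


def path_distance(u, v):
    """'path' metric: abs(depth[u] + depth[v] - 2 * lca depth)."""
    return abs(depth[u] + depth[v] - 2 * depth[lca(u, v)])


print(lca_distance("c", "d"), path_distance("c", "d"))  # 1.0 2.0
print(lca_distance("c", "b"), path_distance("c", "b"))  # 0.0 3.0
```

The siblings `c` and `d` share the ancestor `a`, so their LCA depth is larger and their path distance smaller than for `c` and `b`, which only meet at the root.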
- Parameters:
tdata (TreeData) – The TreeData object.
depth_key (str (default: 'depth')) – Attribute of tdata.obst[tree].nodes where depth is stored.
obs (str | int | Sequence[Any] | None (default: None)) – The observations to use:
  - If None, pairwise distance for all observed nodes is stored in tdata.obsp.
  - If a string, distance to all other observed nodes is stored in tdata.obs.
  - If a sequence, pairwise distance is stored in tdata.obsp.
  - If a sequence of pairs, distance between pairs is stored in tdata.obsp.
metric (Literal['lca', 'path'] (default: 'path')) – The type of tree distance to compute:
  - 'lca': lowest common ancestor depth.
  - 'path': abs(node1 depth + node2 depth - 2 * lca depth).
sample_n (int | None (default: None)) – If specified, randomly sample sample_n pairs of observations.
connect_key (str | None (default: None)) – If specified, compute distances only between connected observations specified by tdata.obsp['{connect_key}_connectivities'].
random_state (int | None (default: None)) – Random seed for sampling.
key_added (str | None (default: None)) – Distances are stored in tdata.obsp['{key_added}_distances'] and connectivities in tdata.obsp['{key_added}_connectivities']. Defaults to ‘tree’.
update (bool (default: True)) – If True, updates existing distances instead of overwriting.
tree (str | Sequence[Any] | None (default: None)) – The obst key or keys of the trees to use. If None, all trees are used.
copy (Literal[True, False] (default: False)) – If True, returns an ndarray or csr_matrix with distances.
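As a rough picture of how sample_n and random_state interact, the sketch below draws a fixed number of observation pairs with a seeded random generator. This is an assumption-laden illustration, not pycea's sampling code: `sample_pairs` is a hypothetical helper, and drawing distinct unordered pairs without replacement is an assumption that the documentation above does not confirm.

```python
import itertools
import random


def sample_pairs(names, sample_n, random_state=None):
    """Hypothetical helper: draw up to sample_n distinct unordered pairs."""
    all_pairs = list(itertools.combinations(names, 2))
    rng = random.Random(random_state)  # same seed -> same draw
    return rng.sample(all_pairs, min(sample_n, len(all_pairs)))


leaves = ["b", "c", "d", "e"]  # hypothetical observation names
first = sample_pairs(leaves, 3, random_state=0)
second = sample_pairs(leaves, 3, random_state=0)

print(len(first))       # 3
print(first == second)  # True
```

Fixing the seed makes the sampled pairs reproducible across runs, which is the role random_state plays whenever sample_n is set.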
- Returns:
Returns None if copy=False, else returns ndarray/csr_matrix.
Sets the following fields:
- tdata.obsp['{key_added}_distances'] : ndarray/csr_matrix (dtype float) if obs is None or a sequence. Distances between observations.
- tdata.obsp['{key_added}_connectivities'] : csr_matrix (dtype float) if distance is sparse. Connectivity between observations.
- tdata.obs['{key_added}_distances'] : Series (dtype float) if obs is a string. Distance from specified observation to others.
Examples
Compute full pairwise path distances for tree leaves:
>>> import pycea as py  # alias assumed from the `py.` prefix used in these examples
>>> tdata = py.datasets.koblan25()
>>> py.tl.tree_distance(tdata, metric="path")
Sample 1000 random LCA distances using node ‘time’ as depth:
>>> py.tl.tree_distance(tdata, metric="lca", sample_n=1000, depth_key="time")