pycea.tl.tree_distance
- pycea.tl.tree_distance(tdata, depth_key='depth', obs=None, metric='path', sample_n=None, connect_key=None, random_state=None, key_added=None, update=True, tree=None, copy=False)
Computes tree distances between observations.
This function calculates distances between observations (typically tree leaves) based on their positions and depths in the tree. It supports lowest common ancestor (lca) and path distances.
Given two nodes \(i\) and \(j\) in a rooted tree, with depths \(d_i\) and \(d_j\), and with their lowest common ancestor having depth \(d_{LCA(i,j)}\):
\[D_{ij}^{lca} = d_{LCA(i,j)}\]
\[D_{ij}^{path} = \left| d_i + d_j - 2 d_{LCA(i,j)} \right|\]
\(D_{ij}^{lca}\) is the depth of the two nodes' lowest common ancestor (larger values indicate greater shared ancestry). In contrast, \(D_{ij}^{path}\) measures the distance along the tree between the two nodes (smaller values indicate closer proximity).
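Both metrics can be checked by hand on a small tree. The following is a minimal sketch in plain Python, not pycea's implementation: the parent map, the depth values, and the helper names (`ancestors`, `lowest_common_ancestor`, `lca_distance`, `path_distance`) are hypothetical and exist only for illustration.

```python
# Toy rooted tree as child -> parent pointers (hypothetical example data).
parent = {"root": None, "a": "root", "b": "root", "c": "a", "d": "a", "e": "b"}
# Depth of each node; here depth is the number of edges from the root.
depth = {"root": 0, "a": 1, "b": 1, "c": 2, "d": 2, "e": 2}


def ancestors(node):
    """Return the node followed by all of its ancestors, up to the root."""
    chain = []
    while node is not None:
        chain.append(node)
        node = parent[node]
    return chain


def lowest_common_ancestor(i, j):
    """Return the deepest node that is an ancestor of both i and j."""
    seen = set(ancestors(i))
    for node in ancestors(j):  # walk upward from j; the first hit is the LCA
        if node in seen:
            return node
    raise ValueError("nodes are not in the same tree")


def lca_distance(i, j):
    """D^lca: depth of the lowest common ancestor."""
    return depth[lowest_common_ancestor(i, j)]


def path_distance(i, j):
    """D^path: |d_i + d_j - 2 * d_LCA(i, j)|."""
    return abs(depth[i] + depth[j] - 2 * depth[lowest_common_ancestor(i, j)])


print(lca_distance("c", "d"), path_distance("c", "d"))  # 1 2 (siblings under "a")
print(lca_distance("c", "e"), path_distance("c", "e"))  # 0 4 (joined only at the root)
```

Siblings `c` and `d` share an ancestor at depth 1, so their LCA value is high and their path distance is low; `c` and `e` meet only at the root, which gives the opposite pattern.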
- Parameters:
tdata (TreeData) – The TreeData object.
depth_key (str (default: 'depth')) – Attribute of tdata.obst[tree].nodes where depth is stored.
obs (str | int | Sequence[Any] | None (default: None)) – The observations to use:
  - If None, pairwise distance for tree leaves is stored in tdata.obsp.
  - If a string, distance to all other tree leaves is stored in tdata.obs.
  - If a sequence, pairwise distance is stored in tdata.obsp.
  - If a sequence of pairs, distance between pairs is stored in tdata.obsp.
metric (Literal['lca', 'path'] (default: 'path')) – The type of tree distance to compute:
  - 'lca': lowest common ancestor depth.
  - 'path': abs(node1 depth + node2 depth - 2 * lca depth).
sample_n (int | None (default: None)) – If specified, randomly sample sample_n pairs of observations.
connect_key (str | None (default: None)) – If specified, compute distances only between connected observations specified by tdata.obsp['{connect_key}_connectivities'].
random_state (int | None (default: None)) – Random seed for sampling.
key_added (str | None (default: None)) – Distances are stored in tdata.obsp['{key_added}_distances'] and connectivities in tdata.obsp['{key_added}_connectivities']. Defaults to 'tree'.
update (bool (default: True)) – If True, updates existing distances instead of overwriting.
tree (str | Sequence[Any] | None (default: None)) – The obst key or keys of the trees to use. If None, all trees are used.
copy (Literal[True, False] (default: False)) – If True, returns an ndarray or csr_matrix with distances.
- Return type:
None | csr_matrix | ndarray
- Returns:
Returns None if copy=False, else returns ndarray / csr_matrix. Sets the following fields:
  - tdata.obsp['{key_added}_distances']: ndarray / csr_matrix (dtype float) if obs is None or a sequence. Distances between observations.
  - tdata.obsp['{key_added}_connectivities']: csr_matrix (dtype float) if distance is sparse. Connectivity between observations.
  - tdata.obs['{key_added}_distances']: Series (dtype float) if obs is a string. Distance from specified observation to others.
Examples
Compute full pairwise path distances for tree leaves:
>>> import pycea as py
>>> tdata = py.datasets.koblan25()
>>> py.tl.tree_distance(tdata, metric="path")
Sample 1000 random LCA distances using the node attribute 'time' as depth:
>>> py.tl.tree_distance(tdata, metric="lca", sample_n=1000, depth_key="time")
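To show what a sampled LCA computation with a non-integer depth attribute involves, here is a hedged sketch in plain Python. It does not call pycea and makes no claim about how pycea samples internally: the toy tree, the time-like depth values, and the helpers `lca_depth` and `sample_lca_distances` are hypothetical, and drawing unordered leaf pairs without replacement is an assumption made for this sketch.

```python
import itertools
import random

# Toy rooted tree as child -> parent pointers (hypothetical example data).
parent = {"root": None, "a": "root", "b": "root",
          "c": "a", "d": "a", "e": "b", "f": "b"}
# Time-like depth values, standing in for a node attribute such as 'time'.
depth = {"root": 0.0, "a": 1.0, "b": 1.5, "c": 3.0, "d": 3.0, "e": 3.0, "f": 3.0}
leaves = ["c", "d", "e", "f"]


def lca_depth(i, j):
    """Depth of the lowest common ancestor of i and j."""
    seen = set()
    node = i
    while node is not None:  # collect i and all of its ancestors
        seen.add(node)
        node = parent[node]
    node = j
    while node not in seen:  # climb from j until the two lineages meet
        node = parent[node]
    return depth[node]


def sample_lca_distances(sample_n, random_state=None):
    """Draw up to sample_n unordered leaf pairs and return {pair: LCA depth}."""
    rng = random.Random(random_state)  # seeded generator for reproducibility
    all_pairs = list(itertools.combinations(leaves, 2))
    pairs = rng.sample(all_pairs, k=min(sample_n, len(all_pairs)))
    return {pair: lca_depth(*pair) for pair in pairs}


sampled = sample_lca_distances(3, random_state=0)
for (i, j), value in sampled.items():
    print(i, j, value)
```

Because only some pairs are evaluated, the result is naturally sparse, which matches the csr_matrix return type listed above for sparse distances. Fixing `random_state` makes the sampled pairs reproducible across runs.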