pycea.tl.tree_neighbors
- pycea.tl.tree_neighbors(tdata, n_neighbors=None, max_dist=None, depth_key='depth', obs=None, metric='path', random_state=None, key_added='tree', update=True, tree=None, copy=False)
- Overloads:
tdata (td.TreeData), n_neighbors (int | None), max_dist (float | None), depth_key (str), obs (str | Sequence[str] | None), metric (_TreeMetric), random_state (int | None), key_added (str), update (bool), tree (str | Sequence[str] | None), copy (Literal[True, False]) → tuple[sp.sparse.csr_matrix, sp.sparse.csr_matrix]
tdata (td.TreeData), n_neighbors (int | None), max_dist (float | None), depth_key (str), obs (str | Sequence[str] | None), metric (_TreeMetric), random_state (int | None), key_added (str), update (bool), tree (str | Sequence[str] | None), copy (Literal[True, False]) → None
Identifies neighbors in the tree.
For each observation, this function identifies neighbors according to a chosen tree distance metric and either:
- the top n_neighbors closest observations (ties broken at random), or
- all observations within a distance threshold max_dist.
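The two selection modes can be sketched in plain Python. This is only an illustration of the idea, not pycea's implementation: select_neighbors is a hypothetical helper, the inclusive threshold (distance <= max_dist) is an assumption, and raising an error when neither argument is given is a choice made for this sketch.

```python
import random


def select_neighbors(dists, n_neighbors=None, max_dist=None, random_state=None):
    """Pick neighbors for one observation from a {name: distance} mapping.

    Hypothetical helper for illustration only; not part of pycea.
    """
    candidates = list(dists.items())
    if n_neighbors is not None:
        rng = random.Random(random_state)
        # Shuffle first, then stable-sort by distance: observations with equal
        # distances keep their shuffled order, so ties are broken at random.
        rng.shuffle(candidates)
        candidates.sort(key=lambda item: item[1])
        return [name for name, _ in candidates[:n_neighbors]]
    if max_dist is not None:
        # Assumption: the threshold is inclusive (distance <= max_dist).
        return sorted(name for name, d in candidates if d <= max_dist)
    raise ValueError("Provide n_neighbors or max_dist")


# Distances from one observation to four others; "c" and "d" are tied.
dists = {"b": 1.0, "c": 2.0, "d": 2.0, "e": 5.0}
top2 = select_neighbors(dists, n_neighbors=2, random_state=0)
within = select_neighbors(dists, max_dist=2.0)
```

In this toy input, top2 always starts with "b" and ends with either "c" or "d" depending on the seed, while within contains "b", "c", and "d". As in the parameter description, max_dist is only consulted when n_neighbors is None.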
Results are stored as sparse connectivities and distances, or returned when copy=True. You can restrict the operation to a subset of observations via obs and/or to specific trees via tree.
For tdata.alignment == "leaves", only leaf nodes are considered as neighbors. For tdata.alignment == "nodes" or "subset", all observed nodes (leaves and internal nodes present in tdata.obs) are considered as neighbors.
- Parameters:
tdata (TreeData) – The TreeData object.
n_neighbors (int | None (default: None)) – The number of neighbors to identify for each leaf. Ties are broken randomly.
max_dist (float | None (default: None)) – If n_neighbors is None, identify all neighbors within this distance.
depth_key (str (default: 'depth')) – Attribute of tdata.obst[tree].nodes where depth is stored.
obs (str | Sequence[str] | None (default: None)) – The observations to use:
  - If None, neighbors for all observed nodes are stored in tdata.obsp.
  - If a string, neighbors of the specified observation are stored in tdata.obs.
  - If a sequence, neighbors within the specified observations are stored in tdata.obsp.
metric (Literal['lca', 'path'] (default: 'path')) – The type of tree distance to compute:
  - 'lca': lowest common ancestor depth.
  - 'path': abs(node1 depth + node2 depth - 2 * lca depth).
random_state (int | None (default: None)) – Random seed for breaking ties.
key_added (str (default: 'tree')) – Neighbor distances are stored in tdata.obsp['{key_added}_distances'] and neighbors in tdata.obsp['{key_added}_connectivities']. Defaults to 'tree'.
update (bool (default: True)) – If True, updates existing distances instead of overwriting.
tree (str | Sequence[str] | None (default: None)) – The tdata.obst key or keys of the trees to use. If None, all trees are used.
copy (Literal[True, False] (default: False)) – If True, returns a tuple of connectivities and distances.
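As a worked illustration of the two metric options, the sketch below evaluates both formulas on a small hand-built tree. The tree, the node names, and the helper functions are all hypothetical and use plain dictionaries rather than a TreeData object; only the formulas themselves come from the metric description above.

```python
# Toy rooted tree stored as child -> parent, plus a depth for every node.
# Names and depths are made up for illustration.
parent = {"A": "root", "B": "root", "a1": "A", "a2": "A", "b1": "B"}
depth = {"root": 0.0, "A": 1.0, "B": 1.5, "a1": 3.0, "a2": 2.5, "b1": 3.0}


def ancestors(node):
    """Return the node followed by its ancestors up to the root."""
    chain = [node]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain


def lca(u, v):
    """Lowest common ancestor: the first node on v's path to the root
    that also lies on u's path to the root."""
    seen = set(ancestors(u))
    return next(n for n in ancestors(v) if n in seen)


def lca_metric(u, v):
    # 'lca': depth of the lowest common ancestor.
    return depth[lca(u, v)]


def path_metric(u, v):
    # 'path': abs(node1 depth + node2 depth - 2 * lca depth).
    return abs(depth[u] + depth[v] - 2 * depth[lca(u, v)])


siblings_lca = lca_metric("a1", "a2")
siblings_path = path_metric("a1", "a2")
cousins_lca = lca_metric("a1", "b1")
cousins_path = path_metric("a1", "b1")
```

Here the siblings a1 and a2 share ancestor A, so the 'lca' value is 1.0 and the 'path' value is 3.0 + 2.5 - 2 * 1.0 = 3.5. The pair a1 and b1 only meets at the root, giving an 'lca' value of 0.0 and a 'path' value of 6.0. Note that a deeper lowest common ancestor means a more recent shared ancestor, so the two metrics run in opposite directions for closely related nodes.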
- Returns:
Returns None if copy=False, else returns (connectivities, distances). Sets the following fields:
- tdata.obsp['{key_added}_distances']: csr_matrix (dtype float) if obs is None or a sequence. Distances to neighboring observations.
- tdata.obsp['{key_added}_connectivities']: csr_matrix (dtype float) if distance is sparse. Set of neighbors for each observation.
- tdata.obs['{key_added}_neighbors']: Series (dtype bool) if obs is a string. Set of neighbors for specified observation.
Examples
Identify the 5 closest neighbors for each leaf based on path distance:
>>> tdata = py.datasets.koblan25()
>>> py.tl.tree_neighbors(tdata, n_neighbors=5, depth_key="time")