pycea.tl.tree_neighbors


pycea.tl.tree_neighbors(tdata, n_neighbors=None, max_dist=None, depth_key='depth', obs=None, metric='path', random_state=None, key_added='tree', update=True, tree=None, copy=False)
Overloads (the two differ only in return type, which depends on copy):
  • tdata (td.TreeData), n_neighbors (int | None), max_dist (float | None), depth_key (str), obs (str | Sequence[str] | None), metric (_TreeMetric), random_state (int | None), key_added (str), update (bool), tree (str | Sequence[str] | None), copy (Literal[True, False]) → tuple[sp.sparse.csr_matrix, sp.sparse.csr_matrix] (when copy=True)

  • tdata (td.TreeData), n_neighbors (int | None), max_dist (float | None), depth_key (str), obs (str | Sequence[str] | None), metric (_TreeMetric), random_state (int | None), key_added (str), update (bool), tree (str | Sequence[str] | None), copy (Literal[True, False]) → None (when copy=False)

Identifies neighbors in the tree.

For each observation, this function identifies neighbors according to the chosen tree distance metric, selecting either:

  • the n_neighbors closest observations (ties broken at random), or

  • all observations within the distance threshold max_dist.

Results are stored as sparse connectivities and distances, or returned when copy=True. You can restrict the operation to a subset of observations via obs and/or to specific trees via tree.

For tdata.alignment == "leaves", only leaf nodes are considered as neighbors. For tdata.alignment == "nodes" or "subset", all observed nodes (leaves and internal nodes present in tdata.obs) are considered as neighbors.
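
The two selection modes described above can be illustrated with a small pure-Python sketch. This is not pycea's implementation: the helper `select_neighbors`, its `seed` argument, and the inclusive threshold (`<=`) are assumptions made for illustration only. It follows the documented precedence, where max_dist is used only if n_neighbors is None.

```python
import random


def select_neighbors(distances, n_neighbors=None, max_dist=None, seed=None):
    """Pick neighbors of one observation from a {name: distance} mapping.

    Hypothetical helper: mirrors the two documented modes, not pycea's code.
    """
    if n_neighbors is not None:
        rng = random.Random(seed)
        names = list(distances)
        # Shuffle, then stable-sort by distance: tied entries keep a random order.
        rng.shuffle(names)
        names.sort(key=distances.__getitem__)
        return names[:n_neighbors]
    if max_dist is not None:
        # Assumption: the threshold is treated as inclusive.
        return [name for name, d in distances.items() if d <= max_dist]
    raise ValueError("set n_neighbors or max_dist")


# Distances from one observation to four others; "c" and "d" are tied.
distances = {"b": 1.0, "c": 2.0, "d": 2.0, "e": 5.0}

top2 = select_neighbors(distances, n_neighbors=2, seed=0)
within = select_neighbors(distances, max_dist=2.0)
```

Here `top2` always starts with "b", while its second entry is either "c" or "d" depending on the seed; `within` contains every observation at distance 2.0 or less.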

Parameters:
  • tdata (TreeData) – The TreeData object.

  • n_neighbors (int | None (default: None)) – The number of neighbors to identify for each observation. Ties are broken randomly.

  • max_dist (float | None (default: None)) – If n_neighbors is None, identify all neighbors within this distance.

  • depth_key (str (default: 'depth')) – Attribute of tdata.obst[tree].nodes where depth is stored.

  • obs (str | Sequence[str] | None (default: None)) –

    The observations to use:

    • If None, neighbors for all observed nodes are stored in tdata.obsp.

    • If a string, neighbors of the specified observation are stored in tdata.obs.

    • If a sequence, neighbors among the specified observations are stored in tdata.obsp.

  • metric (Literal['lca', 'path'] (default: 'path')) –

    The type of tree distance to compute:

    • 'lca': lowest common ancestor depth.

    • 'path': abs(node1 depth + node2 depth - 2 * lca depth).

  • random_state (int | None (default: None)) – Random seed for breaking ties.

  • key_added (str (default: 'tree')) – Neighbor distances are stored in tdata.obsp['{key_added}_distances'] and neighbors in tdata.obsp['{key_added}_connectivities'].

  • update (bool (default: True)) – If True, updates existing distances instead of overwriting.

  • tree (str | Sequence[str] | None (default: None)) – The tdata.obst key or keys of the trees to use. If None, all trees are used.

  • copy (Literal[True, False] (default: False)) – If True, returns a tuple of connectivities and distances.
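
The two values of metric can be worked through on a toy tree. The sketch below is a minimal illustration in plain Python, not pycea's code: the `parent` and `depth` dictionaries and the helpers `ancestors`, `lca`, and `tree_distance` are hypothetical stand-ins for the tree and the depth_key node attribute.

```python
def ancestors(parent, node):
    """Return the path from node up to the root, starting with node itself."""
    path = [node]
    while parent.get(path[-1]) is not None:
        path.append(parent[path[-1]])
    return path


def lca(parent, a, b):
    """Return the lowest common ancestor of a and b."""
    seen = set(ancestors(parent, a))
    for node in ancestors(parent, b):
        if node in seen:
            return node
    raise ValueError("nodes are not in the same tree")


def tree_distance(parent, depth, a, b, metric="path"):
    """Compute the 'lca' or 'path' value as defined in the parameter list."""
    lca_depth = depth[lca(parent, a, b)]
    if metric == "lca":
        return lca_depth
    if metric == "path":
        return abs(depth[a] + depth[b] - 2 * lca_depth)
    raise ValueError(f"unknown metric: {metric!r}")


# Toy tree: root -> n1 -> {A, B} and root -> C, with made-up depths.
parent = {"root": None, "n1": "root", "A": "n1", "B": "n1", "C": "root"}
depth = {"root": 0.0, "n1": 1.0, "A": 2.0, "B": 2.0, "C": 1.5}

d_ab_path = tree_distance(parent, depth, "A", "B")               # 2 + 2 - 2*1
d_ac_path = tree_distance(parent, depth, "A", "C")               # 2 + 1.5 - 2*0
d_ab_lca = tree_distance(parent, depth, "A", "B", metric="lca")  # depth of n1
```

With these depths, A and B share the ancestor n1 at depth 1.0, so their path distance is 2.0, while A and C only meet at the root, giving 3.5.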

Returns:

Returns None if copy=False; otherwise returns the tuple (connectivities, distances) as sparse matrices.

Sets the following fields:

  • tdata.obsp['{key_added}_distances'] : csr_matrix (dtype float) if obs is None or a sequence.
    • Distances to neighboring observations.

  • tdata.obsp['{key_added}_connectivities'] : csr_matrix (dtype float) if distances is sparse.
    • Set of neighbors for each observation.

  • tdata.obs['{key_added}_neighbors'] : Series (dtype bool) if obs is a string.
    • Set of neighbors for the specified observation.

Examples

Identify the 5 closest neighbors for each leaf based on path distance:

>>> import pycea as py
>>> tdata = py.datasets.koblan25()
>>> py.tl.tree_neighbors(tdata, n_neighbors=5, depth_key="time")