pycea.tl.distance
- pycea.tl.distance(tdata, key, obs=None, metric='euclidean', metric_kwds=None, sample_n=None, connect_key=None, random_state=None, update=True, key_added=None, copy=False)
- Overloads:
tdata (td.TreeData), key (str), obs (str | int | Sequence[Any] | None), metric (_MetricFn | _Metric), metric_kwds (Mapping | None), sample_n (int | None), connect_key (str | None), random_state (int | None), update (bool), key_added (str | None), copy (Literal[True, False]) → np.ndarray | sp.sparse.csr_matrix
tdata (td.TreeData), key (str), obs (str | int | Sequence[Any] | None), metric (_MetricFn | _Metric), metric_kwds (Mapping | None), sample_n (int | None), connect_key (str | None), random_state (int | None), update (bool), key_added (str | None), copy (Literal[True, False]) → None
Computes distances between observations.
Supports full pairwise distances, distances from a single observation to all others, distances within a specified subset, or distances for an explicit list of pairs. Distances can be computed using a named metric (e.g. "euclidean", "cosine", "manhattan") or a user-supplied callable.
- Parameters:
tdata (TreeData) – The TreeData object.
key (str) – Use the indicated key. 'X' or any tdata.obsm key is valid.
obs (str | int | Sequence[Any] | None (default: None)) – The observations to use:
  - If None, pairwise distances for all observations are stored in tdata.obsp.
  - If a string, distances to all other observations are stored in tdata.obs.
  - If a sequence, pairwise distances are stored in tdata.obsp.
  - If a sequence of pairs, distances between the pairs are stored in tdata.obsp.
metric (Union[Callable[[ndarray, ndarray], float], Literal['braycurtis', 'canberra', 'chebyshev', 'cityblock', 'cosine', 'correlation', 'dice', 'euclidean', 'hamming', 'jaccard', 'kulsinski', 'l1', 'l2', 'mahalanobis', 'minkowski', 'manhattan', 'rogerstanimoto', 'russellrao', 'seuclidean', 'sokalmichener', 'sokalsneath', 'sqeuclidean', 'yule']] (default: 'euclidean')) – A known metric's name or a callable that returns a distance.
metric_kwds (Mapping | None (default: None)) – Options for the metric.
sample_n (int | None (default: None)) – If specified, randomly sample sample_n pairs of observations.
connect_key (str | None (default: None)) – If specified, compute distances only between connected observations specified by tdata.obsp['{connect_key}_connectivities'].
random_state (int | None (default: None)) – Random seed for sampling.
key_added (str | None (default: None)) – Distances are stored in tdata.obsp['{key_added}_distances'] and connectivities in tdata.obsp['{key_added}_connectivities']. Defaults to key.
update (bool (default: True)) – If True, updates existing distances instead of overwriting.
copy (Literal[True, False] (default: False)) – If True, returns the distances.
- Returns:
Returns None if copy=False, else returns distances. Sets the following fields:
tdata.obsp['{key_added}_distances'] – ndarray/csr_matrix (dtype float) if obs is None or a sequence. Distances between observations.
tdata.obsp['{key_added}_connectivities'] – csr_matrix (dtype float) if distance is sparse. Connectivity between observations.
tdata.obs['{key_added}_distances'] – Series (dtype float) if obs is a string. Distance from specified observation to others.
Notes
When both connect_key and sample_n are provided, sampling is performed within the connected pairs induced by the connectivity.
If you pass a callable metric, it must accept two 1D vectors and return a scalar.
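As an illustration of the callable contract described above, here is a minimal sketch of a custom metric. The function name max_abs_difference and the toy coordinates are hypothetical, and the explicit loop only shows how such a function is evaluated on pairs of 1D vectors; it is not how pycea applies the metric internally.

```python
import numpy as np


def max_abs_difference(u: np.ndarray, v: np.ndarray) -> float:
    """Hypothetical custom metric: largest absolute coordinate difference."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    # Two 1D vectors in, one scalar out, as the callable contract requires.
    return float(np.max(np.abs(u - v)))


# Toy 2D coordinates standing in for a spatial embedding (made-up values).
coords = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])

# Evaluate the metric on every pair of rows to build a dense distance matrix.
n = coords.shape[0]
pairwise = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        pairwise[i, j] = max_abs_difference(coords[i], coords[j])
```

Given the signature above, a function like this would presumably be passed as metric=max_abs_difference in place of a metric name.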
Examples
Calculate pairwise spatial distance between all observations:
>>> import pycea as py
>>> tdata = py.datasets.koblan25()
>>> py.tl.distance(tdata, key="spatial")
Calculate spatial distance between closely related observations:
>>> py.tl.tree_neighbors(tdata, n_neighbors=10, depth_key="time")
>>> py.tl.distance(tdata, key="spatial", connect_key="tree_connectivities")
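The Notes state that combining connect_key with sample_n restricts sampling to connected pairs. The following NumPy sketch pictures that idea on a toy connectivity matrix. It is a conceptual illustration, not pycea's implementation: the matrix and coordinates are made up, and it assumes ordered pairs sampled without replacement, which this page does not specify.

```python
import numpy as np

# Toy symmetric connectivity matrix: a nonzero entry marks a connected pair.
connectivities = np.array(
    [
        [0, 1, 1, 0],
        [1, 0, 0, 1],
        [1, 0, 0, 0],
        [0, 1, 0, 0],
    ]
)
# Toy 2D coordinates for the four observations (made-up values).
coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [4.0, 4.0]])

# Candidate pairs are restricted to the connected (row, column) index pairs.
rows, cols = np.nonzero(connectivities)
connected_pairs = np.column_stack([rows, cols])

# Draw sample_n of those pairs with a seeded generator for reproducibility.
sample_n = 3
rng = np.random.default_rng(0)
chosen = rng.choice(len(connected_pairs), size=sample_n, replace=False)
sampled_pairs = connected_pairs[chosen]

# Euclidean distance for each sampled pair only.
differences = coords[sampled_pairs[:, 0]] - coords[sampled_pairs[:, 1]]
sampled_distances = np.linalg.norm(differences, axis=1)
```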
Calculate distance from a single observation to all others:
>>> py.tl.distance(tdata, key="spatial", obs="M3-1-19")
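The four ways of selecting observations described under obs can be pictured with plain NumPy. This is a sketch of what each mode computes for Euclidean distance, using made-up coordinates and integer positions in place of observation names; it does not call pycea and does not reproduce how results are stored in tdata.

```python
import numpy as np

# Toy 2D coordinates for four observations (made-up values).
coords = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0], [0.0, 5.0]])

# All observations: a full n x n matrix of pairwise distances.
full = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)

# A single observation: one distance per observation, including itself.
from_one = full[1]

# A subset of observations: the pairwise matrix restricted to that subset.
subset = [0, 2]
within_subset = full[np.ix_(subset, subset)]

# An explicit list of pairs: one distance per listed pair.
pairs = np.array([(0, 1), (1, 2)])
pair_distances = full[pairs[:, 0], pairs[:, 1]]
```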