pycea.tl.compare_distance

pycea.tl.compare_distance(tdata, dist_keys=None, sample_n=None, groupby=None, groups=None, random_state=None)

Get pairwise observation distances.

This function gathers distances between the same observation pairs from one or more entries in tdata.obsp and returns them side-by-side in a tidy pandas.DataFrame. Only pairs for which all requested distance matrices have defined values are included. Optionally, comparisons can be restricted within groups and/or randomly subsampled.
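The "all requested distances defined" rule can be illustrated without pycea. The sketch below is not the library's implementation: the `spatial` and `tree` dictionaries and the `shared_pairs` helper are hypothetical stand-ins for sparse tdata.obsp entries, and the example only shows how keeping the pairs common to every matrix yields one row per pair with the distances side by side.

```python
# Illustration only: not pycea's implementation.
# Each hypothetical mapping stands in for one sparse tdata.obsp matrix,
# storing a distance only for the pairs where it is defined.
spatial = {("a", "b"): 1.0, ("a", "c"): 2.5, ("b", "c"): 4.0}
tree = {("a", "b"): 2.0, ("b", "c"): 6.0, ("c", "d"): 3.0}


def shared_pairs(**matrices):
    """Return one row per pair that has a value in every matrix."""
    if not matrices:
        return []
    # Keep only the pairs that appear in all of the supplied mappings.
    common = set.intersection(*(set(m) for m in matrices.values()))
    rows = []
    for obs1, obs2 in sorted(common):
        row = {"obs1": obs1, "obs2": obs2}
        for name, matrix in matrices.items():
            row[name] = matrix[(obs1, obs2)]
        rows.append(row)
    return rows


rows = shared_pairs(spatial_distances=spatial, tree_distances=tree)
for row in rows:
    print(row)
```

Here ("a", "c") and ("c", "d") are dropped because each is missing from one of the two mappings.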

Parameters:
  • tdata (TreeData) – The TreeData object.

  • dist_keys – One or more tdata.obsp distance keys to compare. Only pairs where all distances are available are returned.

  • sample_n (int | None (default: None)) – If specified, randomly sample sample_n pairs of observations. If groupby is specified, the sample is taken within each group.

  • groupby (str | None (default: None)) – If specified, only compare distances within groups.

  • groups (str | Sequence[str] | None (default: None)) – Restrict the comparison to these groups.

  • random_state (int | None (default: None)) – Random seed for sampling.
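How sample_n, groupby, groups, and random_state interact can be sketched in plain Python. This is an illustration under stated assumptions, not pycea's code: the `labels` mapping and the `sample_within_groups` helper are hypothetical, pairs are assumed to be unordered and to exclude self-pairs, and a group with fewer than sample_n pairs is assumed to keep all of them.

```python
import random
from itertools import combinations

# Illustration only: not pycea's implementation.
# Hypothetical observation-to-group labels, standing in for tdata.obs[groupby].
labels = {"a": "g1", "b": "g1", "c": "g1", "d": "g2", "e": "g2", "f": "g3"}


def sample_within_groups(labels, sample_n=None, groups=None, random_state=None):
    """Enumerate within-group pairs, optionally sampling sample_n per group."""
    rng = random.Random(random_state)  # seeded generator for reproducibility
    by_group = {}
    for obs, group in labels.items():
        by_group.setdefault(group, []).append(obs)
    if groups is not None:
        # Accept a single group name or a sequence of names.
        wanted = {groups} if isinstance(groups, str) else set(groups)
        by_group = {g: m for g, m in by_group.items() if g in wanted}
    result = {}
    for group, members in sorted(by_group.items()):
        # Assumption: unordered pairs without self-pairs.
        pairs = list(combinations(sorted(members), 2))
        if sample_n is not None and sample_n < len(pairs):
            pairs = rng.sample(pairs, sample_n)
        result[group] = pairs
    return result


first = sample_within_groups(labels, sample_n=2, groups=["g1", "g2"], random_state=0)
second = sample_within_groups(labels, sample_n=2, groups=["g1", "g2"], random_state=0)
print(first == second)
```

Reusing the same seed reproduces the same sample, and no pair ever spans two groups.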

Return type:

DataFrame

Returns:

A DataFrame with the following columns:

  • obs1 and obs2 are the observation names.

  • {dist_key}_distances are the distances between the observations.

Examples

Compare spatial and tree distances for 1000 random pairs of observations:

>>> import pycea as py
>>> tdata = py.datasets.koblan25()
>>> py.tl.distance(tdata, key="spatial", sample_n=1000)
>>> py.tl.tree_distance(tdata, key="tree", connect_key="spatial_connectivities")
>>> df = py.tl.compare_distance(tdata, dist_keys=["spatial_distances", "tree_distances"])