pycea.tl.compare_distance
- pycea.tl.compare_distance(tdata, dist_keys=None, sample_n=None, groupby=None, groups=None, random_state=None)
Get pairwise observation distances.
This function gathers distances between the same observation pairs from one or more entries in
tdata.obsp and returns them side by side in a tidy pandas.DataFrame. Only pairs for which all requested distance matrices have defined values are included. Optionally, the comparison can be restricted to pairs within the same group and/or randomly subsampled.
- Parameters:
tdata (TreeData) – The TreeData object.
dist_keys – One or more tdata.obsp distance keys to compare. Only pairs where all distances are available are returned.
sample_n (int | None (default: None)) – If specified, randomly sample sample_n pairs of observations. If groupby is specified, the sample is taken within each group.
groupby (str | None (default: None)) – If specified, only compare distances between observations in the same group.
groups (str | Sequence[str] | None (default: None)) – Restrict the comparison to these groups.
random_state (int | None (default: None)) – Random seed for sampling.
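To make the selection rules above concrete, here is a minimal pure-Python sketch. It is not pycea's implementation. It assumes each distance matrix is given as a dict mapping `(obs1, obs2)` tuples to distances, with an absent pair standing in for an undefined entry, and it uses a hypothetical `groups_of` dict (observation name to group label) in place of `groupby`. It returns a list of row dicts instead of a pandas.DataFrame and uses each distance key directly as a column name.

```python
import random
from collections import defaultdict


def compare_pairs(dists, sample_n=None, groups_of=None, groups=None, random_state=None):
    """Hypothetical sketch of compare_distance's selection rules.

    dists: {key: {(obs1, obs2): distance}}; a missing pair means "undefined".
    groups_of: optional {obs_name: group_label}, standing in for `groupby`.
    """
    if not dists:
        return []
    keys = list(dists)
    # Keep only pairs for which every requested distance is defined.
    common = set(dists[keys[0]])
    for key in keys[1:]:
        common &= set(dists[key])
    pairs = sorted(common)

    if groups_of is not None:
        # Within-group comparison: both observations must share a label.
        pairs = [(a, b) for a, b in pairs if groups_of[a] == groups_of[b]]
        if groups is not None:
            # Accept a single label or a sequence of labels.
            allowed = {groups} if isinstance(groups, str) else set(groups)
            pairs = [(a, b) for a, b in pairs if groups_of[a] in allowed]

    if sample_n is not None:
        rng = random.Random(random_state)
        # One bucket per group, or a single bucket when no grouping is used.
        buckets = defaultdict(list)
        for a, b in pairs:
            label = groups_of[a] if groups_of is not None else None
            buckets[label].append((a, b))
        pairs = []
        for bucket in buckets.values():
            # Assumption: draw at most sample_n pairs, capped at the bucket size.
            pairs.extend(rng.sample(bucket, min(sample_n, len(bucket))))

    # Tidy rows: one record per pair, one column per distance key.
    return [{"obs1": a, "obs2": b, **{k: dists[k][(a, b)] for k in keys}} for a, b in pairs]
```

In this sketch the group filters run before sampling, so `sample_n` applies per group; the pairs it draws will not match pycea's for the same `random_state`.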
- Return type:
pandas.DataFrame
- Returns:
A DataFrame with the following columns: obs1 and obs2 are the observation names, and {dist_key}_distances holds the distances between the two observations (one column per requested key).
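The tidy layout stores one row per observation pair and one column per distance key. The following minimal sketch shows how such rows can be built from square matrices. It assumes plain nested lists with `None` marking undefined entries (pycea reads its matrices from tdata.obsp, which this sketch does not touch) and keeps each unordered pair once (i < j). Both choices, and the helper name `tidy_pairs`, are hypothetical.

```python
def tidy_pairs(names, matrices):
    """Hypothetical helper: turn square distance matrices into tidy pair rows.

    names: observation names, in matrix row/column order.
    matrices: {key: square list-of-lists}; None marks an undefined distance.
    """
    rows = []
    n = len(names)
    for i in range(n):
        # Upper triangle only, so each unordered pair appears once (assumption).
        for j in range(i + 1, n):
            values = {key: m[i][j] for key, m in matrices.items()}
            # Drop the pair unless every matrix defines a distance for it.
            if all(v is not None for v in values.values()):
                rows.append({"obs1": names[i], "obs2": names[j], **values})
    return rows
```

Passing these rows to `pandas.DataFrame(rows)` would produce a table with the same kind of layout, though the column names here are simply the matrix keys.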
Examples
Compare spatial and tree distances for 1000 random pairs of observations:
>>> import pycea as py
>>> tdata = py.datasets.koblan25()
>>> py.tl.distance(tdata, key="spatial", sample_n=1000)
>>> py.tl.tree_distance(tdata, key="tree", connect_key="spatial_connectivities")
>>> df = py.tl.compare_distance(tdata, dist_keys=["spatial_distances", "tree_distances"])