pycea.tl.autocorr

pycea.tl.autocorr(tdata, keys=None, connect_key='tree_connectivities', method='moran', layer=None, copy=False)
Overloads:
  • tdata (td.TreeData), keys (str | Sequence[str] | None), connect_key (str), method (str), layer (str | None), copy (Literal[True, False]) → pd.DataFrame

  • tdata (td.TreeData), keys (str | Sequence[str] | None), connect_key (str), method (str), layer (str | None), copy (Literal[True, False]) → None

Calculate autocorrelation statistic.

This function computes autocorrelation for one or more variables using either Moran’s I or Geary’s C statistic, based on a specified connectivity graph between observations.

Mathematically, the two statistics are defined as follows:

I = \frac{ N \sum_{i,j} w_{i,j} (x_i - \bar{x})(x_j - \bar{x}) }{ W \sum_i (x_i - \bar{x})^2 }

C = \frac{ (N - 1) \sum_{i,j} w_{i,j} (x_i - x_j)^2 }{ 2W \sum_i (x_i - \bar{x})^2 }

where:
  • N is the number of observations,

  • x_i is the value of observation i,

  • \bar{x} is the mean of all observations,

  • w_{i,j} is the spatial weight between i and j, and

  • W = \sum_{i,j} w_{i,j} is the sum of all weights.

A Moran’s I value close to 1 indicates strong positive autocorrelation, values near 0 suggest randomness, and negative values indicate negative autocorrelation. Geary’s C behaves inversely: values less than 1 indicate positive autocorrelation, values near 1 suggest randomness, and values greater than 1 indicate negative autocorrelation.
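
To make the two formulas concrete, here is a minimal NumPy sketch that evaluates them directly on a small dense weight matrix. It is an illustration only, not pycea's implementation: the helper names morans_i and gearys_c and the toy path graph are assumptions made for this example, and in practice the weights would come from tdata.obsp[connect_key], which may be stored as a sparse matrix.

```python
import numpy as np


def morans_i(x, w):
    """Moran's I for values x and weight matrix w (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    z = x - x.mean()                      # deviations from the mean
    num = x.size * (z @ w @ z)            # N * sum_ij w_ij z_i z_j
    den = w.sum() * (z @ z)               # W * sum_i z_i^2
    return num / den


def gearys_c(x, w):
    """Geary's C for values x and weight matrix w (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    z = x - x.mean()
    diff2 = (x[:, None] - x[None, :]) ** 2     # (x_i - x_j)^2 for all pairs
    num = (x.size - 1) * (w * diff2).sum()     # (N - 1) * sum_ij w_ij (x_i - x_j)^2
    den = 2.0 * w.sum() * (z @ z)              # 2W * sum_i z_i^2
    return num / den


# Toy connectivity: a path graph 0 - 1 - 2 - 3 with symmetric unit weights.
w = np.array(
    [
        [0, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [0, 0, 1, 0],
    ],
    dtype=float,
)

smooth = [1.0, 2.0, 3.0, 4.0]        # neighbors have similar values
alternating = [1.0, 4.0, 1.0, 4.0]   # neighbors have dissimilar values

print(morans_i(smooth, w), gearys_c(smooth, w))            # about 0.333 and 0.3
print(morans_i(alternating, w), gearys_c(alternating, w))  # about -1.0 and 1.5
```

The smoothly varying values give I > 0 and C < 1 (positive autocorrelation), while the alternating values give I < 0 and C > 1 (negative autocorrelation), matching the interpretation above.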

Parameters:
  • tdata (TreeData) – TreeData object.

  • keys (str | Sequence[str] | None (default: None)) – One or more obs.keys(), var_names, obsm.keys(), or obsp.keys() to calculate autocorrelation for. Defaults to all ‘var_names’.

  • connect_key (str (default: 'tree_connectivities')) – tdata.obsp connectivity key specifying set of neighbors for each observation.

  • method (str (default: 'moran')) –

    Method to calculate autocorrelation. Options are 'moran' (Moran’s I) and 'geary' (Geary’s C).

  • layer (str | None (default: None)) – Name of the TreeData object layer to use. If None, tdata.X is used.

  • copy (Literal[True, False] (default: False)) – If True, returns a DataFrame with autocorrelation.

Returns:

Returns None if copy=False, else returns DataFrame with columns:

  • 'autocorr' - Moran’s I or Geary’s C statistic.

  • 'pval_norm' - p-value under normality assumption.

  • 'var_norm' - variance of 'autocorr' under normality assumption.

Sets the following fields for each key:

  • tdata.uns['moranI'] : The above DataFrame, if method is 'moran'.

  • tdata.uns['gearyC'] : The above DataFrame, if method is 'geary'.

Examples

Estimate gene expression heritability using Moran’s I autocorrelation:

>>> import pycea as py
>>> tdata = py.datasets.yang22()
>>> py.tl.tree_neighbors(tdata, n_neighbors=10)
>>> py.tl.autocorr(tdata, connect_key="tree_connectivities", method="moran")