pycea.tl.ancestral_states


pycea.tl.ancestral_states(tdata, keys, method='mean', missing_state=None, default_state=None, costs=None, keys_added=None, tree=None, copy=False)

Reconstructs ancestral states for an attribute.

This function reconstructs ancestral (internal node) states for categorical or continuous attributes defined on tree leaves. Supported methods range from simple aggregation rules (mean, mode) to the Fitch-Hartigan and Sankoff algorithms for discrete character data. A custom aggregation function can also be provided.

Parameters:
  • tdata (TreeData) – TreeData object.

  • keys (str | Sequence[str]) – One or more obs.keys(), var_names, obsm.keys(), or obsp.keys() to reconstruct.

  • method (str | Callable (default: 'mean')) –

    Method to reconstruct ancestral states:

    • 'mean' : The mean of the leaves in the subtree.

    • 'mode' : The most common value in the subtree.

    • 'fitch_hartigan' : The Fitch-Hartigan algorithm.

    • 'sankoff' : The Sankoff algorithm with the specified costs.

    • Any function that takes a list of values and returns a single value.

  • missing_state (str | None (default: None)) – The state to consider as missing data.

  • default_state (str | None (default: None)) – The expected state for the root node.

  • costs (DataFrame | None (default: None)) – A pd.DataFrame with the costs of changing states (from rows to columns). Only used if method is 'sankoff'.

  • keys_added (str | Sequence[str] | None (default: None)) – Attribute keys of tdata.obst[tree].nodes where ancestral states will be stored. If None, the names in keys are used.

  • tree (str | Sequence[str] | None (default: None)) – The obst key or keys of the trees to use. If None, all trees are used.

  • copy (Literal[True, False] (default: False)) – If True, returns a DataFrame with ancestral states.
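
To make the orientation of costs concrete, here is a minimal, dependency-free sketch of Sankoff's dynamic program on a toy tree. It is not pycea's implementation. The tree, the children and leaf_state dictionaries, and the sankoff_costs helper are all hypothetical, and a nested dict stands in for the pd.DataFrame. The sketch assumes that "from rows to columns" means the parent state is the row and the child state is the column.

```python
import math

# Hypothetical toy tree: internal node -> list of children.
children = {"root": ["n1", "c"], "n1": ["a", "b"]}
# Observed states at the leaves.
leaf_state = {"a": "A", "b": "B", "c": "B"}
states = ["A", "B"]
# costs[row][column]: cost of changing from the row state to the column state.
# Asymmetric on purpose: A -> B is cheap, B -> A is expensive.
costs = {"A": {"A": 0, "B": 1}, "B": {"A": 3, "B": 0}}


def sankoff_costs(node):
    """Return {state: minimum cost of the subtree below node if node takes that state}."""
    if node not in children:
        # A leaf is free in its observed state and impossible in any other.
        return {s: 0 if s == leaf_state[node] else math.inf for s in states}
    child_tables = [sankoff_costs(c) for c in children[node]]
    # For each candidate state s, every child independently picks its cheapest state t.
    return {
        s: sum(min(costs[s][t] + table[t] for t in states) for table in child_tables)
        for s in states
    }


root_costs = sankoff_costs("root")
best_root_state = min(root_costs, key=root_costs.get)
print(root_costs, best_root_state)
```

With pandas, the same matrix could be built as pd.DataFrame.from_dict(costs, orient="index"), which puts the outer keys on the rows.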

Return type:

DataFrame | None

Returns:

None if copy=False; otherwise a DataFrame with the ancestral states.

Sets the following fields for each key:

  • tdata.obst[tree].nodes[key_added] : float | Object | List[Object]
    • Inferred ancestral states. A list of states if the data was an array.

Examples

Infer the expression of Krt20 and Cd74 based on their mean value in descendant cells:

>>> import pycea as py
>>> tdata = py.datasets.yang22()
>>> py.tl.ancestral_states(tdata, keys=["Krt20", "Cd74"], method="mean")
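
Since method also accepts any function that takes a list of values and returns a single value, a custom aggregator can replace the built-in rules. A minimal sketch of such a function follows. The name median_state is hypothetical, and this page does not specify exactly which values pycea collects into the list, so the function is exercised on a plain list only.

```python
import statistics


def median_state(values):
    """Aggregate a list of values into a single value (here, the median)."""
    return statistics.median(values)


# Standalone check on a plain list, independent of pycea.
print(median_state([1.0, 5.0, 2.0]))
```

Such a function could then be passed as method=median_state in place of "mean".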

Reconstruct ancestral character states using the Fitch-Hartigan algorithm:

>>> py.tl.ancestral_states(tdata, keys="characters", method="fitch_hartigan", missing_state=-1)
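
For intuition about what a parsimony reconstruction does, here is a minimal sketch of the classic two-pass Fitch algorithm on a binary toy tree. It is not pycea's implementation, and it covers neither Hartigan's extension to multifurcating trees nor missing-state handling. The tree and all names are hypothetical.

```python
# Hypothetical binary toy tree: internal node -> (left child, right child).
children = {"root": ("n1", "n2"), "n1": ("a", "b"), "n2": ("c", "d")}
# Observed character states at the leaves.
leaf_state = {"a": 0, "b": 1, "c": 1, "d": 1}


def bottom_up(node, sets):
    """Post-order pass: intersect the child sets if they overlap, else take their union."""
    if node not in children:
        sets[node] = {leaf_state[node]}
        return sets[node]
    left, right = (bottom_up(child, sets) for child in children[node])
    sets[node] = (left & right) or (left | right)
    return sets[node]


def top_down(node, parent_state, sets, assigned):
    """Pre-order pass: keep the parent's state when it is allowed, else pick from the node's set."""
    if parent_state in sets[node]:
        assigned[node] = parent_state
    else:
        assigned[node] = min(sets[node])  # deterministic tie-break for the sketch
    for child in children.get(node, ()):
        top_down(child, assigned[node], sets, assigned)


sets, assigned = {}, {}
bottom_up("root", sets)
top_down("root", None, sets, assigned)
print(assigned)
```

Here the candidate set at n1 is ambiguous ({0, 1}) after the first pass, and the second pass resolves it to the state of its parent.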