pycea.tl.ancestral_states

pycea.tl.ancestral_states(tdata, keys, method='mean', missing_state=None, default_state=None, costs=None, keys_added=None, tree=None, copy=False)
Overloads:
  • tdata (td.TreeData), keys (str | Sequence[str]), method (str | Callable), missing_state (str | None), default_state (str | None), costs (pd.DataFrame | None), keys_added (str | Sequence[str] | None), tree (str | Sequence[str] | None), copy (Literal[True, False]) → pd.DataFrame

  • tdata (td.TreeData), keys (str | Sequence[str]), method (str | Callable), missing_state (str | None), default_state (str | None), costs (pd.DataFrame | None), keys_added (str | Sequence[str] | None), tree (str | Sequence[str] | None), copy (Literal[True, False]) → None

Reconstructs ancestral states for an attribute.

This function reconstructs ancestral (internal node) states for categorical or continuous attributes defined on tree observations. Several reconstruction methods are supported, ranging from simple aggregation rules to the Sankoff and Fitch-Hartigan algorithms for discrete character data. A custom aggregation function can also be provided.

For tdata.alignment == "leaves", only leaf node values are used as input and all internal node states are reconstructed. For tdata.alignment == "nodes" or "subset", internal nodes present in tdata.obs with non-missing values are treated as fixed constraints and are not overwritten by reconstruction.
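The difference between the two alignment cases can be sketched on a toy tree in plain Python. This is an illustration only, not pycea's implementation: the `children` and `observed` dictionaries and the `reconstruct_mean` helper are hypothetical, and the sketch assumes the 'mean' rule (mean of leaves in the subtree). How a fixed internal value affects its ancestors is not specified here, so the sketch leaves ancestors as plain leaf means.

```python
from statistics import mean

# Hypothetical toy tree (parent -> children); not pycea's data structures.
children = {"root": ["a", "b"], "a": ["l1", "l2"], "b": ["l3", "l4"]}

# Observed values. Internal node "b" also has a value, which can only
# happen when internal nodes are present in the observations.
observed = {"l1": 1.0, "l2": 3.0, "l3": 5.0, "l4": 7.0, "b": 10.0}


def leaf_values(node):
    """Collect the observed values of all leaves below `node`."""
    if node not in children:
        return [observed[node]]
    return [v for child in children[node] for v in leaf_values(child)]


def reconstruct_mean(keep_observed_internal):
    """Assign each internal node the mean of the leaves in its subtree.

    If `keep_observed_internal` is True, an internal node that already has
    an observed value keeps it (a fixed constraint that is not overwritten).
    """
    states = {n: v for n, v in observed.items() if n not in children}
    for node in children:
        if keep_observed_internal and node in observed:
            states[node] = observed[node]
        else:
            states[node] = mean(leaf_values(node))
    return states


# Leaves-only input: every internal node is reconstructed.
leaves_only = reconstruct_mean(keep_observed_internal=False)
# Internal observations present: "b" keeps its observed value.
with_constraints = reconstruct_mean(keep_observed_internal=True)
```

In the first call node "b" is reconstructed from its two leaves, while in the second call it keeps the observed value of 10.0.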

Parameters:
  • tdata (TreeData) – TreeData object.

  • keys (str | Sequence[str]) – One or more obs.keys(), var_names, obsm.keys(), or obsp.keys() to reconstruct.

  • method (str | Callable (default: 'mean')) –

    Method to reconstruct ancestral states:

    • 'mean' : The mean of leaves in subtree.

    • 'sum' : The sum of leaves in subtree (iterative bottom-up traversal).

    • 'mode' : The most common value in the subtree.

    • 'fitch_hartigan' : The Fitch-Hartigan algorithm.

    • 'sankoff' : The Sankoff algorithm with specified costs.

    • Any function that takes a list of values and returns a single value.

  • missing_state (str | None (default: None)) – The state to consider as missing data.

  • default_state (str | None (default: None)) – The expected state for the root node.

  • costs (DataFrame | None (default: None)) – A pd.DataFrame with the costs of changing states (from rows to columns). Only used if method is 'sankoff'.

  • keys_added (str | Sequence[str] | None (default: None)) – Attribute keys of tdata.obst[tree].nodes where ancestral states will be stored. If None, keys are used.

  • tree (str | Sequence[str] | None (default: None)) – The obst key or keys of the trees to use. If None, all trees are used.

  • copy (Literal[True, False] (default: False)) – If True, returns a DataFrame with ancestral states.
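To make the role of costs concrete, here is a minimal sketch of Sankoff parsimony on a toy tree, written in plain Python rather than with pycea. Everything in it is hypothetical: the tree, the leaf states, the helper names, and the nested dictionary `costs[src][dst]`, which stands in for a DataFrame whose rows are source states and whose columns are target states. Ties are broken by the order of `states`, a choice made only for this sketch.

```python
import math

states = ["A", "B", "C"]

# costs[src][dst]: cost of changing from src (row) to dst (column).
costs = {
    "A": {"A": 0, "B": 1, "C": 3},
    "B": {"A": 1, "B": 0, "C": 1},
    "C": {"A": 3, "B": 1, "C": 0},
}

# Hypothetical toy tree (parent -> children) and observed leaf states.
children = {"root": ["n1", "l3"], "n1": ["l1", "l2"]}
leaf_state = {"l1": "A", "l2": "C", "l3": "C"}


def subtree_costs(node, table):
    """Bottom-up pass: minimal cost of the subtree for each state at `node`."""
    if node not in children:
        # A leaf is free in its observed state and impossible in any other.
        table[node] = {s: 0 if s == leaf_state[node] else math.inf for s in states}
        return table
    for child in children[node]:
        subtree_costs(child, table)
    table[node] = {
        s: sum(
            min(costs[s][t] + table[child][t] for t in states)
            for child in children[node]
        )
        for s in states
    }
    return table


def assign(node, state, table, result):
    """Top-down pass: give each child the cheapest state given its parent."""
    result[node] = state
    for child in children.get(node, []):
        best = min(states, key=lambda t: costs[state][t] + table[child][t])
        assign(child, best, table, result)
    return result


table = subtree_costs("root", {})
root_state = min(states, key=lambda s: table["root"][s])
ancestral = assign("root", root_state, table, {})
total_cost = table["root"][root_state]
```

With this cost table, passing through the intermediate state "B" is cheaper than a direct change between "A" and "C", so the internal node above leaves "A" and "C" is assigned "B".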

Returns:

Returns None if copy=False; otherwise returns a DataFrame with ancestral states.

Sets the following fields for each key:

  • tdata.obst[tree].nodes[key_added] : float | Object | List[Object]
    • Inferred ancestral states. List of states if data was an array.

Examples

Infer the expression of Krt20 and Cd74 based on their mean value in descendant cells:

>>> import pycea as py
>>> tdata = py.datasets.yang22()
>>> py.tl.ancestral_states(tdata, keys=["Krt20", "Cd74"], method="mean")
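Because method also accepts any function that takes a list of values and returns a single value, a custom aggregation rule can be defined as an ordinary Python function. The sketch below uses the median; the name `median_state` is hypothetical, and it assumes the values passed in are numeric with no missing entries.

```python
from statistics import median


def median_state(values):
    """Take a list of values and return a single value (here, the median)."""
    return median(values)


# Standalone check on a plain list, independent of any tree.
example = median_state([0.2, 4.0, 0.5, 0.3, 0.4])
```

Such a function would then be passed as method=median_state in place of a method name. The median is less sensitive than the mean to a single outlying cell, such as the 4.0 in the list above.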

Reconstruct ancestral character states using the Fitch-Hartigan algorithm:

>>> py.tl.ancestral_states(tdata, keys="characters", method="fitch_hartigan", missing_state=-1)
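For intuition about what parsimony-based reconstruction does, here is a minimal sketch of the classic two-pass Fitch procedure on a binary toy tree, in plain Python. It is not pycea's implementation, and it omits Hartigan's generalization to nodes with more than two children. The tree, the leaf states, and the helper names are hypothetical. Treating a missing leaf as compatible with every observed state, and breaking ties by taking the smallest state, are assumptions made for this sketch.

```python
MISSING = -1

# Hypothetical binary toy tree (parent -> children) and leaf states.
children = {"root": ["n1", "n2"], "n1": ["l1", "l2"], "n2": ["l3", "l4"]}
leaf_state = {"l1": 1, "l2": 1, "l3": 2, "l4": MISSING}

# All states that were actually observed at the leaves.
all_states = {s for s in leaf_state.values() if s != MISSING}


def bottom_up(node, sets):
    """First pass: compute the candidate state set of every node."""
    if node not in children:
        state = leaf_state[node]
        # Assumption: a missing leaf is compatible with any observed state.
        sets[node] = set(all_states) if state == MISSING else {state}
        return sets
    for child in children[node]:
        bottom_up(child, sets)
    left, right = (sets[c] for c in children[node])
    common = left & right
    # Intersection if the children agree on something, otherwise union.
    sets[node] = common if common else left | right
    return sets


def top_down(node, parent_state, sets, result):
    """Second pass: keep the parent's state where the candidate set allows it."""
    if parent_state in sets[node]:
        result[node] = parent_state
    else:
        result[node] = min(sets[node])  # tie-break chosen for this sketch
    for child in children.get(node, []):
        top_down(child, result[node], sets, result)
    return result


sets = bottom_up("root", {})
ancestral = top_down("root", None, sets, {})
```

In this sketch the leaf with the missing state also receives a state during the top-down pass, inherited from its parent.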