Plotting trees

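Pycea implements an intuitive tree plotting language in which complex plots are built from simple components: pycea.pl.branches() draws the branches, pycea.pl.nodes() draws the nodes, and pycea.pl.annotation() adds leaf annotations.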

In this tutorial, we will use C. elegans data from Packer et al. (2019) to demonstrate the features of these functions and how they can be combined. This dataset contains the C. elegans lineage tree up to 400 minutes post-fertilization, roughly corresponding to the comma stage, as well as transcriptomic data from scRNA-seq aligned to the tree and spatial coordinates from live-cell imaging.

import pycea as py
import scanpy as sc
import matplotlib.pyplot as plt

plt.rcParams["figure.figsize"] = (5, 5)

The C. elegans data can be loaded using pycea.datasets.packer19(). For this tutorial, we subset the tree to only the lineages where transcriptomic data is available by passing tree="observed".

tdata = py.datasets.packer19(tree="observed")
tdata
TreeData object with n_obs × n_vars = 988 × 20222
    obs: 'birth_time', 'dies', 'annotation_name', 'clade', 'generation_within_clade', 'parent_cell', 'lineage_group', 'umap_cluster', 'cells_produced', 'x', 'y', 'z', 'time', 'tree'
    var: 'gene_id'
    obsm: 'spatial'
    obst: 'tree'
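
Conceptually, restricting a lineage tree to the lineages with transcriptomic data means keeping every profiled cell plus all of its ancestors. The minimal sketch below illustrates that idea on a toy tree stored as a plain parent-to-children dict. The cell names borrow C. elegans nomenclature, but the tree, the set of observed cells, and the prune_to_observed helper are hypothetical; this is not how pycea.datasets.packer19() is implemented.

```python
# Toy full lineage (parent -> children) and a made-up set of cells with
# scRNA-seq profiles; neither comes from the real dataset.
full_tree = {
    "P0": ["AB", "P1"],
    "AB": ["ABa", "ABp"],
    "P1": ["EMS", "P2"],
    "EMS": ["MS", "E"],
}
observed = {"ABa", "E", "MS"}


def prune_to_observed(children, root, observed):
    """Hypothetical helper: keep a node if it is observed or has an observed
    descendant. Returns the pruned tree as parent -> kept children."""
    kept = {}

    def visit(node):
        # Visit every child; keep only those that lead to an observed cell.
        kept_kids = [kid for kid in children.get(node, []) if visit(kid)]
        if kept_kids:
            kept[node] = kept_kids
        return bool(kept_kids) or node in observed

    visit(root)
    return kept


pruned = prune_to_observed(full_tree, "P0", observed)
```

Single-child nodes such as AB are kept rather than collapsed, which preserves every ancestral cell along the retained lineages.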

Plotting branches

The first step is rendering the branches of the tree using pycea.pl.branches(). The depth_key parameter specifies the node attribute where the depth is stored. Here we use depth_key="time", which contains precise division times from live-cell imaging.

py.pl.branches(tdata, depth_key="time");
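
To make the role of depth_key concrete, here is a minimal sketch of one common rectangular tree layout: each node's x position is read from a per-node depth mapping, leaves get consecutive y positions, and each internal node is centered on its children. The toy tree, the made-up times, and the layout and branch_segments helpers are hypothetical; pycea's own layout code may differ in its details.

```python
# Toy lineage tree (parent -> children). Cell names follow C. elegans
# nomenclature, but the times are made-up values, not Packer et al. data.
children = {"P0": ["AB", "P1"], "P1": ["EMS", "P2"], "EMS": ["MS", "E"]}
node_time = {"P0": 0, "AB": 17, "P1": 19, "EMS": 30, "P2": 32, "MS": 45, "E": 47}


def leaf_order(node):
    """Leaves below `node` in depth-first, left-to-right order."""
    kids = children.get(node, [])
    if not kids:
        return [node]
    return [leaf for kid in kids for leaf in leaf_order(kid)]


def layout(root, depth):
    """Return node -> (x, y), with x taken from the `depth` mapping."""
    leaf_y = {leaf: float(i) for i, leaf in enumerate(leaf_order(root))}
    pos = {}

    def place(node):
        kids = children.get(node, [])
        if kids:
            ys = [place(kid) for kid in kids]
            y = sum(ys) / len(ys)  # center the parent on its children
        else:
            y = leaf_y[node]
        pos[node] = (depth[node], y)
        return y

    place(root)
    return pos


def branch_segments(pos):
    """One elbow per edge: vertical at the parent's x, then out to the child."""
    segments = []
    for parent, kids in children.items():
        px, p_y = pos[parent]
        for kid in kids:
            cx, cy = pos[kid]
            segments.append([(px, p_y), (px, cy), (cx, cy)])
    return segments


pos = layout("P0", node_time)
segments = branch_segments(pos)
```

Swapping node_time for another per-node mapping (for example, generation number) changes only the x coordinates, which is analogous to choosing a different depth_key.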

The way branches are plotted is highly customizable. For example, we can draw angled branches (angled_branches=True) in polar coordinates (polar=True) and color the branches by clade.

py.pl.branches(tdata, depth_key="time", angled_branches=True, polar=True, color="clade");
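
The two options can be thought of as separate geometric steps. In the minimal sketch below (hypothetical helpers, made-up coordinates), an angled branch replaces the rectangular elbow with a single straight line from parent to child, and the polar transform treats depth as the radius and the leaf position as an angle. It only illustrates the geometry; pycea's implementation may handle details such as arcs differently.

```python
import math


def elbow(parent_xy, child_xy):
    """Rectangular branch: vertical at the parent's depth, then horizontal."""
    (px, p_y), (cx, cy) = parent_xy, child_xy
    return [(px, p_y), (px, cy), (cx, cy)]


def angled(parent_xy, child_xy):
    """Angled branch: a single straight line from parent to child."""
    return [parent_xy, child_xy]


def to_polar(point, n_slots):
    """Treat depth as the radius and the leaf coordinate as an angle."""
    depth, y = point
    theta = 2 * math.pi * y / n_slots
    return (depth * math.cos(theta), depth * math.sin(theta))


# Made-up coordinates in (depth, leaf position) space.
parent, child = (0.0, 1.0), (10.0, 2.0)
n_slots = 4  # assumed number of leaf positions around the circle

rect = elbow(parent, child)
ang = angled(parent, child)
polar_ang = [to_polar(p, n_slots) for p in ang]
```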

Plotting nodes

Once branches are plotted, we can render the nodes of the tree using pycea.pl.nodes(). Similar to the branches, we can use the color parameter to specify how the nodes should be colored. The slot parameter specifies which field of the tdata object contains the color information.

py.pl.branches(tdata, depth_key="time")
py.pl.nodes(tdata, color="clade", slot="obst");
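
Coloring by a categorical attribute boils down to looking up each node's value in the chosen slot and mapping every distinct category to a color. The sketch below shows that mapping with a plain dict standing in for node attributes; the palette, the node values, and the categorical_colors helper are hypothetical, and pycea's default palette and category ordering may differ.

```python
from itertools import cycle

# Hypothetical node -> clade values, standing in for attributes stored on tree nodes.
node_clade = {"AB": "AB", "MS": "MS", "E": "E", "P2": "P", "ABa": "AB"}

# An arbitrary example palette (not necessarily pycea's default).
palette = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728"]


def categorical_colors(values, palette):
    """Assign one color per category (in sorted order), cycling the palette if
    there are more categories than colors. Returns (node colors, legend)."""
    categories = sorted(set(values.values()))
    color_of = dict(zip(categories, cycle(palette)))
    return {node: color_of[cat] for node, cat in values.items()}, color_of


node_colors, legend = categorical_colors(node_clade, palette)
```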

The style and size of nodes can also be specified. For example, we can use style to indicate the clade, size to indicate the time, and color to indicate the lineage_group annotation. Conveniently, the legends are automatically placed and can be customized using the legend_kwargs parameter.

py.pl.branches(tdata, depth_key="time")
py.pl.nodes(tdata, color="lineage_group", size="time", style="clade", legend=True, legend_kwargs={"ncols": 2});
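
Mapping a numeric attribute such as time to marker size typically uses a linear rescaling onto a fixed size range. The sketch below assumes a size range of 10 to 100 and uses made-up times; the range and the scale_sizes helper are assumptions for illustration, not pycea defaults.

```python
def scale_sizes(values, size_range=(10.0, 100.0)):
    """Linearly rescale numeric values into [min_size, max_size].
    The default range is an assumption, not a pycea default."""
    lo, hi = min(values.values()), max(values.values())
    smin, smax = size_range
    if hi == lo:
        # All values equal: use the midpoint size for every node.
        return {k: (smin + smax) / 2 for k in values}
    return {k: smin + (v - lo) / (hi - lo) * (smax - smin) for k, v in values.items()}


# Made-up node times.
times = {"P0": 0, "AB": 20, "P1": 40}
sizes = scale_sizes(times)
```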

Since internal nodes are observed in the C. elegans dataset, we can also color nodes by the expression of genes such as elt-2, a transcription factor expressed in the E (endoderm) lineage. We'll set nodes="all" to plot both the leaf and internal nodes.

pycea.pl.nodes() can be called multiple times to plot different sets of nodes. For example, we can mark the E progenitor cell with a red star.

py.pl.branches(tdata, depth_key="time")
py.pl.nodes(tdata, color="elt-2", nodes="all")
py.pl.nodes(tdata, color="red", nodes="E", style="*", size=200);
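
Judging from the calls above, the nodes argument selects which nodes to draw: the leaves when it is not set, every node with "all", or specific named nodes such as "E". A hypothetical select_nodes helper that reproduces this selection on a toy parent-to-children dict might look like the following; it illustrates the idea and is not pycea's API.

```python
# Toy lineage (parent -> children); names follow C. elegans nomenclature,
# but the tree is truncated for illustration.
children = {"P0": ["AB", "P1"], "P1": ["EMS", "P2"], "EMS": ["MS", "E"]}


def select_nodes(children, which="leaves"):
    """Hypothetical helper: "leaves", "all", or a node name / list of names."""
    nodes = set(children)
    for kids in children.values():
        nodes.update(kids)
    leaves = {n for n in nodes if not children.get(n)}
    if which == "leaves":
        return leaves
    if which == "all":
        return nodes
    names = [which] if isinstance(which, str) else list(which)
    unknown = set(names) - nodes
    if unknown:
        raise KeyError(f"unknown nodes: {sorted(unknown)}")
    return set(names)


leaves = select_nodes(children)
everything = select_nodes(children, "all")
founder = select_nodes(children, "E")
```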

Plotting annotations

We can also add leaf annotations to the plot using pycea.pl.annotation(). Again we will plot the clades, but this time as a leaf annotation.

py.pl.branches(tdata, depth_key="time")
py.pl.annotation(tdata, keys="clade");
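
A leaf annotation is essentially one value per leaf, drawn in the same order as the leaves so that each cell of the annotation strip lines up with its branch. The minimal sketch below (toy tree, made-up labels, hypothetical leaf_order and annotation_column helpers) shows that ordering step.

```python
# Toy tree (parent -> children) and made-up clade labels for its leaves.
children = {"P0": ["AB", "P1"], "P1": ["EMS", "P2"], "EMS": ["MS", "E"]}
clade = {"AB": "AB", "MS": "MS", "E": "E", "P2": "P"}


def leaf_order(node):
    """Leaves below `node` in the depth-first order they are drawn."""
    kids = children.get(node, [])
    if not kids:
        return [node]
    return [leaf for kid in kids for leaf in leaf_order(kid)]


def annotation_column(root, values):
    """Return (leaf, value) pairs in drawing order, so each annotation cell
    lines up with its leaf."""
    return [(leaf, values[leaf]) for leaf in leaf_order(root)]


column = annotation_column("P0", clade)
```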

Annotation keys can be any observation attribute in the tdata object, and multiple keys can be specified. For example, we can plot the expression of pie-1, hlh-1, elt-2, and unc-101 in the leaves of the tree.

We'll set width=0.1 to make the annotations wider. The width parameter specifies the width of the annotation relative to the width of the tree.

py.pl.branches(tdata, depth_key="time")
py.pl.annotation(tdata, keys=["pie-1", "hlh-1", "elt-2", "unc-101"], width=0.1);
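
With several keys, the annotation becomes a small leaf-by-key matrix, and width sets its size relative to the tree. The sketch below builds that matrix from made-up expression values and computes the resulting width, assuming the tree spans 400 units along the depth axis (the dataset's 400 minutes) and that width applies to each key's column separately. Both are assumptions for illustration, not documented pycea behavior.

```python
# Made-up expression values for two leaves (not real measurements).
expression = {
    "AB": {"pie-1": 0.0, "hlh-1": 0.1, "elt-2": 0.0, "unc-101": 1.2},
    "E": {"pie-1": 0.0, "hlh-1": 0.0, "elt-2": 3.4, "unc-101": 0.8},
}
keys = ["pie-1", "hlh-1", "elt-2", "unc-101"]
leaf_order = ["AB", "E"]  # the order in which leaves are drawn

# Rows follow the leaf order, columns follow the order of `keys`.
matrix = [[expression[leaf][key] for key in keys] for leaf in leaf_order]

# Assumptions: the tree spans 400 depth units, and `width` is applied to
# each key's column separately.
tree_extent = 400.0
width = 0.1
column_width = width * tree_extent
total_width = column_width * len(keys)
```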

Matrices stored in obsm and obsp can also be plotted as annotations. To see how lineage distance relates to transcriptomic distance, we compute the pairwise transcriptomic distance between cells in PCA space with pycea.tl.distance() and plot the resulting pca_distances matrix. The E clade is clearly transcriptionally distinct from the other clades.

sc.pp.pca(tdata, n_comps=10, key_added="pca")
py.tl.distance(tdata, key="pca")
py.pl.branches(tdata, depth_key="time")
py.pl.annotation(tdata, keys=["pca_distances"], legend=True)
py.pl.annotation(tdata, keys=["clade"], legend=True, width=0.3);
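
The distance step can be pictured as computing a cells-by-cells matrix from each cell's PCA coordinates. The sketch below assumes Euclidean distance and uses made-up three-component coordinates with a hypothetical pairwise_distances helper; pycea.tl.distance() may use a different metric and stores its result in the tdata object rather than in a nested dict.

```python
import math

# Made-up 3-component PCA coordinates for four cells (not real data).
pca = {
    "AB": (0.1, 0.2, 0.0),
    "MS": (0.3, 0.1, 0.1),
    "E": (2.0, 1.5, -0.4),
    "P2": (0.0, 0.4, 0.2),
}


def pairwise_distances(coords):
    """Euclidean distance between every pair of cells, as a nested dict."""
    names = list(coords)
    return {a: {b: math.dist(coords[a], coords[b]) for b in names} for a in names}


dist = pairwise_distances(pca)
```

In this toy example the E cell sits far from the others in PCA space, which is the kind of pattern that shows up as a distinct block in the distance annotation.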

Putting it all together

Of course, all three of these functions can be layered together. Here we’ll plot branches colored by clade and then annotate nodes and leaves with the umap_cluster label.

py.pl.branches(tdata, depth_key="time", polar=True, color="clade", legend=False)
py.pl.nodes(tdata, color="umap_cluster", size=20, legend=True, legend_kwargs={"ncols": 2})
py.pl.annotation(tdata, keys=["umap_cluster"], width=0.1, legend=False);

For convenience, Pycea also implements pycea.pl.tree(), which combines these functions. It is useful for quickly visualizing a tree but does not provide the same flexibility as plotting the components individually.

py.pl.tree(
    tdata,
    depth_key="time",
    polar=True,
    branch_color="clade",
    node_color="umap_cluster",
    keys="umap_cluster",
    annotation_width=0.1,
    legend={"node": True, "branch": False},
    legend_kwargs={"ncols": 2},
);

Learning more

Check out Customizing Scanpy plots for more information on customizing plots. Scanpy and Pycea both use the Matplotlib plotting library, so plots can be customized in a similar way.

If you have any questions or feedback, please reach out by submitting an issue on GitHub.