Improving the utility of the single-cell pediatric cancer atlas through updated cell type annotations, CNV inference, and visualization tools

Presenter: Ally Hawkins, MS, PhD
Session: Pediatric Cancer Genomics and Epigenomics
Time: 4/20/2026 2:00:00 PM → 4/20/2026 5:00:00 PM

Authors

Allegra G. Hawkins 1, Joshua A. Shapiro 1, Stephanie J. Spielman 1, David S. Mejia 1, Deepashree Venkatesh Prasad 1, Nozomi Ichihara 1, Arkadii Yakovets 1, Avrohom M. Gottlieb 1, Kurt G. Wheeler 2, Chanté J. Bethell 3, Steven M. Foltz 4, Jennifer O’Malley 1, Casey S. Greene 5, Jaclyn N. Taroni 1

1 Alex’s Lemonade Stand Foundation, Bala Cynwyd, PA
2 Reify Health, Boston, MA
3 The University of Texas MD Anderson Cancer Center, Houston, TX
4 Multiple Myeloma Research Foundation, Boston, MA
5 University of Colorado Anschutz Medical Campus, Aurora, CO

Abstract

The Single-cell Pediatric Cancer Atlas (ScPCA) Portal (https://scpca.alexslemonade.org/), developed and maintained by the Childhood Cancer Data Lab, is a data resource for uniformly processed single-cell and single-nuclei RNA sequencing data, as well as de-identified metadata from pediatric tumor samples. Originally composed of data from 10 projects funded by Alex’s Lemonade Stand Foundation (ALSF), the Portal currently contains summarized gene expression data for over 700 samples across more than 50 cancer types drawn from ALSF-funded and community-contributed datasets. Downloads include gene expression data as SingleCellExperiment or AnnData objects containing raw and normalized counts, PCA and UMAP coordinates, and summary reports. For some samples, the download also includes data from bulk RNA-seq, spatial transcriptomics, and/or feature barcoding (e.g., CITE-seq and cell hashing). All data on the Portal were uniformly processed using scpca-nf, an efficient and open-source Nextflow workflow written and maintained by the Data Lab, which utilizes alevin-fry to quantify gene expression. Since the ScPCA Portal was presented at the 2024 AACR Annual Meeting, several new features have been added to the available data. Automated cell type annotation is now performed using three distinct methods: SingleR, CellAssign, and SCimilarity. If at least two of the three methods agree, an ontology-aware consensus cell type label is assigned. The individual annotations and the consensus cell types are included in the cell metadata of the downloaded objects. Some projects also include manually curated cell type annotations generated as part of the OpenScPCA project (https://openscpca.readthedocs.io). In addition, copy-number variation (CNV) inference is now performed on each sample using the InferCNV package, specifying the i6 HMM to quantify specific CNV events.
Since InferCNV quantifies CNV events relative to a designated set of normal, or non-malignant, reference cells, consensus cell types are used to identify a diagnosis-appropriate normal cell reference for each sample. The total number of CNVs observed and the full HMM metadata table are stored in the processed SingleCellExperiment and AnnData objects. The updated cell type annotation and the InferCNV implementation are included as part of the open-source workflow, scpca-nf. The workflow and associated documentation are freely available at https://github.com/AlexsLemonade/scpca-nf. Finally, the ScPCA Portal hosts an instance of the UCSC Cell Browser, enabling users to visualize and interact with the gene expression data for all samples without needing to download the data. Comprehensive documentation about data processing and the contents of files on the Portal, including a guide to getting started working with an ScPCA dataset, can be found at https://scpca.readthedocs.io.
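
The two-of-three agreement rule described in the abstract can be illustrated with a minimal Python sketch. This is not the scpca-nf implementation: the workflow's consensus is ontology-aware, whereas this simplified version only counts exact label matches. The function name `consensus_label`, the `"Unknown"` placeholder, and the example labels are all hypothetical and chosen for illustration only.

```python
from collections import Counter
from typing import Sequence


def consensus_label(labels: Sequence[str], unknown: str = "Unknown") -> str:
    """Return the label shared by at least two methods, else `unknown`.

    Simplifying assumption: two methods "agree" only when their labels are
    identical strings. The real workflow is ontology-aware, which this
    sketch does not attempt to reproduce.
    """
    # Ignore cells that a method could not label.
    counts = Counter(label for label in labels if label != unknown)
    if not counts:
        return unknown
    label, n_votes = counts.most_common(1)[0]
    # With three methods, at most one label can receive two or more votes.
    return label if n_votes >= 2 else unknown


# Hypothetical per-cell labels from three annotation methods.
per_cell_labels = [
    ("T cell", "T cell", "B cell"),
    ("T cell", "B cell", "natural killer cell"),
    ("Unknown", "Unknown", "T cell"),
]
consensus = [consensus_label(cell) for cell in per_cell_labels]
```

In this sketch, a cell labeled by only one method, or labeled differently by all three, receives the placeholder rather than a consensus cell type.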

Disclosure

A. G. Hawkins, None. J. A. Shapiro, None. S. J. Spielman, None. D. S. Mejia, None. D. Venkatesh Prasad, None. N. Ichihara, None. A. Yakovets, None. A. M. Gottlieb, None. K. G. Wheeler, None. C. J. Bethell, None. S. M. Foltz, None. J. O’Malley, None. J. N. Taroni, None.

Control: 6651 · Presentation Id: 5986 · Meeting 21436