Bridging histopathology and spatial transcriptomics for comprehensive tumor microenvironment profiling

Presenter: Xiaohan Xing
Session: Digital Pathology 1
Time: 4/19/2026 2:00:00 PM → 4/19/2026 5:00:00 PM

Authors

Xiaohan Xing, Lei Xing
Stanford University, Palo Alto, CA

Abstract

Purpose: The tumor microenvironment (TME) critically influences cancer progression, treatment response, and patient outcomes. Histopathology reflects morphological features of TME states but lacks molecular specificity, whereas spatial transcriptomics (ST) provides spatially resolved gene expression but is costly and impractical at scale. To bridge this gap, we develop a multimodal AI framework that transfers spatial molecular information from ST into histopathology-derived representations, enabling reconstruction of molecular programs, TME phenotypes, and spatial biology directly from routine H&E slides. This sequencing-free approach supports biomarker discovery and advances precision oncology through scalable molecular profiling.

Methods: We introduce a multi-scale multimodal learning strategy that aligns two large-scale foundation models: UNI (trained on 100,000 pathology slides) [1] and Visiumformer (trained on 3.94 million ST profiles) [2]. Rather than training a unified model from scratch, we integrate the two modalities through multi-scale contrastive alignment. We curated 355 samples from the HEST-1K dataset [3], comprising 801,157 paired H&E patches and ST spots across 16 tissue types. At the patch level, we enforce consistency between paired histology and ST embeddings. At the region level, where each region is defined as a cluster of nine neighboring patches, we further constrain cross-modal agreement. To maintain hierarchical coherence, we additionally encourage alignment between each patch and its corresponding parent region. This multi-scale contrastive alignment transfers spatial molecular knowledge from ST into histopathology-based representations for use in downstream tasks.

Results: We evaluated our framework on two downstream tasks. (1) Gene expression status prediction: On the BCNB dataset (n=1,058 WSIs), our model improved ER/PR/HER2 prediction over pretrained UNI on five of the six reported metrics. ER AUC rose from 0.882 to 0.891, while ER balanced accuracy (BACC) fell slightly from 0.780 to 0.771; PR AUC/BACC rose from 0.792/0.712 to 0.812/0.715; and HER2 AUC/BACC rose from 0.662/0.602 to 0.696/0.634. (2) Spatial spot classification: On the DLPFC dataset (n=12 WSIs), linear probing achieved 71.75% balanced accuracy and 78.15% weighted F1, compared with 55.19% and 63.61% for UNI, improvements of 16.56 and 14.54 percentage points, respectively.

Conclusions: Our multi-scale contrastive alignment framework transfers spatial molecular signals from ST into histopathology-derived representations, improving gene expression prediction, mutation inference, and spatial TME characterization. By enabling sequencing-free reconstruction of molecular and microenvironmental features, this approach offers a scalable solution for large-cohort cancer profiling and may facilitate biomarker discovery, patient stratification, and biologically informed precision oncology.
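
The three alignment terms described in the Methods (patch level, region level, and patch-to-parent-region) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the loss form, so the InfoNCE objective, the temperature of 0.07, mean pooling of patch embeddings into region embeddings, equal loss weights, and computing the hierarchical term within the histology branch are all assumptions made here for illustration. The function names are hypothetical, and in practice the inputs would be projected UNI and Visiumformer embeddings rather than raw arrays.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def info_nce(a, b, targets=None, temperature=0.07):
    """InfoNCE loss: row i of `a` should match row targets[i] of `b`.

    If `targets` is None, rows are assumed to be paired one-to-one.
    """
    a, b = l2_normalize(a), l2_normalize(b)
    logits = a @ b.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    if targets is None:
        targets = np.arange(a.shape[0])
    return float(-log_prob[np.arange(a.shape[0]), targets].mean())


def symmetric_info_nce(a, b, temperature=0.07):
    """Average of the a->b and b->a InfoNCE losses for one-to-one pairs."""
    return 0.5 * (info_nce(a, b, temperature=temperature)
                  + info_nce(b, a, temperature=temperature))


def multiscale_alignment_loss(he_patch, st_patch, region_ids,
                              weights=(1.0, 1.0, 1.0), temperature=0.07):
    """Sum of patch-level, region-level, and patch-to-region terms.

    he_patch, st_patch: (N, d) paired histology / ST embeddings.
    region_ids: (N,) integer parent-region index per patch, covering 0..R-1
                with every region non-empty (assumption).
    """
    n_regions = int(region_ids.max()) + 1
    membership = np.eye(n_regions)[region_ids]        # (N, R) one-hot
    counts = membership.sum(axis=0)[:, None]          # (R, 1) patches per region
    # Assumption: a region embedding is the mean of its member patch embeddings.
    he_region = membership.T @ he_patch / counts
    st_region = membership.T @ st_patch / counts

    patch_term = symmetric_info_nce(he_patch, st_patch, temperature)
    region_term = symmetric_info_nce(he_region, st_region, temperature)
    # Assumption: hierarchical coherence ties each histology patch to its own
    # histology parent region.
    hier_term = info_nce(he_patch, he_region, targets=region_ids,
                         temperature=temperature)

    w_patch, w_region, w_hier = weights
    return w_patch * patch_term + w_region * region_term + w_hier * hier_term
```

For example, with 18 patches grouped into two regions of nine, `region_ids = np.repeat(np.arange(2), 9)` gives the parent assignment, and the loss is lower when `st_patch` rows are correctly paired with `he_patch` rows than when the pairing is scrambled.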

Disclosure

X. Xing, None.

Control: 3285 · Presentation Id: 3075 · Meeting 21436