Predicting TCGA molecular subtypes of gastric cancer from H&E whole-slide images using a weakly supervised TransMIL-attention framework
Presenter: Yesul Jeong
Session: Digital Pathology 2
Time: 4/20/2026, 9:00 AM to 12:00 PM
Authors
Yesul Jeong¹, Dewan M. Bappy², Sangjeong Ahn³, Sung Hak Lee⁴
¹ Department of Hospital Pathology, St. Vincent’s Hospital, College of Medicine, The Catholic University of Korea, Seoul, Republic of Korea
² Department of Computer Science and Engineering, Incheon National University, Incheon, Republic of Korea
³ Department of Pathology, Korea University Anam Hospital, Seoul, Republic of Korea
⁴ Department of Hospital Pathology, Seoul St. Mary’s Hospital, Seoul, Republic of Korea
Abstract
Background: The Cancer Genome Atlas (TCGA) has defined four molecular subtypes of gastric cancer: Epstein-Barr virus (EBV)-associated, microsatellite instability (MSI)-associated, genomically stable (GS), and chromosomal instability (CIN). These subtypes have distinct clinicopathologic and therapeutic implications. However, routine subtyping still relies on multimodal molecular assays that are costly and not universally available. We aimed to develop a weakly supervised deep learning framework that predicts TCGA molecular subtypes directly from hematoxylin and eosin (H&E)-stained whole-slide images (WSIs).

Methods: As a baseline, we implemented attention-challenging multiple instance learning (ACMIL), which combines multi-branch attention (MBA) with stochastic instance masking for weakly supervised WSI classification. We then developed a hybrid multiple instance learning (MIL) model that combines a transformer-based MIL architecture (TransMIL) with MBA. The features and attention scores from TransMIL-MBA were fused via weighted attention to generate slide-level predictions. The model was trained and evaluated on 484 gastric cancer H&E WSIs at 20× magnification, obtained from the TCGA ESCA and STAD projects with TCGA molecular subtype labels, using an 80/10/10 split for training, validation, and testing, respectively. Model performance was assessed using the four-class area under the receiver operating characteristic curve (AUC), accuracy, and confusion matrices; separability of the latent feature space was visualized with UMAP and quantified with V-measure.

Results: Attention heatmaps indicated that the proposed TransMIL-MBA hybrid model highlighted histologically relevant tumor regions more consistently than ACMIL. On the TCGA test set, the TransMIL-MBA model outperformed ACMIL in four-class subtype classification (AUC: 0.89 vs. 0.87; accuracy: 0.74 vs. 0.67).
The hybrid model also showed better subtype separability in the UMAP embedding (V-measure: 0.74 with TransMIL-MBA vs. 0.65 with ACMIL) and fewer off-diagonal misclassifications in its confusion matrix.

Conclusions: A weakly supervised multiple instance learning framework combining TransMIL with MBA shows promising performance for predicting TCGA molecular subtypes of gastric cancer directly from H&E WSIs. As a scalable, image-based surrogate for molecular subtyping, this approach may bring clinical care closer to precision medicine, enabling more tailored treatment and potentially improving outcomes for patients with poor-prognosis gastric cancer.
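The multi-branch attention pooling described in the Methods can be illustrated with a minimal pure-Python sketch. It assumes a tanh attention scorer whose projection `V` is shared across branches, with one scoring vector per branch, loosely following attention-based MIL. All names (`multi_branch_pool`, `fuse`, `alpha`, the toy `patches`) are hypothetical. `fuse` is only a placeholder convex combination and is not the authors' weighted-attention fusion, whose details the abstract does not give. Stochastic instance masking and the TransMIL transformer layers are omitted.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over per-patch scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def branch_attention(patches: Matrix, v: Matrix, w: Vector) -> Vector:
    """Attention weights for one branch: a_i = softmax_i(w . tanh(V h_i))."""
    scores = []
    for h in patches:
        hidden = [math.tanh(z) for z in matvec(v, h)]
        scores.append(sum(wi * zi for wi, zi in zip(w, hidden)))
    return softmax(scores)


def weighted_sum(patches: Matrix, weights: Vector) -> Vector:
    """Slide embedding as the attention-weighted sum of patch features."""
    dim = len(patches[0])
    return [sum(a * h[d] for a, h in zip(weights, patches)) for d in range(dim)]


def multi_branch_pool(patches: Matrix, v: Matrix, branch_ws: List[Vector]):
    """Pool patches with several attention branches, then average the branch embeddings."""
    attentions = [branch_attention(patches, v, w) for w in branch_ws]
    embeddings = [weighted_sum(patches, a) for a in attentions]
    n = len(embeddings)
    pooled = [sum(e[d] for e in embeddings) / n for d in range(len(patches[0]))]
    return pooled, attentions


def fuse(transmil_emb: Vector, mba_emb: Vector, alpha: float = 0.5) -> Vector:
    """Hypothetical fusion: convex combination of two slide embeddings."""
    return [alpha * t + (1 - alpha) * m for t, m in zip(transmil_emb, mba_emb)]


# Toy usage: 3 patches with 3-dim features, shared 2x3 projection, 2 branches.
patches = [[0.2, 1.0, -0.5], [1.5, 0.1, 0.3], [-0.4, 0.8, 0.9]]
V = [[0.5, -0.2, 0.1], [0.3, 0.4, -0.6]]
branches = [[1.0, -1.0], [0.2, 0.8]]
pooled, attn = multi_branch_pool(patches, V, branches)
```

Because the weighted sum is linear in the attention weights, averaging the branch embeddings gives the same slide embedding as pooling once with the branch-averaged attention. In a real model the patch features would typically come from a pretrained encoder, and `V` and the branch vectors would be learned.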
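The evaluation metrics listed in the Methods can be sketched in plain Python. The abstract does not say how the four-class AUC was averaged, so this sketch assumes a macro-averaged one-vs-rest AUC. It also assumes that V-measure is computed between the true subtypes and some cluster assignment of the latent embedding; how those clusters were obtained is not stated. Function names are hypothetical; for real use, scikit-learn's `roc_auc_score`, `confusion_matrix`, and `v_measure_score` provide tested implementations.

```python
import math
from collections import Counter
from typing import List, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int], n_classes: int) -> List[List[int]]:
    """Rows are true classes, columns are predicted classes."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def binary_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as P(score_pos > score_neg), counting ties as 1/2 (Mann-Whitney form)."""
    pos = [s for s, lab in zip(scores, labels) if lab]
    neg = [s for s, lab in zip(scores, labels) if not lab]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(y_true: Sequence[int], probs: Sequence[Sequence[float]], n_classes: int) -> float:
    """Assumed definition: unweighted mean of one-vs-rest AUCs over classes."""
    per_class = [
        binary_auc([row[k] for row in probs], [int(t == k) for t in y_true])
        for k in range(n_classes)
    ]
    return sum(per_class) / n_classes


def entropy(counts, total: int) -> float:
    return -sum(c / total * math.log(c / total) for c in counts if c)


def v_measure(classes: Sequence[int], clusters: Sequence[int]) -> float:
    """Harmonic mean of homogeneity and completeness (beta = 1)."""
    n = len(classes)
    joint = Counter(zip(classes, clusters))
    class_counts, cluster_counts = Counter(classes), Counter(clusters)
    h_c = entropy(class_counts.values(), n)
    h_k = entropy(cluster_counts.values(), n)
    # Conditional entropies H(C|K) and H(K|C) from the class/cluster contingency counts.
    h_c_given_k = -sum(m / n * math.log(m / cluster_counts[k]) for (c, k), m in joint.items())
    h_k_given_c = -sum(m / n * math.log(m / class_counts[c]) for (c, k), m in joint.items())
    homogeneity = 1.0 if h_c == 0 else 1 - h_c_given_k / h_c
    completeness = 1.0 if h_k == 0 else 1 - h_k_given_c / h_k
    if homogeneity + completeness == 0:
        return 0.0
    return 2 * homogeneity * completeness / (homogeneity + completeness)


# Toy usage (illustrative values only, not study data)
y_true = [0, 1, 2, 2]
y_pred = [0, 2, 2, 1]
cm = confusion_matrix(y_true, y_pred, 3)  # [[1, 0, 0], [0, 0, 1], [0, 1, 1]]
acc = accuracy(y_true, y_pred)  # 0.5
```

Accuracy equals the trace of the confusion matrix divided by the number of slides, so on a fixed test set a higher accuracy (0.74 vs. 0.67 above) necessarily means fewer off-diagonal counts.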
Disclosure
Y. Jeong, None. D. M. Bappy, None. S. Ahn, None. S. Lee, None.
Control: 8250 · Presentation Id: 3102 · Meeting 21436