Depth prediction in superficial esophageal cancer using a foundation model from endoscopic images

Presenter: Sehun Kim, PhD
Session: Radiomics and AI in Medical Imaging
Time: 4/20/2026 2:00:00 PM → 4/20/2026 5:00:00 PM

Authors

Sehun Kim 1, Sohyung Kim 1, Hyosoon Yoo 1, Hyuk Lee 2, Yang Won Min 2
1 Samsung Precision Genome Medicine Institute, Samsung Medical Center, Seoul, Republic of Korea
2 Department of Medicine, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea

Abstract

Background: For superficial esophageal cancer (SEC) without a risk of lymph node metastasis (LNM), upfront endoscopic submucosal dissection (ESD) is preferred over surgery. Tumor depth is the representative factor for predicting LNM risk, but insufficient prediction accuracy during pre-procedure evaluation frequently results in unnecessary surgery or the need for salvage surgery after ESD. Accordingly, we propose predicting SEC tumor depth by applying a foundation model (FM) to pre-procedural endoscopic images. Using an FM (GastroFM) pretrained on approximately 500,000 esophagogastroduodenoscopy (EGD) images, we aim to leverage the knowledge gained from pre-training to provide a reliable, pre-histopathology prediction of invasion depth.
Methods: We adapted the FM, which is based on a vision transformer architecture, for SEC depth prediction. The model was trained and validated on a retrospective cohort of 839 ESD cases for SEC (April 2007 to January 2023) using an 8:1:1 train/validation/test split. For each case, an esophageal ESD expert selected tumor-displaying images (median 10, range 4-36) from the pre-procedure endoscopy. We applied attention-based multiple instance learning (AbMIL) to aggregate these variable-length image sequences into a single case-level prediction score for the binary task of mucosal versus submucosal cancer.
Results: The overall cohort comprised 529 cases (63.1%) of mucosal cancer and 310 cases (36.9%) of submucosal cancer. On the test set, the FM achieved an AUC of 0.821 and an overall accuracy of 0.810. For the binary classification, sensitivity was 0.645, specificity 0.906, F1 score 0.714, positive predictive value 0.800, and negative predictive value 0.814.
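The reported test-set metrics are all functions of a single confusion matrix, and the short sketch below shows how each one is computed. The abstract does not report confusion-matrix counts; the counts used here (`tp=20, fn=11, fp=5, tn=48`, total 84, roughly one tenth of 839) are a hypothetical reconstruction chosen only because they reproduce the rounded values above, and they assume submucosal cancer is treated as the positive class. The function name `binary_metrics` is likewise illustrative.

```python
def binary_metrics(tp, fn, fp, tn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),            # true-positive rate
        "specificity": tn / (tn + fp),            # true-negative rate
        "ppv": tp / (tp + fp),                    # positive predictive value
        "npv": tn / (tn + fn),                    # negative predictive value
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "f1": 2 * tp / (2 * tp + fp + fn),        # harmonic mean of PPV and sensitivity
    }


# Hypothetical counts (NOT reported in the abstract); positive = submucosal cancer.
metrics = binary_metrics(tp=20, fn=11, fp=5, tn=48)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Under these assumed counts, the high specificity and lower sensitivity mean that the errors fall mostly on submucosal cases predicted as mucosal. AUC is not included because it depends on the per-case scores rather than on a single thresholded confusion matrix.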
Conclusion: We developed an FM with discriminative power for depth prediction in SEC. This model is expected to be a valuable supplementary tool in the pre-procedure workup, assisting endoscopists and surgeons in determining the optimal treatment plan for SEC.
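The AbMIL aggregation step described in Methods can be illustrated with a minimal sketch of the commonly used (non-gated) attention pooling: each image embedding receives a score, the scores are softmax-normalized over the images of one case, and the weighted sum of embeddings feeds a case-level classifier. The abstract does not specify the attention variant, embedding size, or classifier head, so everything here is an assumption: the function names `abmil_pool` and `case_score`, the dimensions `D` and `L`, and the random weights standing in for learned parameters and for FM image embeddings. Plain Python is used for portability; a real implementation would typically use a deep learning framework.

```python
import math
import random


def abmil_pool(features, V, w):
    """Attention pooling over one case.

    features: K image embeddings, each of length D
    V: L x D projection matrix, w: attention vector of length L
    Returns (bag embedding of length D, K attention weights summing to 1).
    """
    scores = []
    for h in features:
        hidden = [math.tanh(sum(v * x for v, x in zip(row, h))) for row in V]
        scores.append(sum(wj * uj for wj, uj in zip(w, hidden)))
    peak = max(scores)                       # subtract max for a stable softmax
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    attn = [e / total for e in exps]
    dim = len(features[0])
    bag = [sum(a * h[d] for a, h in zip(attn, features)) for d in range(dim)]
    return bag, attn


def case_score(bag, clf_w, clf_b):
    """Logistic head mapping the bag embedding to a score in (0, 1)."""
    logit = sum(c * z for c, z in zip(clf_w, bag)) + clf_b
    return 1.0 / (1.0 + math.exp(-logit))


rng = random.Random(0)
D, L = 8, 4  # hypothetical embedding and attention-hidden sizes
V = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(L)]
w = [rng.gauss(0, 1) for _ in range(L)]
clf_w = [rng.gauss(0, 1) for _ in range(D)]
clf_b = 0.0

# Cases with different image counts (4, 10, 36 mirror the reported range and median).
for n_images in (4, 10, 36):
    features = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(n_images)]
    bag, attn = abmil_pool(features, V, w)
    score = case_score(bag, clf_w, clf_b)
    print(n_images, round(sum(attn), 6), round(score, 3))
```

Because the softmax runs over however many images a case has, the same parameters handle 4 images or 36 and always yield one fixed-size bag embedding and one score per case. The attention weights also indicate which images contributed most to the prediction.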

Disclosure

S. Kim, None. S. Kim, None. H. Yoo, None. H. Lee, None. Y. Min, None.

Control: 8460 · Presentation Id: 2609 · Meeting 21436