Clinical-grade quality control and stain harmonization enhancement in whole-slide images of lung cancer using deep learning

Presenter: Bardia Rodd, PhD
Session: Digital Pathology 1
Time: 4/19/2026, 2:00 PM to 5:00 PM

Authors

Meghdad Sabouri Rad¹, Mohammad Mehdi Hosseini¹, Rakesh Choudhary¹, Harmen Siezen², Saverio J. Carello¹, Ola El-Zammar¹, Michel R. Nasr¹, Bardia Rodd¹

¹SUNY Upstate Medical University, Syracuse, NY; ²University of Maryland at College Park, College Park, MD

Abstract

Background: Deep learning models trained on hematoxylin-eosin (H&E) whole-slide images (WSIs) increasingly support prognostic and therapeutic-response biomarkers in lung cancer. However, their reliability is often compromised by image blur, scanner and pen artifacts, and inter-slide stain variability that obscure tissue morphology and introduce spurious predictive signals. To mitigate these issues, we developed a comprehensive, quantitative WSI quality-control (QC) and stain-harmonization pipeline tailored to single-pattern lung adenocarcinoma and assessed its downstream impact on clinical outcome prediction.

Methods: We analyzed 143 H&E WSIs (20×, 0.5 μm/pixel) from a Dartmouth lung adenocarcinoma cohort. QC incorporated multiple slide-level metrics, including tissue masks excluding artifacts, quantification of local blur using variance of the Laplacian across ≤600 tissue-centered crops, thumbnail-derived brightness statistics, and hematoxylin optical-density medians. Slides were retained only if they satisfied strict thresholds: tissue coverage ≥40%, artifact fraction ≤1%, blurry-tissue fraction ≤5%, brightness 140-210, and hematoxylin median 0.10-0.35. Among QC-cleared slides, an automated procedure selected a cohort-representative reference slide based on proximity to median brightness and hematoxylin metrics; its Macenko stain vectors and 99th-percentile concentration parameters were used for harmonization. All accepted WSIs were normalized to this reference. From normalized slides, 256×256 tiles (stride 256) with tissue fraction ≥70% and artifact fraction ≤2% were extracted to train a slide-level binary outcome model.

Results: Of 143 WSIs, 140 (97.9%) passed QC; three were excluded. Retained slides had a median resolution of 33,792 × 46,080 pixels. From 128 QC-passing cases, we obtained 215,220 analysis-ready tiles (median 1,590 per slide). By comparison, a simpler Otsu-based pipeline produced 249,073 tiles; thus, QC-aware masking removed 13.6% of candidates (primarily low-tissue or artifact-laden regions) without diminishing tumor or stromal coverage. Stain-normalized previews demonstrated highly consistent H&E appearance. Classifier performance improved from 90.63% ± 9.36 without QC/harmonization to 95.32% ± 4.05 following implementation.

Conclusions: This standardized QC and stain-harmonization framework effectively excludes low-quality slides and artifact-heavy regions while substantially improving the stability and accuracy of deep learning-based clinical outcome prediction in lung adenocarcinoma. By reducing technical confounders and enforcing stain consistency, the workflow enhances the robustness, reproducibility, and translational readiness of WSI-derived biomarkers and supports multi-institutional harmonization efforts in computational pathology.
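
For illustration only, the slide-level QC gate and reference-slide selection described in Methods might be sketched in Python as below. This is not the authors' implementation. The thresholds are taken from the abstract, but the `SlideMetrics` field names, the `blur_threshold` parameter (the abstract reports no Laplacian-variance cutoff), the 4-neighbour Laplacian stencil, and the range-scaled distance used to pick the reference slide are all assumptions made for the example.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class SlideMetrics:
    """Slide-level QC metrics; field names are hypothetical."""
    slide_id: str
    tissue_coverage: float     # fraction of the slide that is tissue (0..1)
    artifact_fraction: float   # fraction flagged as artifact (0..1)
    blurry_fraction: float     # fraction of sampled crops judged blurry (0..1)
    brightness: float          # thumbnail brightness, assumed 0..255 scale
    hematoxylin_median: float  # median hematoxylin optical density


def laplacian_variance(gray: np.ndarray) -> float:
    """Variance of a 4-neighbour discrete Laplacian over a 2-D grayscale crop."""
    g = gray.astype(np.float64)
    lap = (g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
           - 4.0 * g[1:-1, 1:-1])
    return float(lap.var())


def blurry_fraction(crops, blur_threshold: float) -> float:
    """Share of crops whose Laplacian variance is below blur_threshold.

    blur_threshold is an assumed tuning parameter, not a value from the abstract.
    """
    scores = np.array([laplacian_variance(c) for c in crops])
    return float(np.mean(scores < blur_threshold))


def passes_qc(m: SlideMetrics) -> bool:
    """Apply the slide-level thresholds stated in the abstract."""
    return (m.tissue_coverage >= 0.40
            and m.artifact_fraction <= 0.01
            and m.blurry_fraction <= 0.05
            and 140.0 <= m.brightness <= 210.0
            and 0.10 <= m.hematoxylin_median <= 0.35)


def pick_reference(slides) -> SlideMetrics:
    """Return the QC-passing slide closest to the cohort medians.

    Assumed distance: absolute deviations from the median brightness and
    median hematoxylin value, each scaled by the width of its QC range.
    """
    passed = [s for s in slides if passes_qc(s)]
    if not passed:
        raise ValueError("no slide passed QC")
    b = np.array([s.brightness for s in passed])
    h = np.array([s.hematoxylin_median for s in passed])
    dist = (np.abs(b - np.median(b)) / (210.0 - 140.0)
            + np.abs(h - np.median(h)) / (0.35 - 0.10))
    return passed[int(np.argmin(dist))]
```

In a sketch like this, `blurry_fraction` would be computed over the sampled tissue-centered crops of each slide and stored in `SlideMetrics` before `passes_qc` is applied; the slide returned by `pick_reference` would then supply the stain vectors for normalization.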

Disclosure

M. Hosseini, None; R. Choudhary, None; H. Siezen, None; S. J. Carello, None; O. El-Zammar, None; B. Rodd, None.

Control: 5206 · Presentation Id: 3085 · Meeting 21436