Multi-site development of automated lesion classification for comprehensive tumor burden assessment: Addressing the RECIST trial-practice disconnect
Presenter: Sean Khozin, PhD
Session: Radiomics and AI in Medical Imaging
Time: 4/20/2026 2:00:00 PM → 4/20/2026 5:00:00 PM
Authors
Ella Pavlechko 1, Xi Jiang 1, Ravikumar Komandur Elayavilli 2, Eleanor McCabe 2, Jon McDunn 2, Sean Khozin 3
1 SAS Institute, Cary, NC; 2 Project Data Sphere, Cary, NC; 3 CEO Roundtable on Cancer & Project Data Sphere, Cary, NC
Abstract
Background: Response Evaluation Criteria in Solid Tumors (RECIST) mandate manual selection of 2-5 target lesions with unidimensional measurements, producing inter-reader discordance exceeding 30% while collapsing 3D tumor dynamics into categorical outcomes with limited biological interpretability. RECIST protocols are rarely used in routine practice, creating a trial-practice disconnect that undermines endpoint validity. Automated volumetric quantification of total tumor burden (TTB) is an alternative, contingent on reliable autonomous model performance across diverse imaging environments.
Methods: We developed a modular dual-architecture system combining UNet segmentation with ResNet50 classification, trained on 2,464 CT scans from 1,324 patients across three continents (North America, South America, and Asia) acquired on heterogeneous scanner platforms (GE, Siemens, Philips, Toshiba). The 11,705-lesion dataset had a clinically representative class distribution: 2,125 malignant (18%), 193 benign (2%), and 9,387 other findings (80%). Preprocessing applied Hounsfield unit windowing (level: -600 HU, width: 1500 HU) and Lungmask segmentation. To improve classification performance and address class imbalance, we augmented minority-class images using flips, rotations, and sharpening.
Results: Segmentation loss fell by 86% over the training epochs, with training and validation curves converging in parallel. Classification on held-out development data (n=1,332 lesions) yielded 86% accuracy, 91% sensitivity, 17% specificity, 94% positive predictive value (PPV), and 11% negative predictive value at a 0.5 probability threshold. The area under the precision-recall curve (AUC) was 0.89. Model performance remained stable across scanner manufacturers without platform-specific recalibration.
Conclusions: High sensitivity (91%) with constrained specificity (17%) indicates successful optimization for malignancy detection in imbalanced datasets. The 94% PPV indicates reliable malignancy identification, while the precision-recall AUC of 0.89 supports effective minority-class discrimination despite an 11:1 class imbalance. Stable cross-platform performance supports multi-site training as a viable paradigm for generalized deployment. This initial work provides the foundation for lesion-level classification within a fully autonomous system for volumetric tumor burden quantification. Our development path includes classification refinement, expansion to whole-body CT across organ systems, and integration of autonomous detection and segmentation modules. Establishing TTB as a regulatory-grade endpoint that addresses RECIST’s limitations will next require external confirmation in independent cohorts, prospective clinical-trial evidence, and correlation with clinical outcomes.
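The Hounsfield windowing step named in the Methods can be illustrated with a minimal, dependency-free sketch. It assumes the conventional definition of a display window (level plus or minus half the width) and assumes the clipped values are rescaled to [0, 1]. The abstract states only the level and width, so the rescaling and the function names `window_bounds` and `apply_window` are hypothetical and not taken from the authors' pipeline.

```python
def window_bounds(level, width):
    """Return the (lower, upper) HU limits of a window centered at `level`."""
    half = width / 2.0
    return level - half, level + half


def apply_window(hu_values, level=-600.0, width=1500.0):
    """Clip HU values to the window and rescale them linearly to [0, 1].

    Defaults are the level and width reported in the abstract.
    The [0, 1] rescaling is an assumption made for illustration.
    """
    lower, upper = window_bounds(level, width)
    span = upper - lower
    return [(min(max(v, lower), upper) - lower) / span for v in hu_values]


if __name__ == "__main__":
    # With level -600 and width 1500 the window spans -1350 HU to 150 HU.
    print(window_bounds(-600.0, 1500.0))
    # Values below the window map to 0.0, values above it map to 1.0.
    print(apply_window([-2000, -1350, -600, 150, 400]))
```

Under these assumptions, the reported settings retain intensities between -1350 HU and 150 HU, a range that keeps aerated lung and soft tissue while saturating dense bone.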
Disclosure
E. Pavlechko, None. X. Jiang, None. R. Komandur Elayavilli, None. E. McCabe, None. J. McDunn, None. S. Khozin, None.
Cited in
Control: 3240 · Presentation Id: 2617 · Meeting 21436