Evaluation of large language models for automated clinical trial matching in oncology

Presenter: Aakash Desai, MD
Session: Large Language Models in the Clinic
Time: 4/20/2026 2:00:00 PM → 4/20/2026 5:00:00 PM

Authors

Aakash Desai, Ellen McNeeley, Sanad Alhuski, Maya Khalil, Matthew Might, Rebecca Arend, Andrew Crouse, Mehmet Akce
University of Alabama at Birmingham, Birmingham, AL

Abstract

Background: Efficient patient-trial matching remains a critical challenge in oncology, complicated by heterogeneous documentation, missing data, and complex eligibility criteria. Large language models (LLMs) could automate eligibility screening by interpreting unstructured clinical notes and biomarker data.

Methods: We evaluated six models (llama3.2:3b, llama3.3:70b, medgemma_27b_text_it, deepseek-r1:8b, gpt-oss20b, and gpt-oss120b) on clinical trial eligibility determination across 19 key questions reflecting common eligibility criteria in oncology clinical trials. Data were extracted from the medical records of patients with known trial matches. We analyzed each model's binary (yes/no) responses, confidence scores, and reasoning excerpts, and assessed concordance between models and the interpretability of their outputs.

Results: gpt-oss20b and gpt-oss120b agreed closely on eligibility determinations for well-documented criteria such as measurable disease, ECOG status, age, and tissue availability, with confidence scores commonly above 0.90. The two models diverged on criteria that required inference or where documentation was incomplete; in these ambiguous cases gpt-oss120b reported higher confidence and gave more nuanced reasoning. Both models flagged missing or unclear data, providing reasoning transparency that supports clinical review. Concordance metrics suggested strong reliability (Cohen's kappa > 0.8) for explicit criteria, with potential to significantly reduce the manual screening burden. The remaining four models gave lower-quality responses overall and could not respond coherently when the response had to be in a structured format.

Conclusions: LLMs can accurately and transparently automate critical components of oncology trial eligibility screening, augmenting manual review. Differences in model confidence when data are uncertain underscore the need for ongoing refinement and highlight the value of explainable AI in clinical decision support. These findings support integrating LLMs into clinical trial matching workflows to improve trial access and enrollment efficiency.

Impact: Automated, interpretable LLM-based clinical trial matching is a promising step toward precision oncology, scaling patient access to tailored therapies and optimizing trial throughput.
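
For readers unfamiliar with the concordance statistic cited in the Results, the following is a minimal sketch of how Cohen's kappa can be computed from two models' yes/no answers to the same eligibility questions. It is not the authors' analysis code; the function name `cohens_kappa` and the answer lists `model_a` and `model_b` are hypothetical and chosen only for illustration.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)

    # Observed agreement: fraction of items on which both raters agree.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement, from each rater's marginal label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in set(freq_a) | set(freq_b)) / (n * n)

    if p_e == 1.0:
        # Both raters used one identical label throughout; kappa is formally
        # undefined here, and returning 1.0 is a convention of this sketch.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical answers from two models to ten eligibility questions.
model_a = ["yes", "yes", "no", "yes", "no", "no", "yes", "yes", "no", "yes"]
model_b = ["yes", "yes", "no", "yes", "no", "yes", "yes", "yes", "no", "yes"]

kappa = cohens_kappa(model_a, model_b)
print(round(kappa, 3))  # prints 0.783
```

In this made-up example the models agree on 9 of 10 answers (observed agreement 0.90), but chance agreement is 0.54, so kappa is about 0.78. That is just under the 0.8 threshold the abstract reports for explicit criteria, which illustrates why kappa is a stricter measure than raw percent agreement.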

Disclosure

A. Desai, None. E. McNeeley, None. S. Alhuski, None. M. Khalil, None. M. Might, None. R. Arend, None. A. Crouse, None. M. Akce, None.

Cited in

05-ai-workflow

Control: 6469 · Presentation Id: 2503 · Meeting 21436
