Reasoning‑guided retrieval improves oncology trial eligibility matching from clinical notes

Presenter: Patrycja Krawczuk · Session: Agentic AI in Cancer · Time: April 19, 2026, 2:00 PM to 5:00 PM

Authors

Vivek Shetye, Patrycja Nikol Krawczuk, Ryan Godart, Arpita Saha, Gabriel Altay, Chelsea Osterman, Gena Rangel, Samantha Garrett, Victoria L. Chiou, Kunal Nagpal; Tempus AI, Inc., Chicago, IL

Abstract

Purpose: Under-enrollment in oncology trials is exacerbated by the manual effort required to screen unstructured clinical records against complex eligibility criteria. Large language models (LLMs) augmented with retrieval mechanisms show promise for automating the process but often fail to capture the dispersed evidence required for complex, multi-criterion eligibility decisions. We hypothesized that an agentic retrieval strategy, which empowers an LLM to autonomously and iteratively search, synthesize, and validate relevant information, would improve evidence completeness and accuracy in eligibility classification.

Methods: Evaluation encompassed two task types: (i) complex eligibility reasoning and (ii) biomarker extraction. The complex tasks drew on two non-small cell lung cancer (NSCLC) studies with intricate eligibility criteria: stage III unresectable NSCLC and metastatic NSCLC. Each query represented a patient-level eligibility assessment (total n=618 queries) that required integrating multiple evidence types, including molecular findings, prior therapies, and clinical status. The biomarker task included 148 evaluations targeting key genomic alterations (EGFR, ESR1, RAS). We compared a retrieval-augmented generation (RAG) system that retrieved fixed, similarity-based text chunks with an agentic retrieval approach performing up to 8 adaptive searches. The agent autonomously assessed retrieved evidence, reformulated its queries, and iteratively expanded the search scope to assemble a comprehensive context. Both systems used the same LLM (Gemini 2.5 Pro) and reviewed up to 64 text chunks per query. Model outputs were benchmarked against expert-curated ground truth using F1-score, recall, and accuracy.

Results: Agentic retrieval improved performance across complex eligibility reasoning tasks. In the stage III NSCLC study, accuracy rose from 68% to 80% (+12.7 pp; 95% CI: 7.9-17.7), driven by a 15-percentage-point gain in recall (69% to 84%) and a 9-point increase in F1-score (79% to 88%). In metastatic NSCLC, accuracy improved from 77% to 84% (+6.3 pp; 95% CI: 1.3-11.6) and the F1-score from 86% to 90%. In contrast, the biomarker extraction task showed comparable performance for both methods (accuracy 95-96%, F1 96%), indicating minimal benefit from agentic reasoning where retrieval complexity was low.

Conclusions: This study demonstrates the translational value of agentic retrieval for precise, scalable oncology trial screening. By enabling adaptive evidence synthesis rather than static retrieval, the system improves the completeness of eligibility assessments and reduces manual review effort.
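The adaptive search loop described in the Methods can be sketched as follows. This is a minimal, illustrative stand-in only: the real system uses an LLM to judge evidence sufficiency and to reformulate queries, which is stubbed here with keyword matching, and the function names, toy corpus, and criteria are all hypothetical.

```python
import re

MAX_SEARCHES = 8   # up to 8 adaptive searches per eligibility query
MAX_CHUNKS = 64    # cap on total text chunks reviewed

# Toy corpus: chunks of an unstructured clinical note (hypothetical).
CHUNKS = [
    "Pathology: stage III NSCLC, unresectable primary tumor.",
    "Molecular report: EGFR exon 19 deletion detected.",
    "Prior therapy: completed platinum-based chemoradiation.",
    "ECOG performance status 1 at last visit.",
]

def tokens(text):
    """Lowercased word tokens, punctuation stripped."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def search(query, corpus):
    """Static similarity stand-in: rank chunks by keyword overlap."""
    terms = tokens(query)
    scored = [(len(terms & tokens(c)), c) for c in corpus]
    return [c for score, c in sorted(scored, reverse=True) if score > 0]

def agentic_retrieve(criteria, corpus):
    """Iteratively search, check which criteria still lack supporting
    evidence, and reformulate the query around the gaps."""
    evidence = []
    query = " ".join(criteria)
    for _ in range(MAX_SEARCHES):
        for chunk in search(query, corpus):
            if chunk not in evidence and len(evidence) < MAX_CHUNKS:
                evidence.append(chunk)
        # "Validation" stub: which criteria have no supporting chunk yet?
        missing = [c for c in criteria
                   if not any(c.lower() in e.lower() for e in evidence)]
        if not missing:
            break                      # evidence is complete; stop early
        query = " ".join(missing)      # reformulate around the gaps
    return evidence

evidence = agentic_retrieve(["stage III", "EGFR", "chemoradiation"], CHUNKS)
print(len(evidence))
```

A fixed RAG baseline would correspond to calling `search` once with the original query and truncating; the loop's stop-or-reformulate step is what the abstract refers to as adaptive evidence synthesis.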
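As a back-of-envelope consistency check on the stage III figures (assuming the standard definition F1 = 2PR/(P+R), which the abstract does not state explicitly), the reported recall and F1 values imply that precision stayed roughly flat near 92% while the recall gain drove the improvement:

```python
# Solve F1 = 2*P*R / (P + R) for precision P, given reported F1 and recall.
def implied_precision(f1, recall):
    return f1 * recall / (2 * recall - f1)

baseline = implied_precision(0.79, 0.69)  # stage III, static RAG
agentic = implied_precision(0.88, 0.84)   # stage III, agentic retrieval
print(round(baseline, 2), round(agentic, 2))  # both ~0.92
```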

Disclosure

V. Shetye, Tempus AI, Inc. Consultant. P. N. Krawczuk, Tempus AI, Inc. Employment, Stock. R. Godart, Tempus AI, Inc. Employment, Stock. A. Saha, Tempus AI, Inc. Employment, Stock. G. Altay, Tempus AI, Inc. Employment, Stock. C. Osterman, Tempus AI, Inc. Employment, Stock. G. Rangel, Tempus AI, Inc. Employment, Stock. S. Garrett, Tempus AI, Inc. Employment, Stock. V. L. Chiou, Tempus AI, Inc. Employment, Stock. K. Nagpal, Tempus AI, Inc. Employment, Stock.

Control: 2500 · Presentation Id: 2498 · Meeting 21436