Evaluation of an agentic LLM chatbot for clinico-genomic analysis of AACR GENIE BPC data

Presenter: Likhita Sree Thiriveedi, No Degree
Session: AACR Project GENIE: Predictive Models and AI
Time: 4/19/2026 2:00:00 PM → 4/19/2026 5:00:00 PM

Authors

Likhita Thiriveedi, Kenneth L. Kehl
Dana-Farber Cancer Institute, Boston, MA

Abstract

Purpose: The volume and complexity of genomic and clinical data can hinder efficient research on clinico-genomic datasets, which requires manual effort and specialized expertise. Agentic large language model (LLM) workflows may help accelerate data processing, but the performance of existing LLMs on this task is not well characterized.

Methods: An agentic LLM-based chatbot was developed that uses the Gemini-2.5-pro LLM to interpret oncology research queries and autonomously execute sequential analytic tasks on the AACR GENIE BPC NSCLC cohort (version 2.0 public). The LLM’s performance was assessed against a curated benchmark set of 125 expert-reviewed clinical and genomic questions derived from a published study (https://pubmed.ncbi.nlm.nih.gov/37223888/), with accuracy defined as numerical concordance within ±10% of manuscript-reported reference values.

Results: The chatbot was asked 118 questions manually extracted from the publication, broadly categorized as quantifying cohort sizes (n=92) or conducting statistical analyses (n=26). The overall accuracy rate was 42.37%. Inaccurate responses were manually reviewed and assigned to the following categories: no obvious source of error or discrepancy (33.8%), where 39.1% of these deviated

Conclusion: Agentic LLM data analysis workflows hold potential for automating components of oncology data interpretation. Current performance limitations are attributable to inconsistent reasoning, incomplete clarification of clinical concepts, and a need for clear specification of published analysis plans for reproducibility and evaluation. These limitations highlight the need for further model refinement in these specific areas before such systems can be reliably integrated into real-world clinical research pipelines.
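
The ±10% concordance criterion described in Methods can be illustrated with a minimal sketch. The function names, the relative (rather than absolute) reading of the tolerance, and the handling of a zero reference value are all assumptions made for illustration; the abstract does not specify how the authors implemented the comparison.

```python
from typing import Iterable, Tuple


def within_tolerance(answer: float, reference: float, tol: float = 0.10) -> bool:
    """Return True when `answer` lies within ±tol of `reference`.

    Assumption: the tolerance is relative to the reference value.
    Assumption: a reference of exactly zero only matches an answer of zero.
    """
    if reference == 0:
        return answer == 0
    return abs(answer - reference) <= tol * abs(reference)


def accuracy(pairs: Iterable[Tuple[float, float]]) -> float:
    """Fraction of (answer, reference) pairs that are concordant."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no question/answer pairs to score")
    hits = sum(within_tolerance(answer, ref) for answer, ref in pairs)
    return hits / len(pairs)


# Hypothetical values, not taken from the study.
example = [(109.0, 100.0), (111.0, 100.0), (26.0, 25.0), (0.0, 0.0)]
print(f"{accuracy(example):.2%}")
```

Under this reading, the reported overall accuracy of 42.37% on 118 questions is consistent with 50 concordant answers (50/118 ≈ 0.4237), although the abstract does not state the count directly.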

Disclosure

L. Thiriveedi, None. K. L. Kehl, None.

Control: 3216 · Presentation Id: 3391 · Meeting 21436