A comprehensive LLM-enabled pharmacodynamic biomarker resource to accelerate cancer drug development
Presenter: Yuntao Yang, PhD
Session: Integration of Clinical and Research Data
Time: 4/20/2026, 2:00 PM to 5:00 PM
Authors
Yuntao Yang, Li Zhao, Seyedmehdi Orouji, Ying Zhu, Rebecca Johnson, David Maxwell, Kaitlyn Brickey, Bissan Al-Lazikani
Genomic Medicine, UT MD Anderson Cancer Center, Houston, TX
Abstract
Introduction: A major challenge in oncology drug development is confirming target engagement of therapeutics in patient tumors. Pharmacodynamic (PD) biomarkers provide this link, enabling early assessment of target validity, drug activity, and rational dose selection. However, systematic resources connecting drug targets to validated PD biomarkers remain limited. To address this, we developed a comprehensive dataset and analytic framework to identify and prioritize target-specific PD biomarkers across nine major target classes implicated in cancer biology.
Materials and Methods: We curated biomarker candidates from multiple genomic and pharmacologic resources for nine target classes: transcription factors/cofactors, kinases, phosphatases, ubiquitin ligases, deubiquitinases, acetyltransferases, deacetylases, methyltransferases, and demethylases. Source databases often apply broad definitions to functional classes such as transcription factors. To improve accuracy and reduce annotation artifacts, we cross-referenced the Pfam, Enzyme Classification, and PDB databases to refine protein classifications. Using the canSAR interactome, we identified direct target-biomarker interactions supported by experimental evidence. To capture context-specific transcriptional biomarkers, we computed cohort-specific correlations using TCGA, TARGET, and GTEx datasets. Finally, we employed an LLM-based fact-checking agent to extract and harmonize antibody annotations from the Antibody Registry, focusing on enzyme targets with measurable substrate modifications.
Results: From 2,900 targets and 100,000 interactions, we propose 73,000 high-confidence target-biomarker relationships involving over 2,100 potential drug targets. Of these, 67% are transcription factor-gene interactions and 33% are enzyme-substrate interactions. Commercial antibodies were identified for over 2,800 biomarker candidates, supporting experimental validation. The resulting dataset covers more than 60% of the top 20 predicted targets across 19 cancer types. All data are available in our canSAR platform and as a download from canSAR-PD.
Discussion: This resource provides a systematic framework for PD biomarker discovery in oncology. By integrating curated molecular interactions with LLM-derived antibody annotations, it enables robust evaluation of target engagement and drug activity. The dataset establishes a foundation for developing PD biomarkers to guide dose selection, monitor response, and accelerate cancer drug development.
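The cohort-specific correlation step mentioned in Materials and Methods can be illustrated with a minimal sketch. The abstract does not state which correlation measure or data layout was used, so the choice of Pearson correlation, the function names, the cohort labels, and the gene names (TARGET1, GENE_X) below are all assumptions made purely for illustration, and the values are toy numbers, not results from the study.

```python
from collections import defaultdict
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences; None if undefined."""
    n = len(x)
    if n < 3 or n != len(y):
        return None
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None  # constant expression: correlation is undefined
    return sxy / sqrt(sxx * syy)


def cohort_correlations(samples, target, biomarker):
    """Correlate target and biomarker expression separately within each cohort.

    samples: iterable of (cohort_label, {gene: expression}) pairs
    (a hypothetical layout chosen for this sketch).
    """
    by_cohort = defaultdict(lambda: ([], []))
    for cohort, expression in samples:
        if target in expression and biomarker in expression:
            xs, ys = by_cohort[cohort]
            xs.append(expression[target])
            ys.append(expression[biomarker])
    return {cohort: pearson(xs, ys) for cohort, (xs, ys) in by_cohort.items()}


# Toy data: the same gene pair correlates positively in one cohort
# and negatively in another, which pooling all samples would hide.
samples = [
    ("COHORT_A", {"TARGET1": 1.0, "GENE_X": 2.0}),
    ("COHORT_A", {"TARGET1": 2.0, "GENE_X": 4.0}),
    ("COHORT_A", {"TARGET1": 3.0, "GENE_X": 6.0}),
    ("COHORT_A", {"TARGET1": 4.0, "GENE_X": 8.0}),
    ("COHORT_B", {"TARGET1": 1.0, "GENE_X": 8.0}),
    ("COHORT_B", {"TARGET1": 2.0, "GENE_X": 6.0}),
    ("COHORT_B", {"TARGET1": 3.0, "GENE_X": 4.0}),
    ("COHORT_B", {"TARGET1": 4.0, "GENE_X": 2.0}),
]

result = cohort_correlations(samples, "TARGET1", "GENE_X")
for cohort, r in sorted(result.items()):
    print(cohort, r)
```

Computing the correlation per cohort rather than over pooled samples is what makes a candidate biomarker context-specific in this sketch: a relationship that holds in one tumor type may be absent or reversed in another.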
Disclosure
Y. Yang, None. L. Zhao, None. S. Orouji, None. Y. Zhu, None. R. Johnson, None. D. Maxwell, None. K. Brickey, None. B. Al-Lazikani, None.
Control: 726 · Presentation Id: 3350 · Meeting 21436