A healthcare-system-scale multimodal whole-patient temporal foundation model
Presenter: Andrew Zhang, BA · Session: Agentic AI in Cancer · Time: 4/19/2026, 2:00 PM–5:00 PM
Authors
Andrew Zhang¹, Tong Ding¹, Sophia J. Wagner¹, Caiwei Tian¹, Ming Yang Lu¹, Alexandre Misrahi¹, Joshua E. Lewis¹, Rowland Pettit¹, Long P. Le², Faisal Mahmood¹; ¹Harvard Medical School/Brigham and Women’s Hospital, Boston, MA; ²Pathology, Harvard Medical School/Massachusetts General Hospital, Boston, MA
Abstract
Healthcare data are fragmented across time and modalities, including clinical reports, imaging, and lab tests. While Electronic Health Records (EHRs) capture rich longitudinal health trajectories, current predictive modeling approaches typically model individual modalities in isolation, missing the context needed to understand complex diseases such as cancer. To bridge this gap, we aim to synthesize the entirety of a patient’s medical history into a unified computable representation. We curated a retrospective cohort from a major U.S. healthcare system, comprising 25 billion medical events from 7.2 million patients spanning 33 years. This dataset integrates 28 distinct clinical modalities, including structured data (diagnoses, medications, vital signs, flowsheet entries, and laboratory results), clinical notes, and imaging data. We developed a transformer-based multimodal temporal foundation model that tokenizes each modality with modality-specific encoders and fuses events over time into a unified patient embedding. We evaluated frozen patient embeddings on 246 downstream prediction tasks, including new onset of 87 diseases, progression of 56 diseases, treatment response for 100 therapy-outcome pairs, and three short-term operational tasks. Across all tasks, the model achieved a mean AUROC of 0.77, outperforming age-sex, clinical text, and task-specific supervised baselines. On oncology-focused tasks spanning solid and hematologic malignancies and systemic therapies, the model outperformed the age-sex baseline by 9% for new neoplasm onset, 18% for neoplasm progression, and 16% for treatment response. Unsupervised clustering of patient embeddings recovered clinically coherent groupings of cancer types, comorbidities, and treatment patterns, forming a multiscale, data-driven atlas of medical phenotypes.
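The "tokenize each modality, then fuse events over time" design can be sketched in miniature. Everything below is an illustrative assumption rather than the authors' implementation: the modality names, feature dimensions, sinusoidal time encoding, and single-step attention pooling are stand-ins for the modality-specific encoders and transformer fusion described in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 32  # shared embedding dimension (assumption; the abstract does not specify)

# Hypothetical modality-specific encoders: one random linear map per modality.
encoders = {m: rng.normal(size=(dim, D)) / np.sqrt(dim)
            for m, dim in {"labs": 8, "diagnoses": 16, "notes": 24}.items()}

def time_encoding(t, d=D):
    """Sinusoidal encoding of event time (e.g. days since first visit)."""
    i = np.arange(d // 2)
    angles = t / (10000 ** (2 * i / d))
    return np.concatenate([np.sin(angles), np.cos(angles)])

def embed_patient(events):
    """events: list of (modality, feature_vector, time_in_days).

    Encode each event with its modality encoder, add a time encoding,
    then attention-pool the event tokens into one patient vector
    (a one-layer stand-in for the transformer fusion in the abstract).
    """
    toks = np.stack([feat @ encoders[m] + time_encoding(t)
                     for m, feat, t in events])
    query = toks.mean(axis=0)            # pooled query vector
    scores = toks @ query / np.sqrt(D)   # scaled dot-product scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()             # softmax attention weights
    return weights @ toks                # unified patient embedding

events = [("labs", rng.normal(size=8), 0.0),
          ("diagnoses", rng.normal(size=16), 30.0),
          ("notes", rng.normal(size=24), 365.0)]
z = embed_patient(events)
print(z.shape)  # (32,)
```

In the evaluation protocol described above, such embeddings would be frozen and fed to lightweight task-specific heads for the 246 downstream prediction tasks.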
The same embeddings enabled similarity search to identify patients with comparable trajectories, supporting automated cohort discovery and fine-grained clinical trial matching. Gradient-based interpretability analyses identified multimodal risk factors for disease onset and treatment response that aligned with clinical expectations, providing transparent attribution at both the patient and population levels. A single multimodal, temporally aware EHR foundation model can learn general-purpose whole-patient representations that support accurate early prediction and phenotyping of cancer outcomes while remaining applicable across diverse diseases. By consolidating fragmented data into a continuously updated patient representation, this approach lays the groundwork for shifting oncology from reactive, episodic care to proactive, continuous risk management, and provides a scalable basis for risk stratification, trial optimization, and discovery of clinically interpretable multimodal biomarkers.
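The embedding-based similarity search used for cohort discovery and trial matching reduces to nearest-neighbor retrieval over the frozen patient vectors. The sketch below assumes cosine similarity and a small random embedding bank; the metric, bank, and function name are illustrative, not details from the work.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical frozen patient embeddings (one row per patient).
bank = rng.normal(size=(1000, 32))

def most_similar(query_idx, embeddings, k=5):
    """Cosine-similarity search for patients with comparable trajectories."""
    X = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = X @ X[query_idx]              # cosine similarity to the query patient
    sims[query_idx] = -np.inf            # exclude the query patient itself
    return np.argsort(sims)[::-1][:k]    # indices of the k nearest patients

neighbors = most_similar(0, bank)
print(len(neighbors))  # 5
```

At healthcare-system scale, an approximate-nearest-neighbor index would typically replace this brute-force scan, but the retrieval logic is the same.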
Disclosure
A. Zhang: None. T. Ding: None. S. J. Wagner: None. C. Tian: None. M. Y. Lu: None. A. Misrahi: None. J. E. Lewis: None. R. Pettit: None. L. P. Le: None. F. Mahmood: None.
Control: 7011 · Presentation Id: 3530 · Meeting 21436