Foundation model

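In one sentence: a large model goes through self-supervised pretraining on a large corpus and learns general-purpose representations; each application then uses its embeddings as they are (or fine-tunes the model) for a downstream task. The GPT models behind ChatGPT are foundation models for language; pathology has its own pathology-specific ones.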
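
To make "only use the embedding" concrete, here is a minimal sketch of a nearest-centroid probe on frozen embeddings. It assumes the embeddings were already extracted by some encoder; the toy 2-D vectors, the labels, and the function names (`fit_centroids`, `predict`) are illustrative and do not come from any particular model's API.

```python
import math


def centroid(vectors):
    """Element-wise mean of a list of equal-length embedding vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def fit_centroids(embeddings, labels):
    """Group frozen embeddings by label and return one centroid per class."""
    by_label = {}
    for vec, lab in zip(embeddings, labels):
        by_label.setdefault(lab, []).append(vec)
    return {lab: centroid(vecs) for lab, vecs in by_label.items()}


def predict(centroids, vec):
    """Assign the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda lab: math.dist(centroids[lab], vec))


# Toy 2-D vectors standing in for tile embeddings (hypothetical data;
# real embeddings have hundreds of dimensions).
train_x = [[0.9, 0.1], [1.1, -0.1], [-1.0, 0.2], [-0.8, -0.2]]
train_y = ["tumor", "tumor", "benign", "benign"]

centroids = fit_centroids(train_x, train_y)
print(predict(centroids, [0.7, 0.0]))
```

The encoder itself is never updated here. Only the per-class centroids are computed, which is why this kind of probe is cheap enough to run for every new downstream task.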

Pathology foundation models (2026):

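Model	Team	Data scale	Notes
PLIP	Stanford	208k image-text pairs	Early open-source release, proof of concept
UNI	Mahmood Lab, Harvard	100M tiles	Top of the 2024 benchmarks
GigaPath	Microsoft	171k WSIs, 1.3B tiles	Largest scale
CONCH	Harvard	1.17M image-caption pairs	Multimodal (text + image)
Prov-GigaPath	Microsoft / Providence	Same corpus as GigaPath	Name of the released GigaPath model; still being iterated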

Why you should be skeptical of benchmarks when reading AACR 2026:

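TCGA leakage trap: when a model's pretraining corpus includes TCGA, any downstream benchmark that tests on TCGA is inflated. Check each model's data statement: UNI and GigaPath report in-house pretraining corpora (Mass-100K, Prov-Path) rather than TCGA, while several other pathology encoders were pretrained on TCGA slides.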
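Out-of-distribution evaluation: foundation models score extremely well on in-distribution test sets, but performance drops on a new scanner, new staining conditions, or a new hospital.
Clinical utility ≠ benchmark score: in the foundation model literature, AUROC 0.92 versus 0.89 is a small gap, but the difference can matter a great deal in clinical practice.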
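
The first two checks can be sketched in a few lines. This is a minimal illustration, not a full evaluation harness: it assumes you have a list of slide barcodes for the pretraining corpus and for the test set, and per-slide records tagged with a site or scanner name. The barcodes, site names, and scores below are made up. The only external fact relied on is that the first three dash-separated fields of a TCGA barcode identify the participant.

```python
def patient_id(barcode):
    """First three dash-separated fields of a TCGA barcode identify the participant."""
    return "-".join(barcode.split("-")[:3])


def leaked_patients(pretrain_barcodes, test_barcodes):
    """Patients that appear in both the pretraining corpus and the test set."""
    pretrain = {patient_id(b) for b in pretrain_barcodes}
    test = {patient_id(b) for b in test_barcodes}
    return pretrain & test


def auroc(scores, labels):
    """AUROC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auroc_by_site(records):
    """records: iterable of (site, score, label); returns {site: AUROC}."""
    by_site = {}
    for site, score, label in records:
        scores, labels = by_site.setdefault(site, ([], []))
        scores.append(score)
        labels.append(label)
    return {site: auroc(s, y) for site, (s, y) in by_site.items()}


# Hypothetical barcodes: one patient is shared between pretraining and test.
pretrain = ["TCGA-AB-1234-01A-01-TS1", "TCGA-CD-5678-01A-01-TS1"]
test = ["TCGA-AB-1234-01Z-00-DX1", "TCGA-EF-9999-01Z-00-DX1"]
print(leaked_patients(pretrain, test))

# Hypothetical scores: an in-distribution site and a new scanner.
records = [
    ("site_A", 0.9, 1), ("site_A", 0.8, 1), ("site_A", 0.2, 0), ("site_A", 0.1, 0),
    ("new_scanner", 0.6, 1), ("new_scanner", 0.4, 1),
    ("new_scanner", 0.5, 0), ("new_scanner", 0.3, 0),
]
print(auroc_by_site(records))
```

Reporting one AUROC per site, rather than a single pooled number, is what exposes the drop on unseen scanners or hospitals; a pooled score can hide it when most test slides come from the training distribution.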

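2026 takeaway: UNI has absorbed almost the entire downstream ecosystem, but its limits have not yet been properly validated. If you work on urology pathology AI, first replicate a vendor-level concordance test instead of re-running TCGA experiments on your own.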
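
One way to put a number on concordance is Cohen's kappa between the model's calls and a reference read. The sketch below assumes that "concordance" here means chance-corrected agreement on categorical diagnoses; that is an assumption, since this page does not define the vendor protocol. The labels and the function name `cohens_kappa` are illustrative.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters over the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of cases")
    n = len(rater_a)
    # Observed agreement: fraction of cases where the two labels match.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement if both raters labeled independently at their own base rates.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        # Kappa is undefined when both raters use one identical category;
        # this sketch returns 1.0 in that case.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical reads on four biopsies.
reference = ["cancer", "cancer", "benign", "benign"]
model = ["cancer", "benign", "benign", "benign"]
print(cohens_kappa(reference, model))
```

In this toy case raw agreement is 75%, but kappa is only 0.5 once chance agreement is removed, which is the kind of gap a raw accuracy or AUROC figure does not show.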

See in wiki

05-ai-workflow (Pathology foundation model section)
18-bioinformatics-ml-single-cell
agentic-ai

AAI Internal Wiki
