SPIRIT-DEFINE and CONSORT-DEFINE: Raising the Quality Bar for FIH Trial Protocols and Reports

SPIRIT-DEFINE / CONSORT-DEFINE: Moving FIH Paper Reading from “Looking at the Results” to “Auditing the Design Quality”

English

In medicine, the quality of a conclusion is only as good as the quality of the process that generated it. In oncology dose-finding trials, the process is the protocol — the document written before the first patient is enrolled, specifying exactly how decisions will be made. For decades, early-phase oncology protocols have been criticized for incompleteness: vague starting dose rationale, missing escalation rules, undefined DLT windows, and undescribed stopping criteria. SPIRIT-DEFINE and CONSORT-DEFINE are the systematic answer to this problem.

The DEFINE project was led by the Institute of Cancer Research (ICR) and represents a multi-stakeholder effort to extend the existing SPIRIT (Standard Protocol Items: Recommendations for Interventional Trials) and CONSORT (Consolidated Standards of Reporting Trials) guidelines specifically to early phase dose-finding (EPDF) trials. The SPIRIT extension applies to protocols; the CONSORT extension applies to published trial reports. Both extensions were published as guidance documents in BMJ in 2023, with comprehensive explanation and elaboration (E&E) papers in eClinicalMedicine in 2025. SPIRIT-DEFINE adds 17 new items and modifies 15 existing SPIRIT items; CONSORT-DEFINE adds 21 new items and modifies 19 existing CONSORT items.

Why does this matter clinically? Consider what happens when a protocol sets the DLT observation window without justifying it against the drug’s biology. If the window is 21 days but the drug’s immune-related toxicities typically manifest at 6–8 weeks, the DLT criteria are structurally incapable of capturing the drug’s signature adverse events. The trial will report a clean safety profile not because the drug is safe, but because the measurement instrument missed the signal. The RP2D selected on that basis enters phase 2 carrying unexamined risk. SPIRIT-DEFINE requires protocols to specify the DLT window explicitly and justify it against the drug’s known mechanism of action and pharmacokinetic profile.
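
The window mismatch described above reduces to a simple comparison. The following is a minimal sketch, not a rule taken from SPIRIT-DEFINE: the function name `window_covers_onset`, its parameters, and the three-way classification are all hypothetical, and the assumption that a window should reach the upper end of the expected onset range is illustrative only.

```python
def window_covers_onset(dlt_window_days: int, onset_range_days: tuple[int, int]) -> str:
    """Classify how well a DLT window covers an expected toxicity-onset range.

    Illustrative assumption: onset_range_days is (earliest, latest) typical
    onset in days, taken from nonclinical or prior clinical evidence.
    """
    earliest, latest = onset_range_days
    if dlt_window_days >= latest:
        return "covers"   # window extends past the latest typical onset
    if dlt_window_days >= earliest:
        return "partial"  # window ends somewhere inside the onset range
    return "misses"       # window closes before toxicities typically appear


# The example from the text: a 21-day window versus onset at 6-8 weeks.
print(window_covers_onset(21, (6 * 7, 8 * 7)))  # misses
print(window_covers_onset(56, (6 * 7, 8 * 7)))  # covers
```

A reviewer could run the same comparison by hand. The point is only that the justification is checkable once both the window and the expected onset are written down in the protocol.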

What SPIRIT-DEFINE requires of a protocol. A complete SPIRIT-DEFINE protocol should pre-specify: (1) trial aims — dose-finding, dose-optimization, or both; (2) the target toxicity level (the DLT rate considered acceptable); (3) all dose levels to be tested; (4) the starting dose rationale with explicit reference to nonclinical or prior clinical evidence; (5) the escalation and de-escalation algorithm with precise rules for dose skipping, cohort expansion, and intra-patient escalation; (6) dose modification rules for continuing treatment below full dose; (7) interim decision-making structure — who makes the decisions, under what criteria, and how conflicts are resolved; (8) safety stopping rules; (9) the criteria for selecting the recommended dose; and (10) the PK, PD, and biomarker sampling plan. None of these are bureaucratic extras; each directly protects the patients enrolled in the trial and the validity of the conclusions drawn from it.

What CONSORT-DEFINE requires of a report. A complete CONSORT-DEFINE report should include: patient flow by dose level (not just an aggregate CONSORT diagram), DLT events and adverse events reported per dose level (not just overall), the number and reasons for dose interruptions and reductions per dose level, PK exposure data presented by dose, pharmacodynamic or target engagement data where available, early activity signals stratified by dose and biomarker subgroup, and a transparent rationale for the final recommended dose that explains why lower and higher doses were not selected. The recurring theme is dose-level granularity: the reader must be able to see how each dose level performed, not just the aggregate outcome across all patients at all doses combined.
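
To make the dose-level granularity point concrete, the sketch below tabulates DLTs per dose level next to the pooled figure. The patient records, the dose values in mg, and the helper name `dlt_by_dose` are invented for illustration and do not come from any real trial.

```python
from collections import defaultdict

# Hypothetical patient records: (dose level in mg, experienced a DLT?)
patients = [
    (10, False), (10, False), (10, False),
    (20, False), (20, False), (20, True),
    (40, True), (40, True), (40, False),
]


def dlt_by_dose(records):
    """Return {dose: (n_dlt, n_patients)} so each dose level is visible on its own."""
    counts = defaultdict(lambda: [0, 0])
    for dose, had_dlt in records:
        counts[dose][0] += int(had_dlt)  # DLT count at this dose
        counts[dose][1] += 1             # patients treated at this dose
    return {dose: tuple(c) for dose, c in sorted(counts.items())}


per_dose = dlt_by_dose(patients)
pooled_dlt = sum(n_dlt for n_dlt, _ in per_dose.values())
pooled_n = sum(n for _, n in per_dose.values())

print(f"pooled: {pooled_dlt}/{pooled_n}")
for dose, (n_dlt, n) in per_dose.items():
    print(f"{dose} mg: {n_dlt}/{n}")
```

In this made-up data set the pooled figure is 3 DLTs in 9 patients, which hides the fact that two of the three patients at the highest dose had a DLT. That is the kind of signal an aggregate-only report would obscure.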

The practical teaching application. The most useful classroom exercise built from DEFINE is a two-column worksheet: the left column lists what a well-designed protocol should pre-specify (SPIRIT-DEFINE checklist), and the right column lists what a well-written paper should report (CONSORT-DEFINE checklist). Students are given a published FIH paper and asked to fill in both columns together, marking each item as “clearly addressed,” “partially addressed,” or “missing.” This single exercise reveals, quickly and viscerally, how often the reader is asked to trust an RP2D recommendation without the information needed to evaluate it independently.
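
The worksheet can also be kept as a small data structure and tallied per column. This is a sketch under stated assumptions: the item names are abbreviated paraphrases rather than official DEFINE item wording, the ratings are invented, and `summarize` is a hypothetical helper.

```python
from collections import Counter

STATUSES = ("clearly addressed", "partially addressed", "missing")

# Hypothetical ratings for one paper; item names are paraphrases, not official items.
worksheet = {
    "protocol": {
        "starting dose rationale": "clearly addressed",
        "DLT window and justification": "partially addressed",
        "escalation algorithm": "clearly addressed",
        "recommended dose selection criteria": "missing",
    },
    "report": {
        "patient flow by dose level": "partially addressed",
        "DLTs by dose level": "clearly addressed",
        "PK exposure by dose": "missing",
        "rationale for recommended dose": "missing",
    },
}


def summarize(column):
    """Count how many items in one worksheet column fall under each status."""
    unknown = set(column.values()) - set(STATUSES)
    if unknown:
        raise ValueError(f"unexpected status: {sorted(unknown)}")
    tally = Counter(column.values())
    return {status: tally.get(status, 0) for status in STATUSES}


for name, column in worksheet.items():
    print(name, summarize(column))
```

Keeping the three statuses as a fixed tuple means a typo in a rating raises an error instead of silently creating a fourth category.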

DEFINE also makes possible a one-sentence verdict that could serve as the standard closing statement of a journal club, choosing one of three options: “This trial’s recommended dose was protocol-driven and data-integrated / mostly convention-driven / retrospectively justified.” The distinction matters because only a protocol-driven, data-integrated RP2D carries the evidentiary weight needed to anchor a phase 2 design.

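Key Concepts

Tool | Applies to | Core contribution
SPIRIT-DEFINE | FIH trial protocol | Has the protocol spell out all decision rules in advance, protecting patient safety
CONSORT-DEFINE | FIH trial report / paper | Has the paper present the data for every dose level in full, so readers can evaluate it independently
DLT window justification | Both | The window must fit the drug's mechanism and half-life; a generic 28-day window cannot simply be applied
Dose-level granularity | Emphasized by CONSORT-DEFINE | Report results stratified by dose level rather than as pooled overall statistics
RP2D selection criteria | Emphasized by SPIRIT-DEFINE | Define in advance which conditions determine the RP2D; post hoc selection is not allowed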

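Protocol column (SPIRIT-DEFINE)

  • Starting dose rationale (citing NOAEL/HNSTD/MABEL/PAD)
  • All dose levels and schedules
  • DLT definition and observation window
  • Escalation/de-escalation algorithm
  • Who makes interim decisions, and on what basis
  • Safety stopping rules
  • RP2D/RP3D selection criteria
  • PK/PD and biomarker sampling plan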

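Report column (CONSORT-DEFINE)

  • Number of patients and patient flow at each dose level
  • DLTs and adverse events at each dose level
  • Number of, and reasons for, dose interruptions, reductions, and discontinuations
  • PK exposure by dose
  • PD/target engagement data
  • Efficacy signals stratified by dose and biomarker subgroup
  • Explicit rationale for the final recommended dose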