Architecture review reveals claim decomposition as primary verification bottleneck
Source type: discovery (obs/65126)
Harvested: 2026-05-04 · Original date: 2026-05-03T16:32:19.766Z
Metadata: {"project":"lunhsiangyuan","type":"discovery","obs_id":65126}
An external architecture review (likely from GPT-5.5 via a Cursor agent) analyzed the oncology FIH verification pipeline and identified claim decomposition as the critical bottleneck, not SQL quality or tolerance thresholds. The review emphasized that regex-only classification cannot handle the semantic complexity of investment memo claims, where a single sentence embeds multiple verifiable units with context dependencies. It recommended a hybrid classification in which regex extracts surface forms while an LLM or structured parser decomposes each claim into a JSON schema with multi-label types and confidence scores. It also held that pre-written SQL templates are the correct approach but need expansion into a metric contract library with explicit cohort/denominator semantics, and stressed snapshot-aware verdicts that distinguish factual disagreement from source drift or definition mismatch. Major blind spots identified: semantic unit reconstruction across tables/captions/footnotes, entity resolution as a foundation layer, an undefined source hierarchy, negative/exclusivity claims that require completeness assumptions, and judgment claims that require factual premise lineage.
Concepts: ["why-it-exists", "gotcha", "trade-off", "problem-solution"]
Facts:
- A single memo claim often contains cohort, time window, metric definition, source boundary, alias mapping, and implicit denominator semantics
- Regex patterns miss compound claims, alias/partnership relationships, temporal ambiguity, and table context dependencies
- LLM-generated SQL creates cohort-level semantic errors despite syntactic correctness (trial-level vs program-level mixing, intervention alias issues)
- 15 SQL templates are insufficient without a metric contract library defining metric, entity, cohort, time_window, dedup_key, filters, and caveats
- Tolerance policy should be stratified: exact for DB-derived claims, source-drift-aware for external counts, count-based for small denominators
- Entity resolution is the verification foundation: drug aliases, sponsor subsidiaries, site canonicalization, and PI name variants directly affect verdicts
- Verdict taxonomy should expand beyond AGREE/DISAGREE to include SOURCE_DRIFT, DEFINITION_MISMATCH, INSUFFICIENT_ENTITY_RESOLUTION
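Two of the facts above (the metric contract library and the expanded verdict taxonomy) can be sketched concretely. The field names below follow the note exactly (metric, entity, cohort, time_window, dedup_key, filters, caveats, and the three new verdict labels); the example values and everything else are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum

# Expanded verdict taxonomy, using the labels listed in the facts above.
class Verdict(Enum):
    AGREE = "agree"
    DISAGREE = "disagree"
    SOURCE_DRIFT = "source_drift"
    DEFINITION_MISMATCH = "definition_mismatch"
    INSUFFICIENT_ENTITY_RESOLUTION = "insufficient_entity_resolution"

# One entry in the proposed metric contract library. Field names come from
# the note; the example values are illustrative assumptions only.
@dataclass
class MetricContract:
    metric: str
    entity: str
    cohort: str
    time_window: str
    dedup_key: str
    filters: dict[str, str] = field(default_factory=dict)
    caveats: list[str] = field(default_factory=list)

contract = MetricContract(
    metric="fih_trial_count",
    entity="sponsor",
    cohort="oncology FIH trials",
    time_window="2021-01-01..present",
    dedup_key="nct_id",  # dedup at trial level, guarding against program-level mixing
    filters={"phase": "1", "therapeutic_area": "oncology"},
    caveats=["trial-level count, not program-level"],
)
print(contract.dedup_key, Verdict.SOURCE_DRIFT.value)
```

Making `dedup_key` and `cohort` explicit per metric is what blocks the trial-level vs program-level mixing the facts warn about, and the enum keeps "the source changed" distinct from "the claim is wrong."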