C.E.L_Slide_test2/docs/architecture/IMP-20-FRAME-CONTRACT-VALIDATION-REFERENCE.md


IMP-20 — Phase Q content_verifier Frame Contract Validation Pattern Reference

Status: documented (reference-only, dormant)
Scope: doc-only. No runtime surface modified.
Related issue: #20
Soft dependency: IMP-04 (extended catalog application). IMP-20 stays dormant and activates only via the A5 gate.
Source axis: INSIGHT-MAP §3 / §2.7 H2 (content_verifier.verify_structure pattern reference).


A1 — Phase Q consumer pattern (read-only reference)

Phase Q implements area-level required-pattern validation at the content-verifier layer. References (do not modify):

  • src/content_verifier.py:382-392 REQUIRED_PATTERNS: dict[str, list[str]] — top-level pattern dictionary keyed by area name (body_bg, body_core, sidebar, footer). Values verified: body_bg=[], body_core=["key-msg"], sidebar=["padding-left", "text-indent"], footer=[]. Phase T (L379-381 comment) removed the overflow:hidden requirement to reconcile with the Phase T prompt's "overflow:hidden 금지" ("overflow:hidden prohibited") directive; that no-regression boundary is preserved.
  • src/content_verifier.py:395-448 verify_structure(generated_html, area_name, has_image=False, font_hierarchy=None) → VerificationResult — the substring-check + OR + tolerance core logic.
    • :405-412 — substring presence loop. Each pattern string is split on | (pattern.split("|") at L410) and treated as an OR alternation: the pattern passes if any alternative is present in generated_html. A pattern with no alternative present is appended to the missing list.
    • :414-416 — has_image branch. When has_image=True and area_name == "body_core", an additional implicit requirement is enforced: "slide-img-" must appear in generated_html. A missing image marker is reported as "slide-img-* (이미지 태그)" (the parenthetical means "image tag") in missing.
    • :418-436 — font_hierarchy branch. When supplied, the area-name → max-font lookup uses a fixed role_font_map = {"body_bg":bg/11, "body_core":core/12, "sidebar":sidebar/10, "footer":core/12}. Font sizes are extracted from the HTML with the regex font-size:\s*(\d+(?:\.\d+)?)\s*px (L430); each measured size > max_font + 1 (1px tolerance at L433) emits a font_warnings entry. Warnings do not flip passed.
    • :438-447 — result construction. passed = (len(missing) == 0). score = 1.0 on pass, else 1.0 - len(missing) / max(1, len(patterns)) (continuous degradation; max(1, …) guards against division by zero for an empty pattern list). Errors are prefixed "필수 패턴 누락: " ("required pattern missing: "). Warnings carry font hierarchy violations only.
  • src/content_verifier.py:455-487 verify_area(original_text, generated_html, area_name, has_image=False) → VerificationResult — composes L1 (verify_text_preservation) + L2 (verify_no_forbidden_content) + L3 (verify_structure) at L462-466. The verify_structure call at L465 passes has_image but not font_hierarchy (font_hierarchy is unused inside verify_area).
  • src/content_verifier.py:490-529 verify_all_areas(generated, area_texts, has_image_areas=None) — area dispatch fan-out. body_html is split into body_bg + body_core (L510-519); body_core is the only branch that propagates has_image=("body_core" in has_image_areas) to verify_area (L518). sidebar_html (L521-525) and footer_html (L527-531) call verify_area with the default has_image=False.
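
The verify_structure behaviour described above can be approximated in a few lines. This is a minimal sketch reconstructed from the description only, not the code in src/content_verifier.py. The VerificationResult dataclass, the function name verify_structure_sketch, the required keyword, the single max_font parameter (standing in for the font_hierarchy / role_font_map lookup), the warning text, and the ", " join inside the error message are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass, field

# Values as listed in the REQUIRED_PATTERNS bullet above.
REQUIRED_PATTERNS: dict[str, list[str]] = {
    "body_bg": [],
    "body_core": ["key-msg"],
    "sidebar": ["padding-left", "text-indent"],
    "footer": [],
}

# Regex quoted in the font_hierarchy bullet above.
FONT_SIZE_RE = re.compile(r"font-size:\s*(\d+(?:\.\d+)?)\s*px")


@dataclass
class VerificationResult:
    """Hypothetical stand-in; the real class may carry other fields."""
    passed: bool
    score: float
    errors: list[str] = field(default_factory=list)
    warnings: list[str] = field(default_factory=list)


def verify_structure_sketch(generated_html, area_name, has_image=False,
                            max_font=None, required=REQUIRED_PATTERNS):
    patterns = required.get(area_name, [])
    missing = []

    # Substring presence loop: "a|b" passes when any alternative is present.
    for pattern in patterns:
        if not any(alt in generated_html for alt in pattern.split("|")):
            missing.append(pattern)

    # Implicit image requirement, only for body_core with has_image=True.
    if has_image and area_name == "body_core" and "slide-img-" not in generated_html:
        missing.append("slide-img-* (이미지 태그)")

    # Font tolerance: only sizes above max_font + 1 produce a warning.
    warnings = []
    if max_font is not None:
        for raw in FONT_SIZE_RE.findall(generated_html):
            if float(raw) > max_font + 1:
                warnings.append(f"font-size {raw}px exceeds max {max_font}px")

    # Warnings never affect passed; only missing patterns do.
    passed = len(missing) == 0
    score = 1.0 if passed else 1.0 - len(missing) / max(1, len(patterns))
    errors = [] if passed else ["필수 패턴 누락: " + ", ".join(missing)]
    return VerificationResult(passed, score, errors, warnings)
```

Under these assumptions, a sidebar fragment that contains padding-left but not text-indent has one missing pattern out of two, so score = 1.0 - 1/2 = 0.5, while an empty body_bg fragment passes with score 1.0 because its pattern list is empty.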

Classification: area-level (Phase Q HTML area axis) required-pattern validation at content-verifier time. Not Phase Z frame_id × sub_zone contract validation.

A2 — Phase Q REQUIRED_PATTERNS shape (read-only reference)

The Phase Q pattern-dict shape — values are Phase Q-specific and excluded from reuse; only the shape is Phase Z design input.

| Axis | Phase Q shape | Where observed |
|---|---|---|
| Key axis | area name (string) | src/content_verifier.py:382 keys: body_bg / body_core / sidebar / footer |
| Value type | list[str] of substring patterns | src/content_verifier.py:383-391 |
| Alternation semantics | `"a\|b"` → OR (any alt passes) via `pattern.split("\|")` | src/content_verifier.py:410 |
| Image-conditional branch | has_image=True and area_name=="body_core" → implicit "slide-img-" requirement | src/content_verifier.py:414-416 |
| Font hierarchy tolerance | 1px (fs > max_font + 1); area-name → max-font fixed lookup | src/content_verifier.py:433, :421-426 |
| Pass/score rule | passed = (missing == []); score = continuous degradation 1.0 - len(missing)/max(1, len(patterns)) | src/content_verifier.py:438, :445 |
| Empty-pattern handling | max(1, len(patterns)) guards divide-by-zero; empty pattern list always passes | src/content_verifier.py:445, :382-383 (body_bg=[]) |

Shape-only carry-over candidates for Phase Z design (see A3):

  • dict[key]→list[pattern] indirection.
  • OR via in-string | separator (low-ceremony alternation).
  • Conditional implicit requirement injected by external context flag (here has_image; in Phase Z potentially accepted_content_types per sub_zone).
  • Continuous score degradation rather than binary pass/fail (downstream consumers can threshold).
  • Separate errors (block) vs warnings (advisory) lanes — font hierarchy lives in warnings, not errors.

Values that must not carry into Phase Z: the literal strings "key-msg", "padding-left", "text-indent", "slide-img-", and the area names body_bg / body_core / sidebar / footer themselves — these are Phase Q area-HTML idioms, not Phase Z frame/slot idioms.
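
The exclusion above could be checked mechanically if a Phase Z pattern dict is ever drafted. The following is a hedged sketch, not existing project code: find_phase_q_leaks and the placeholder pattern strings are hypothetical, and the (frame_id, sub_zone_id) key shape is the one proposed in A3.

```python
# Phase Q literals and area names listed in the paragraph above.
PHASE_Q_LITERALS = ("key-msg", "padding-left", "text-indent", "slide-img-")
PHASE_Q_AREAS = ("body_bg", "body_core", "sidebar", "footer")


def find_phase_q_leaks(pattern_dict):
    """Return (key, offending_value) pairs for every Phase Q idiom found.

    pattern_dict is assumed to map (frame_id, sub_zone_id) tuples to a
    list of pattern strings that may use "|" for OR alternation.
    """
    leaks = []
    for key, patterns in pattern_dict.items():
        # Area names must not be reused as frame or sub_zone identifiers.
        for part in key:
            if part in PHASE_Q_AREAS:
                leaks.append((key, part))
        # Phase Q literals must not appear in any alternative.
        for pattern in patterns:
            for alternative in pattern.split("|"):
                for literal in PHASE_Q_LITERALS:
                    if literal in alternative:
                        leaks.append((key, literal))
    return leaks


# Placeholder values only; these are not proposed Phase Z patterns.
clean = {(1171281190, "pillar_1"): ["placeholder-a|placeholder-b"]}
leaky = {(1171281190, "sidebar"): ["key-msg|placeholder-a"]}

clean_leaks = find_phase_q_leaks(clean)
leaky_leaks = find_phase_q_leaks(leaky)
```

An empty result means the candidate dict reuses only the shape. Any non-empty result names the key and the Phase Q idiom that leaked into it.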

A3 — Phase Z target pattern dict (design input, not yet active)

The Phase Z-native target axis = frame_id × sub_zone pattern dict, aligned with templates/phase_z2/catalog/frame_contracts.yaml. References (do not modify):

  • templates/phase_z2/catalog/frame_contracts.yaml, 11 contracts by starting line:
    • :21 three_parallel_requirements (F13, 3 sub_zones)
    • :77 process_product_two_way (F29, 2 sub_zones × strict 3 cardinality)
    • :128 bim_issues_quadrant_four (F16, 4 sub_zones)
    • :189 three_persona_benefits (F14, 3 sub_zones)
    • :253 construction_goals_three_circle_intersection (F12, 3+1 sub_zones; intersection is min:0,max:1)
    • :323 construction_bim_three_usage (F11, 3 sub_zones)
    • :391 bim_dx_comparison_table (F18, 2 header + 1 rows with min:1,max:12)
    • :456 dx_sw_necessity_three_perspectives (F20, 3 sub_zones)
    • :520 info_management_what_how_when (F8, 3 sub_zones)
    • :580 sw_reality_three_emphasis (F28, 3 sub_zones)
    • :637 bim_current_problems_paired (F17, 8 sub_zones; row × side 2-axis)
  • All 11 contracts carry accepted_content_types + sub_zones; field density_envelope is absent across the catalog (verified grep -c "density_envelope" templates/phase_z2/catalog/frame_contracts.yaml = 0).
  • src/phase_z2_mapper.py:49-57 load_frame_contracts / get_contract — direct dict lookup against the 11 entries above.
  • src/phase_z2_pipeline.py:3776-3805 Step 10 emit — currently surfaces frame_id / family / source_shape / cardinality / visual_hints / accepted_content_types / sub_zones / payload_builder / payload_builder_options to step10_frame_contract.json with step_status="partial". No pattern-dict assertion runs against this payload yet.

Abstraction-mismatch table (Phase Q area-level vs Phase Z frame/slot-level):

| Axis | Phase Q (A1+A2) | Phase Z target (A3) |
|---|---|---|
| Key | area name (body_bg/body_core/sidebar/footer) | (frame_id, sub_zone_id) tuple, e.g. (1171281190, "pillar_1") |
| Cardinality of keys | 4 fixed area names | open over 11 contracts × N sub_zones (3+2+4+3+4+3+3+3+3+3+8 = 39 sub_zones in current catalog) |
| Value semantics | substring presence (HTML-string match) | candidates: substring presence and/or contract-field assertion (cardinality.strict / accepts membership / partial_target_path resolution) |
| Conditional branch input | has_image external flag | accepted_content_types per sub_zone (catalog-driven, not external flag) |
| Tolerance | 1px on font-size (single axis) | candidates: font-size 1px tolerance carried over or replaced by visual_hints.min_height_px envelope check |
| Validation timing | post-render HTML (generated_html string) | post Step 18 final.html (mirrors Phase Q timing); Step 12 light_edit/restructure proposal is excluded (proposal is upstream of render) |
| Result lanes | errors (block) + warnings (advisory) | preserved as-is from Phase Q shape (continuous score; separate font-hierarchy warnings) |
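
The sub_zone total in the cardinality row can be re-derived from the per-contract counts listed in A3. The sketch below hard-codes those counts as stated in this document rather than parsing frame_contracts.yaml, so it only checks the arithmetic, not the catalog.

```python
# Per-contract sub_zone counts as stated in the A3 reference list.
SUB_ZONE_COUNTS = {
    "three_parallel_requirements": 3,                   # F13
    "process_product_two_way": 2,                       # F29
    "bim_issues_quadrant_four": 4,                      # F16
    "three_persona_benefits": 3,                        # F14
    "construction_goals_three_circle_intersection": 4,  # F12: 3 + optional intersection
    "construction_bim_three_usage": 3,                  # F11
    "bim_dx_comparison_table": 3,                       # F18: 2 header + 1 rows
    "dx_sw_necessity_three_perspectives": 3,            # F20
    "info_management_what_how_when": 3,                 # F8
    "sw_reality_three_emphasis": 3,                     # F28
    "bim_current_problems_paired": 8,                   # F17
}

contract_count = len(SUB_ZONE_COUNTS)
total_sub_zones = sum(SUB_ZONE_COUNTS.values())
print(contract_count, total_sub_zones)  # prints "11 39"
```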

Classification: Phase Q area axis ⇄ Phase Z frame/slot axis are not drop-in compatible. The shape (dict indirection + OR alternation + tolerance + conditional implicit-requirement + continuous score) is the only portable element; every value (key strings, area names, literal patterns) is Phase Q-local.

A4 — IMP-04 soft-link boundary

IMP-20 is soft-linked to IMP-04 per the backlog (docs/architecture/PHASE-Z-IMPLEMENTATION-ISSUE-BACKLOG.md:71). Ownership separation:

  • IMP-04 owns: every frame_contracts.yaml entry — addition / removal / accepted_content_types change / sub_zones schema change / cardinality change / visual_hints change. templates/phase_z2/catalog/frame_contracts.yaml is the IMP-04 source of truth.
  • IMP-20 owns: reference-only documentation of the Phase Q pattern-dict shape (A1 + A2) and the Phase Z target axis design narrative (A3). No catalog edits, no Step 10 promotion.
  • Coupling direction: one-way read. A Phase Z pattern dict (if/when activated through the A5 gate) consumes frame_contracts.yaml as input. It does not publish back into the catalog. IMP-04 is unaware of IMP-20.
  • No bidirectional code flow: IMP-20 does not move Phase Q content_verifier.py code into Phase Z, and IMP-04 does not consume REQUIRED_PATTERNS. The two surfaces remain isolated.
  • Reference direction is one-way: this document points read-only at src/content_verifier.py, src/phase_z2_mapper.py, src/phase_z2_pipeline.py, and templates/phase_z2/catalog/frame_contracts.yaml. No reverse pointer is required in those source files.

If IMP-04 alters the catalog schema (e.g. adds density_envelope or renames sub_zones), A3 must be re-verified (key axis and conditional-branch row in particular). The boundary statement itself does not change.

A5 — Re-activation gate + guardrails

IMP-20 is documented (dormant). Re-activation requires all three of the following gate conditions to hold (logical AND):

  1. Trigger: Phase Z Step 10 produces a verifiable case where the partial frame-contract emit alone is insufficient — i.e., a final.html regression that a frame_id × sub_zone pattern dict would have caught (missing slot marker, contract field violation, font-hierarchy breach against a sub_zone-resolved max). The trigger must be a regression that maps cleanly to the frame/slot axis, not to a higher layer (composition planning, content adapter, render-time CSS).
  2. Evidence requirement: failing-case MDX + step10_frame_contract.json trace + final.html excerpt with the slot path that should have asserted, attached to a new issue or this issue's reopened state.
  3. IMP-04 sign-off: the IMP-04 owner confirms the failing case is not addressable inside the catalog (e.g. tightening cardinality or accepted_content_types does not resolve it) — only then is a Phase Z-native pattern dict justified.

Design questions resolved in this document (revisit if the gate fires):

  • Q1 — Key granularity: (frame_id, sub_zone_id). Frame-only granularity is insufficient because contracts with sub_zones of differing accepts (e.g. F29 process_column accepts [text_block, transform_table] vs product_column accepts [text_block]) require slot-level differentiation.
  • Q2 — Value type: hybrid — substring patterns (Phase Q parity) plus contract-field assertions (cardinality.strict / accepts membership / partial_target_path resolved in DOM) plus numeric tolerance (carried from font-hierarchy 1px). Three lanes preserved separately so each can fail/pass independently.
  • Q3 — Validation timing: post Step 18 final.html only. Step 12 light_edit/restructure proposal is upstream of render and exposes no HTML for substring assertion; running the dict there would either fire false negatives (no DOM yet) or duplicate Step 18 work.
  • Q4 — Font-hierarchy carry-over: replaced — Phase Q's role_font_map fixed dict (area → max-font) is Phase Q-local. The Phase Z equivalent reads from frame_contracts.yaml visual_hints (min_height_px already present; a future max_font_px field would live in visual_hints and is IMP-04-owned). 1px tolerance shape is portable; the lookup source is replaced.
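
The three independent lanes from Q2 can be pictured as follows. This is a sketch under stated assumptions and not an activated design: LaneReport, check_sub_zone, the placeholder pattern strings, and the max_font_px argument are hypothetical (max_font_px does not exist in the catalog today). Only the two accepts lists are taken from the F29 example in Q1.

```python
from dataclasses import dataclass, field

# accepts lists as quoted in Q1 for F29; the dict wrapper is an assumption.
F29_ACCEPTS = {
    "process_column": ["text_block", "transform_table"],
    "product_column": ["text_block"],
}


@dataclass
class LaneReport:
    """One list per lane so each lane can pass or fail independently."""
    substring_missing: list[str] = field(default_factory=list)
    contract_violations: list[str] = field(default_factory=list)
    tolerance_warnings: list[str] = field(default_factory=list)


def check_sub_zone(sub_zone_id, content_type, html, patterns,
                   font_sizes_px, max_font_px=None):
    report = LaneReport()

    # Lane 1: substring patterns with "|" OR alternation (Phase Q shape).
    for pattern in patterns:
        if not any(alt in html for alt in pattern.split("|")):
            report.substring_missing.append(pattern)

    # Lane 2: contract-field assertion, here accepts membership only.
    if content_type not in F29_ACCEPTS.get(sub_zone_id, []):
        report.contract_violations.append(
            f"{sub_zone_id} does not accept {content_type}"
        )

    # Lane 3: numeric tolerance, 1px above the resolved maximum.
    if max_font_px is not None:
        for size in font_sizes_px:
            if size > max_font_px + 1:
                report.tolerance_warnings.append(
                    f"{size}px exceeds {max_font_px}px"
                )
    return report


report = check_sub_zone(
    "product_column", "transform_table",
    "<div>placeholder-a</div>", ["placeholder-a|placeholder-b"],
    font_sizes_px=[12.0], max_font_px=12,
)
```

In this call the substring lane and the tolerance lane are clean while the contract lane reports one violation, which is the slot-level differentiation that frame-only keys cannot express.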

Guardrails (preserved from Stage 1 + Stage 2):

  • GR1 — Shape-only reference: no Phase Q REQUIRED_PATTERNS value ("key-msg", "padding-left", "text-indent", "slide-img-") or area name (body_bg/body_core/sidebar/footer) may appear in any Phase Z pattern dict activation.
  • GR2 — Phase Q no-regression: src/content_verifier.py:382-392 REQUIRED_PATTERNS is no-touch. The Phase T L379-381 comment (overflow:hidden removed) remains the no-regression boundary; any Phase Z dict design must not re-introduce removed patterns into Phase Q's surface.
  • GR3 — Phase Z dict is Phase Z-owned: no import of content_verifier.REQUIRED_PATTERNS from Phase Z code. The two pattern dicts coexist without symbol sharing.
  • GR4 — IMP-04 soft-link one-way: per § A4. Activating IMP-20 must not block on or modify IMP-04; the catalog is read-only input.
  • PZ-1 — AI isolation contract: pattern dict is code/spec, not AI-generated content. No Kei rewrite, no LLM proposal of pattern values (feedback_ai_isolation_contract).
  • RULE 13 — Anchor sync: any future activation must update backlog (PHASE-Z-IMPLEMENTATION-ISSUE-BACKLOG.md), status board (PHASE-Z-PIPELINE-STATUS-BOARD.md), and INSIGHT-MAP (PHASE-Q-INSIGHT-TO-22STEP-MAP.md) in the same commit.
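
GR3 lends itself to a static check. The sketch below uses Python's standard ast module to flag source text that imports or references content_verifier.REQUIRED_PATTERNS. The function name and the idea of running it over Phase Z modules are assumptions; no such check exists in the repository as far as this document states, and aliased imports (import ... as cv) are not covered.

```python
import ast


def references_required_patterns(source: str) -> bool:
    """Return True if source imports or reads content_verifier.REQUIRED_PATTERNS."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        # from content_verifier import REQUIRED_PATTERNS (or *), any package prefix
        if isinstance(node, ast.ImportFrom):
            module = node.module or ""
            if module.split(".")[-1] == "content_verifier":
                if any(alias.name in ("REQUIRED_PATTERNS", "*") for alias in node.names):
                    return True
        # content_verifier.REQUIRED_PATTERNS or pkg.content_verifier.REQUIRED_PATTERNS
        elif isinstance(node, ast.Attribute) and node.attr == "REQUIRED_PATTERNS":
            owner = node.value
            if isinstance(owner, ast.Name) and owner.id == "content_verifier":
                return True
            if isinstance(owner, ast.Attribute) and owner.attr == "content_verifier":
                return True
    return False


violating = "from src.content_verifier import REQUIRED_PATTERNS\n"
compliant = "REQUIRED_PATTERNS = {}\n"

violating_flag = references_required_patterns(violating)
compliant_flag = references_required_patterns(compliant)
```

A Phase Z module that defines its own dict under the same name is not flagged, which matches the guardrail: the two dicts may coexist as long as no symbol is shared.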

If IMP-04 alters the catalog schema or src/content_verifier.py is rewritten upstream, A1 through A3 must be re-verified (file:line refs); the A5 gate itself does not change.