# IMP-20 — Phase Q `content_verifier` Frame Contract Validation Pattern Reference

**Status**: documented (reference-only, dormant)
**Scope**: doc-only. No runtime surface modified.
**Related issue**: https://gitea.hmac.kr/Kyeongmin/C.E.L_Slide_test2/issues/20
**Soft dependency**: IMP-04 (extended catalog application) — IMP-20 stays dormant; activates only via the A5 gate.
**Source axis**: INSIGHT-MAP §3 / §2.7 H2 — `content_verifier.verify_structure` pattern reference.

---

## A1 — Phase Q consumer pattern (read-only reference)

Phase Q implements area-level required-pattern validation at the content-verifier layer. References (do **not** modify):

- `src/content_verifier.py:382-392` — `REQUIRED_PATTERNS: dict[str, list[str]]` — top-level pattern dictionary keyed by area name (`body_bg`, `body_core`, `sidebar`, `footer`). Values verified: `body_bg=[]`, `body_core=["key-msg"]`, `sidebar=["padding-left", "text-indent"]`, `footer=[]`. Phase T (`L379-381` comment) removed the `overflow:hidden` requirement to reconcile with the Phase T prompt's "overflow:hidden 금지" ("overflow:hidden forbidden") directive — that no-regression boundary is preserved.
- `src/content_verifier.py:395-448` — `verify_structure(generated_html, area_name, has_image=False, font_hierarchy=None) → VerificationResult` — the substring-check + OR + tolerance core logic.
  - `:405-412` — substring presence loop. Each pattern string is split on `|` (`pattern.split("|")` at L410) and treated as an OR alternation: any alternative present passes the pattern. Missing alternatives are appended to a `missing` list.
  - `:414-416` — `has_image` branch. When `has_image=True` and `area_name == "body_core"`, an additional implicit requirement is enforced: `"slide-img-"` must appear in `generated_html`. A missing image marker is reported as `"slide-img-* (이미지 태그)"` ("image tag") in `missing`.
  - `:418-436` — `font_hierarchy` branch. When supplied, area-name → max-font lookup uses a fixed `role_font_map = {"body_bg":bg/11, "body_core":core/12, "sidebar":sidebar/10, "footer":core/12}`. HTML `font-size:\s*(\d+(?:\.\d+)?)\s*px` matches are extracted via regex (L430); each measured size > `max_font + 1` (1px tolerance at L433) emits a `font_warnings` entry. Warnings do **not** flip `passed`.
  - `:438-447` — result construction. `passed = (len(missing) == 0)`. `score = 1.0` on pass else `1.0 - len(missing) / max(1, len(patterns))` (continuous degradation; `max(1, …)` guards empty-pattern division by zero). Errors are prefixed `"필수 패턴 누락: "` ("required pattern missing: "). Warnings carry font hierarchy violations only.
- `src/content_verifier.py:455-487` — `verify_area(original_text, generated_html, area_name, has_image=False) → VerificationResult` — composes L1 (`verify_text_preservation`) + L2 (`verify_no_forbidden_content`) + L3 (`verify_structure`) at L462-466. The `verify_structure` call at L465 passes `has_image` but **not** `font_hierarchy` (`font_hierarchy` is unused inside `verify_area`).
- `src/content_verifier.py:490-529` — `verify_all_areas(generated, area_texts, has_image_areas=None)` — area dispatch fan-out. `body_html` is split into `body_bg` + `body_core` (L510-519); `body_core` is the **only** branch that propagates `has_image=("body_core" in has_image_areas)` to `verify_area` (L518). `sidebar_html` (L521-525) and `footer_html` (L527-531) call `verify_area` with default `has_image=False`.

Classification: area-level (Phase Q HTML area axis) required-pattern validation at content-verifier time. **Not** Phase Z frame_id × sub_zone contract validation.
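
The substring, OR-alternation, image-branch, and scoring steps listed above can be sketched as follows. This is a minimal illustration reconstructed from the description in this section, not a copy of `src/content_verifier.py`. The names `StructureResult`, `pattern_present`, and `check_structure` are hypothetical, and the exact string appended to `missing` and the way errors are joined are assumptions.

```python
from dataclasses import dataclass, field

# Pattern values as documented for src/content_verifier.py:382-392.
REQUIRED_PATTERNS: dict[str, list[str]] = {
    "body_bg": [],
    "body_core": ["key-msg"],
    "sidebar": ["padding-left", "text-indent"],
    "footer": [],
}


@dataclass
class StructureResult:  # hypothetical stand-in for VerificationResult
    passed: bool
    score: float
    errors: list[str] = field(default_factory=list)


def pattern_present(html: str, pattern: str) -> bool:
    """'a|b' is an OR alternation: any alternative found in html passes."""
    return any(alt in html for alt in pattern.split("|"))


def check_structure(generated_html: str, area_name: str,
                    has_image: bool = False) -> StructureResult:
    patterns = REQUIRED_PATTERNS.get(area_name, [])
    # Assumption: the whole pattern string is recorded when no alternative matches.
    missing = [p for p in patterns if not pattern_present(generated_html, p)]

    # Implicit image requirement, applied only to body_core.
    if has_image and area_name == "body_core" and "slide-img-" not in generated_html:
        missing.append("slide-img-* (이미지 태그)")

    passed = len(missing) == 0
    # Continuous degradation; max(1, ...) guards the empty-pattern case.
    score = 1.0 if passed else 1.0 - len(missing) / max(1, len(patterns))
    # Assumption: one error line with the missing patterns joined by commas.
    errors = [] if passed else ["필수 패턴 누락: " + ", ".join(missing)]
    return StructureResult(passed, score, errors)
```

Under the formula as described, the implicit image requirement adds to `missing` but not to `patterns`, so the score can fall below zero when both a listed pattern and the image marker are absent. Whether the real function clamps the value is not stated in this document.
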
## A2 — Phase Q `REQUIRED_PATTERNS` shape (read-only reference)

The Phase Q pattern-dict shape — **values are Phase Q-specific and excluded from reuse; only the shape is Phase Z design input.**

| Axis | Phase Q shape | Where observed |
|---|---|---|
| Key axis | area name (string) | `src/content_verifier.py:382` keys: `body_bg` / `body_core` / `sidebar` / `footer` |
| Value type | `list[str]` of substring patterns | `src/content_verifier.py:383-391` |
| Alternation semantics | `"a\|b"` → OR (any alt passes) via `pattern.split("\|")` | `src/content_verifier.py:410` |
| Image-conditional branch | `has_image=True` ∧ `area_name=="body_core"` → implicit `"slide-img-"` requirement | `src/content_verifier.py:414-416` |
| Font hierarchy tolerance | 1px (`fs > max_font + 1`); area-name → max-font fixed lookup | `src/content_verifier.py:433`, `:421-426` |
| Pass/score rule | `passed = (missing == [])`; score = continuous degradation `1.0 - len(missing)/max(1, len(patterns))` | `src/content_verifier.py:438`, `:445` |
| Empty-pattern handling | `max(1, len(patterns))` guards divide-by-zero; empty pattern list always passes | `src/content_verifier.py:445`, `:382-383` (`body_bg=[]`) |

Shape-only carry-over candidates for Phase Z design (see A3):

- `dict[key]→list[pattern]` indirection.
- OR via in-string `|` separator (low-ceremony alternation).
- Conditional implicit requirement injected by external context flag (here `has_image`; in Phase Z potentially `accepted_content_types` per sub_zone).
- Continuous score degradation rather than binary pass/fail (downstream consumers can threshold).
- Separate `errors` (block) vs `warnings` (advisory) lanes — font hierarchy lives in warnings, not errors.

Values that **must not** carry into Phase Z: the literal strings `"key-msg"`, `"padding-left"`, `"text-indent"`, `"slide-img-"`, and the area names `body_bg` / `body_core` / `sidebar` / `footer` themselves — these are Phase Q area-HTML idioms, not Phase Z frame/slot idioms.
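
The font-hierarchy tolerance row in the table above can be illustrated with a short sketch. It uses the regex quoted in A1 and the `fs > max_font + 1` comparison. The function name `font_warnings`, the constant `TOLERANCE_PX`, and the warning text are hypothetical, and the caller is assumed to have already resolved `max_font` for the area.

```python
import re

# Regex as quoted in A1 for src/content_verifier.py (L430).
FONT_SIZE_RE = re.compile(r"font-size:\s*(\d+(?:\.\d+)?)\s*px")
TOLERANCE_PX = 1.0  # 1px tolerance, per the table above


def font_warnings(generated_html: str, max_font: float) -> list[str]:
    """Return one advisory warning per font-size above max_font + tolerance."""
    warnings: list[str] = []
    for raw in FONT_SIZE_RE.findall(generated_html):
        size = float(raw)
        if size > max_font + TOLERANCE_PX:
            # Hypothetical message text; the real wording is not documented here.
            warnings.append(f"font-size {size:g}px exceeds max {max_font:g}px")
    return warnings
```

Because the result is a list of warnings rather than entries in `missing`, a caller following the Phase Q shape would leave `passed` untouched and surface these in the advisory lane only.
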
## A3 — Phase Z target pattern dict (design input, not yet active)

The Phase Z-native target axis = **frame_id × sub_zone** pattern dict, aligned with `templates/phase_z2/catalog/frame_contracts.yaml`. References (do **not** modify):

- `templates/phase_z2/catalog/frame_contracts.yaml`, 11 contracts:
  - `:21` `three_parallel_requirements` (F13, 3 sub_zones)
  - `:77` `process_product_two_way` (F29, 2 sub_zones × strict 3 cardinality)
  - `:128` `bim_issues_quadrant_four` (F16, 4 sub_zones)
  - `:189` `three_persona_benefits` (F14, 3 sub_zones)
  - `:253` `construction_goals_three_circle_intersection` (F12, 3+1 sub_zones — `intersection` is `min:0,max:1`)
  - `:323` `construction_bim_three_usage` (F11, 3 sub_zones)
  - `:391` `bim_dx_comparison_table` (F18, 2 header + 1 `rows` with `min:1,max:12`)
  - `:456` `dx_sw_necessity_three_perspectives` (F20, 3 sub_zones)
  - `:520` `info_management_what_how_when` (F8, 3 sub_zones)
  - `:580` `sw_reality_three_emphasis` (F28, 3 sub_zones)
  - `:637` `bim_current_problems_paired` (F17, 8 sub_zones — row × side 2-axis)
- All 11 contracts carry `accepted_content_types` + `sub_zones`; field `density_envelope` is absent across the catalog (verified `grep -c "density_envelope" templates/phase_z2/catalog/frame_contracts.yaml` = 0).
- `src/phase_z2_mapper.py:49-57` `load_frame_contracts` / `get_contract` — direct dict lookup against the 11 entries above.
- `src/phase_z2_pipeline.py:3776-3805` Step 10 emit — currently surfaces `frame_id` / `family` / `source_shape` / `cardinality` / `visual_hints` / `accepted_content_types` / `sub_zones` / `payload_builder` / `payload_builder_options` to `step10_frame_contract.json` with `step_status="partial"`. No pattern-dict assertion runs against this payload yet.

Abstraction-mismatch table (Phase Q area-level vs Phase Z frame/slot-level):

| Axis | Phase Q (A1+A2) | Phase Z target (A3) |
|---|---|---|
| Key | area name (`body_bg`/`body_core`/`sidebar`/`footer`) | `(frame_id, sub_zone_id)` tuple — e.g. `(1171281190, "pillar_1")` |
| Cardinality of keys | 4 fixed area names | open over 11 contracts × N sub_zones (3+2+4+3+4+3+3+3+3+3+8 = 39 sub_zones in current catalog) |
| Value semantics | substring presence (HTML-string match) | candidates: substring presence and/or contract-field assertion (`cardinality.strict` / `accepts` membership / `partial_target_path` resolution) |
| Conditional branch input | `has_image` external flag | `accepted_content_types` per sub_zone (catalog-driven, not external flag) |
| Tolerance | 1px on font-size (single axis) | candidates: font-size 1px tolerance carried over **or** replaced by `visual_hints.min_height_px` envelope check |
| Validation timing | post-render HTML (`generated_html` string) | post Step 18 final.html (mirrors Phase Q timing) — Step 12 light_edit/restructure proposal is excluded (proposal is upstream of render) |
| Result lanes | `errors` (block) + `warnings` (advisory) | preserved as-is from Phase Q shape (continuous score; separate font-hierarchy warnings) |

Classification: Phase Q area axis ⇄ Phase Z frame/slot axis are **not** drop-in compatible. The shape (dict indirection + OR alternation + tolerance + conditional implicit-requirement + continuous score) is the only portable element; every value (key strings, area names, literal patterns) is Phase Q-local.

## A4 — IMP-04 soft-link boundary (catalog vs validation ownership)

IMP-20 is `soft link: IMP-04` per the backlog (`docs/architecture/PHASE-Z-IMPLEMENTATION-ISSUE-BACKLOG.md:71`). Ownership separation:

- **IMP-04 owns**: every `frame_contracts.yaml` entry — addition / removal / `accepted_content_types` change / `sub_zones` schema change / `cardinality` change / `visual_hints` change. `templates/phase_z2/catalog/frame_contracts.yaml` is the IMP-04 source of truth.
- **IMP-20 owns**: reference-only documentation of the Phase Q pattern-dict shape (A1 + A2) and the Phase Z target axis design narrative (A3). No catalog edits, no Step 10 promotion.
- **Coupling direction**: **one-way** read. A Phase Z pattern dict (if/when activated through the A5 gate) consumes `frame_contracts.yaml` as input. It does **not** publish back into the catalog. IMP-04 is unaware of IMP-20.
- **No bidirectional code flow**: IMP-20 does not move Phase Q `content_verifier.py` code into Phase Z, and IMP-04 does not consume `REQUIRED_PATTERNS`. The two surfaces remain isolated.
- **Reference direction is one-way**: this document points read-only at `src/content_verifier.py`, `src/phase_z2_mapper.py`, `src/phase_z2_pipeline.py`, and `templates/phase_z2/catalog/frame_contracts.yaml`. No reverse pointer is required in those source files.

If IMP-04 alters the catalog schema (e.g. adds `density_envelope` or renames `sub_zones`), A3 must be re-verified (key axis and conditional-branch row in particular). The boundary statement itself does not change.

## A5 — Re-activation gate + guardrails

IMP-20 is `documented` (dormant). Re-activation requires **all** of the following gate conditions (3-cond AND):

1. **Trigger**: Phase Z Step 10 produces a verifiable case where the partial frame-contract emit alone is insufficient — i.e., a final.html regression that a frame_id × sub_zone pattern dict would have caught (missing slot marker, contract field violation, font-hierarchy breach against a sub_zone-resolved max). The trigger must be a regression that maps cleanly to the frame/slot axis, **not** to a higher layer (composition planning, content adapter, render-time CSS).
2. **Evidence requirement**: failing-case MDX + `step10_frame_contract.json` trace + final.html excerpt with the slot path that should have asserted, attached to a new issue or this issue's reopened state.
3. **IMP-04 sign-off**: the IMP-04 owner confirms the failing case is **not** addressable inside the catalog (e.g. tightening `cardinality` or `accepted_content_types` does not resolve it) — only then is a Phase Z-native pattern dict justified.

Design questions resolved in this document (revisit if the gate fires):

- **Q1 — Key granularity**: `(frame_id, sub_zone_id)`. Frame-only granularity is insufficient because contracts whose `sub_zones` differ in `accepts` (e.g. F29 `process_column` accepts `[text_block, transform_table]` vs `product_column` accepts `[text_block]`) require slot-level differentiation.
- **Q2 — Value type**: hybrid — substring patterns (Phase Q parity) **plus** contract-field assertions (`cardinality.strict` / `accepts` membership / `partial_target_path` resolved in DOM) **plus** numeric tolerance (carried from font-hierarchy 1px). The three lanes are kept separate so each can pass or fail independently.
- **Q3 — Validation timing**: post Step 18 final.html **only**. The Step 12 light_edit/restructure proposal is upstream of render and exposes no HTML for substring assertion; running the dict there would either produce spurious failures (no DOM yet) or duplicate Step 18 work.
- **Q4 — Font-hierarchy carry-over**: replaced — Phase Q's `role_font_map` fixed dict (area → max-font) is Phase Q-local. The Phase Z equivalent reads from `frame_contracts.yaml` `visual_hints` (`min_height_px` already present; a future `max_font_px` field would live in `visual_hints` and is IMP-04-owned). The 1px tolerance shape is portable; the lookup source is replaced.

Guardrails (preserved from Stage 1 + Stage 2):

- **GR1 — Shape-only reference**: no Phase Q `REQUIRED_PATTERNS` value (`"key-msg"`, `"padding-left"`, `"text-indent"`, `"slide-img-"`) or area name (`body_bg`/`body_core`/`sidebar`/`footer`) may appear in any Phase Z pattern dict activation.
- **GR2 — Phase Q no-regression**: `src/content_verifier.py:382-392` `REQUIRED_PATTERNS` is no-touch. The Phase T `L379-381` comment (overflow:hidden removed) remains the no-regression boundary; any Phase Z dict design must not re-introduce removed patterns into Phase Q's surface.
- **GR3 — Phase Z dict is Phase Z-owned**: no `import` of `content_verifier.REQUIRED_PATTERNS` from Phase Z code. The two pattern dicts coexist without symbol sharing.
- **GR4 — IMP-04 soft-link one-way**: per § A4. Activating IMP-20 must not block on or modify IMP-04; the catalog is read-only input.
- **PZ-1 — AI isolation contract**: pattern dict is code/spec, not AI-generated content. No Kei rewrite, no LLM proposal of pattern values (`feedback_ai_isolation_contract`).
- **RULE 13 — Anchor sync**: any future activation must update backlog (`PHASE-Z-IMPLEMENTATION-ISSUE-BACKLOG.md`), status board (`PHASE-Z-PIPELINE-STATUS-BOARD.md`), and INSIGHT-MAP (`PHASE-Q-INSIGHT-TO-22STEP-MAP.md`) in the same commit.

If IMP-04 alters the catalog schema or `src/content_verifier.py` is rewritten upstream, A1–A3 must be re-verified (file:line refs); the A5 gate itself does not change.
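
For orientation only, the Q1 and Q2 answers in A5 (a `(frame_id, sub_zone_id)` key with three separate lanes) could take a shape like the sketch below. Nothing here is active or agreed: `SlotRule`, `SlotResult`, `check_slot`, the `data-slot` marker, and the `max_font_px` value are all hypothetical, and the `accepts` list is written inline for brevity where a real activation would read it from `frame_contracts.yaml`. The key reuses the example tuple from A3, and the pattern and accepted type attached to it are illustrative, not catalog facts.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

SlotKey = tuple[int, str]  # (frame_id, sub_zone_id), per Q1

FONT_SIZE_RE = re.compile(r"font-size:\s*(\d+(?:\.\d+)?)\s*px")


@dataclass
class SlotRule:  # hypothetical value type, one per slot key
    patterns: list[str] = field(default_factory=list)  # substring lane ("a|b" = OR)
    accepts: list[str] = field(default_factory=list)   # contract-field lane
    max_font_px: Optional[float] = None                # numeric lane (hypothetical field)


@dataclass
class SlotResult:
    errors: list[str]    # blocking lane
    warnings: list[str]  # advisory lane
    score: float


def check_slot(key: SlotKey, slot_html: str, content_type: str,
               rules: dict[SlotKey, SlotRule]) -> SlotResult:
    rule = rules[key]
    errors: list[str] = []
    warnings: list[str] = []

    # Lane 1: substring patterns with in-string OR alternation.
    missing = [p for p in rule.patterns
               if not any(alt in slot_html for alt in p.split("|"))]
    if missing:
        errors.append("missing patterns: " + ", ".join(missing))

    # Lane 2: contract-field assertion (accepts membership only, for brevity).
    if rule.accepts and content_type not in rule.accepts:
        errors.append(f"content type {content_type!r} not accepted")

    # Lane 3: numeric tolerance, 1px shape carried over; advisory only.
    if rule.max_font_px is not None:
        for raw in FONT_SIZE_RE.findall(slot_html):
            if float(raw) > rule.max_font_px + 1:
                warnings.append(f"font-size {raw}px exceeds {rule.max_font_px:g}px")

    score = 1.0 - len(missing) / max(1, len(rule.patterns))
    return SlotResult(errors, warnings, score)
```

The sketch imports nothing from `content_verifier` and contains no Phase Q literal or area name, in line with GR1 and GR3.
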