# IMP-20 — Phase Q `content_verifier` Frame Contract Validation Pattern Reference
**Status**: documented (reference-only, dormant)

**Scope**: doc-only. No runtime surface modified.

**Related issue**: https://gitea.hmac.kr/Kyeongmin/C.E.L_Slide_test2/issues/20

**Soft dependency**: IMP-04 (extended catalog application). IMP-20 stays dormant and activates only via the A5 gate.

**Source axis**: INSIGHT-MAP §3 / §2.7 H2 (`content_verifier.verify_structure` pattern reference).

---
## A1 — Phase Q consumer pattern (read-only reference)
Phase Q implements area-level required-pattern validation at the content-verifier layer. References (do **not** modify):
- `src/content_verifier.py:382-392`: `REQUIRED_PATTERNS: dict[str, list[str]]`, the top-level pattern dictionary keyed by area name (`body_bg`, `body_core`, `sidebar`, `footer`). Verified values: `body_bg=[]`, `body_core=["key-msg"]`, `sidebar=["padding-left", "text-indent"]`, `footer=[]`. Phase T (comment at `L379-381`) removed the `overflow:hidden` requirement to reconcile with the Phase T prompt's directive forbidding `overflow:hidden` ("overflow:hidden 금지"); that no-regression boundary is preserved.
- `src/content_verifier.py:395-448`: `verify_structure(generated_html, area_name, has_image=False, font_hierarchy=None) → VerificationResult`, the core logic (substring check, OR alternation, tolerance).
  - `:405-412`: substring presence loop. Each pattern string is split on `|` (`pattern.split("|")` at L410) and treated as an OR alternation: the pattern passes if any alternative is present. A pattern with no matching alternative is appended to a `missing` list.
  - `:414-416`: `has_image` branch. When `has_image=True` and `area_name == "body_core"`, an additional implicit requirement is enforced: `"slide-img-"` must appear in `generated_html`. A missing image marker is reported in `missing` as `"slide-img-* (이미지 태그)"` ("image tag").
  - `:418-436`: `font_hierarchy` branch. When supplied, the area name is mapped to a maximum font size through the fixed lookup `role_font_map = {"body_bg":bg/11, "body_core":core/12, "sidebar":sidebar/10, "footer":core/12}`. Font sizes are extracted from the HTML with the regex `font-size:\s*(\d+(?:\.\d+)?)\s*px` (L430); each measured size greater than `max_font + 1` (1px tolerance at L433) emits a `font_warnings` entry. Warnings do **not** flip `passed`.
  - `:438-447`: result construction. `passed = (len(missing) == 0)`. `score = 1.0` on pass, else `1.0 - len(missing) / max(1, len(patterns))` (continuous degradation; `max(1, …)` guards against division by zero for an empty pattern list). Errors are prefixed with `"필수 패턴 누락: "` ("missing required pattern: "). Warnings carry font-hierarchy violations only.
- `src/content_verifier.py:455-487`: `verify_area(original_text, generated_html, area_name, has_image=False) → VerificationResult`. Composes verification layers L1 (`verify_text_preservation`), L2 (`verify_no_forbidden_content`), and L3 (`verify_structure`) at L462-466. The `verify_structure` call at L465 passes `has_image` but **not** `font_hierarchy`, so the font-hierarchy branch is never exercised through `verify_area`.
- `src/content_verifier.py:490-529`: `verify_all_areas(generated, area_texts, has_image_areas=None)`, the per-area dispatch fan-out. `body_html` is split into `body_bg` + `body_core` (L510-519); `body_core` is the **only** branch that propagates `has_image=("body_core" in has_image_areas)` to `verify_area` (L518). `sidebar_html` (L521-525) and `footer_html` (L527-531) call `verify_area` with the default `has_image=False`.

Classification: area-level (Phase Q HTML area axis) required-pattern validation at content-verifier time. **Not** Phase Z frame_id × sub_zone contract validation.
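The A1 behaviour can be condensed into a minimal, self-contained Python sketch. It is reconstructed from the description above, not copied from `src/content_verifier.py`: the `VerificationResult` field set, the function name `verify_structure_sketch`, the English message wording, and the `max_font_by_area` parameter (a direct area → max-font map standing in for `font_hierarchy` plus `role_font_map`, whose exact lookup is not reproduced) are all assumptions. `patterns_by_area` is a hypothetical test hook so the OR lane can be exercised without touching the Phase Q table.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# Phase Q values as listed in A1 (Phase Q-local; GR1 forbids reusing them in Phase Z).
REQUIRED_PATTERNS: dict[str, list[str]] = {
    "body_bg": [],
    "body_core": ["key-msg"],
    "sidebar": ["padding-left", "text-indent"],
    "footer": [],
}
FONT_SIZE_RE = re.compile(r"font-size:\s*(\d+(?:\.\d+)?)\s*px")
FONT_TOLERANCE_PX = 1.0


@dataclass
class VerificationResult:  # hypothetical field set, not the real class
    passed: bool
    score: float
    errors: list[str] = field(default_factory=list)
    warnings: list[str] = field(default_factory=list)


def verify_structure_sketch(
    generated_html: str,
    area_name: str,
    has_image: bool = False,
    max_font_by_area: Optional[dict[str, float]] = None,  # assumption: stands in for font_hierarchy
    patterns_by_area: Optional[dict[str, list[str]]] = None,  # test hook, not in the original
) -> VerificationResult:
    table = REQUIRED_PATTERNS if patterns_by_area is None else patterns_by_area
    patterns = table.get(area_name, [])

    # OR alternation: a pattern is satisfied if any "|"-separated alternative is a substring.
    missing = [p for p in patterns
               if not any(alt in generated_html for alt in p.split("|"))]

    # Image-conditional implicit requirement, applied to body_core only.
    if has_image and area_name == "body_core" and "slide-img-" not in generated_html:
        missing.append("slide-img-* (image tag)")

    # Font hierarchy: advisory lane only; it never changes `passed`.
    warnings: list[str] = []
    if max_font_by_area and area_name in max_font_by_area:
        max_font = max_font_by_area[area_name]
        for m in FONT_SIZE_RE.finditer(generated_html):
            size = float(m.group(1))
            if size > max_font + FONT_TOLERANCE_PX:
                warnings.append(f"font-size {size:g}px > {max_font:g}px + 1px")

    passed = not missing
    # Continuous degradation as described; max(1, ...) guards the empty-list division.
    score = 1.0 if passed else 1.0 - len(missing) / max(1, len(patterns))
    errors = [f"missing required pattern: {p}" for p in missing]
    return VerificationResult(passed, score, errors, warnings)
```

One consequence of transcribing the formula literally: the implicit image requirement is appended to `missing` but not counted in `len(patterns)`, so for `body_core` with neither `key-msg` nor an image marker the sketch yields `score == -1.0`. Whether the real code behaves the same way is not verified here.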
## A2 — Phase Q `REQUIRED_PATTERNS` shape (read-only reference)
The Phase Q pattern-dict shape — **values are Phase Q-specific and excluded from reuse; only the shape is Phase Z design input.**
| Axis | Phase Q shape | Where observed |
|---|---|---|
| Key axis | area name (string) | `src/content_verifier.py:382` keys: `body_bg` / `body_core` / `sidebar` / `footer` |
| Value type | `list[str]` of substring patterns | `src/content_verifier.py:383-391` |
| Alternation semantics | `"a\|b"` → OR (any alt passes) via `pattern.split("\|")` | `src/content_verifier.py:410` |
| Image-conditional branch | `has_image=True` and `area_name=="body_core"` → implicit `"slide-img-"` requirement | `src/content_verifier.py:414-416` |
| Font hierarchy tolerance | 1px (`fs > max_font + 1`); area-name → max-font fixed lookup | `src/content_verifier.py:433`, `:421-426` |
| Pass/score rule | `passed = (missing == [])`; score = continuous degradation `1.0 - len(missing)/max(1, len(patterns))` | `src/content_verifier.py:438`, `:445` |
| Empty-pattern handling | `max(1, len(patterns))` guards divide-by-zero; empty pattern list always passes | `src/content_verifier.py:445`, `:382-383` (`body_bg=[]`) |

Shape-only carry-over candidates for Phase Z design (see A3):
- `dict[key]→list[pattern]` indirection.
- OR via in-string `|` separator (low-ceremony alternation).
- Conditional implicit requirement injected by external context flag (here `has_image`; in Phase Z potentially `accepted_content_types` per sub_zone).
- Continuous score degradation rather than binary pass/fail (downstream consumers can threshold).
- Separate `errors` (block) vs `warnings` (advisory) lanes: font hierarchy lives in warnings, not errors.

Values that **must not** carry into Phase Z: the literal strings `"key-msg"`, `"padding-left"`, `"text-indent"`, `"slide-img-"`, and the area names `body_bg` / `body_core` / `sidebar` / `footer` themselves. These are Phase Q area-HTML idioms, not Phase Z frame/slot idioms.
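The shape-only carry-over list can be illustrated with a value-free evaluator: the key type is a type parameter, patterns are opaque strings, and conditional implicit requirements come from a caller-supplied function instead of a hard-coded `has_image` flag. `PatternDict`, `LaneResult`, and `evaluate` are hypothetical names for this sketch, not an existing Phase Z API, and the block contains none of the Phase Q literals listed above.

```python
from dataclasses import dataclass, field
from typing import Callable, Generic, Hashable, TypeVar

K = TypeVar("K", bound=Hashable)


@dataclass
class LaneResult:
    score: float
    errors: list[str] = field(default_factory=list)    # blocking lane
    warnings: list[str] = field(default_factory=list)  # advisory lane (tolerance checks, not shown)

    @property
    def passed(self) -> bool:
        return not self.errors


def _no_implicit(key: object) -> list[str]:
    return []


@dataclass
class PatternDict(Generic[K]):
    """dict[key] -> list[pattern] indirection; the key type is deliberately open."""
    patterns: dict[K, list[str]]
    # Context-driven implicit requirements (Phase Q: has_image; Phase Z: e.g. sub_zone accepts).
    implicit_requirements: Callable[[K], list[str]] = _no_implicit

    def evaluate(self, key: K, text: str) -> LaneResult:
        required = list(self.patterns.get(key, [])) + self.implicit_requirements(key)
        # "a|b" alternation: any alternative present satisfies the pattern.
        missing = [p for p in required
                   if not any(alt in text for alt in p.split("|"))]
        # Implicit requirements count in the denominator, so score stays within [0, 1].
        score = 1.0 - len(missing) / max(1, len(required))
        return LaneResult(score=score, errors=[f"missing: {p}" for p in missing])
```

Unlike the Phase Q formula, this sketch counts implicit requirements in the denominator; that is a design choice for the portable shape, not observed Phase Q behaviour.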
## A3 — Phase Z target pattern dict (design input, not yet active)
The Phase Z-native target axis is a **frame_id × sub_zone** pattern dict, aligned with `templates/phase_z2/catalog/frame_contracts.yaml`. References (do **not** modify):
- `templates/phase_z2/catalog/frame_contracts.yaml` contract entries (`:line` key, frame, sub_zone layout):
  - `:21` `three_parallel_requirements` (F13, 3 sub_zones)
  - `:77` `process_product_two_way` (F29, 2 sub_zones × strict 3 cardinality)
  - `:128` `bim_issues_quadrant_four` (F16, 4 sub_zones)
  - `:189` `three_persona_benefits` (F14, 3 sub_zones)
  - `:253` `construction_goals_three_circle_intersection` (F12, 3+1 sub_zones; `intersection` is `min:0,max:1`)
  - `:323` `construction_bim_three_usage` (F11, 3 sub_zones)
  - `:391` `bim_dx_comparison_table` (F18, 2 header + 1 `rows` with `min:1,max:12`)
  - `:456` `dx_sw_necessity_three_perspectives` (F20, 3 sub_zones)
  - `:520` `info_management_what_how_when` (F8, 3 sub_zones)
  - `:580` `sw_reality_three_emphasis` (F28, 3 sub_zones)
  - `:637` `bim_current_problems_paired` (F17, 8 sub_zones: row × side 2-axis)
- All 11 contracts carry `accepted_content_types` + `sub_zones`; the field `density_envelope` is absent across the catalog (verified: `grep -c "density_envelope" templates/phase_z2/catalog/frame_contracts.yaml` returns 0).
- `src/phase_z2_mapper.py:49-57` `load_frame_contracts` / `get_contract`: direct dict lookup against the 11 entries above.
- `src/phase_z2_pipeline.py:3776-3805` Step 10 emit: currently surfaces `frame_id` / `family` / `source_shape` / `cardinality` / `visual_hints` / `accepted_content_types` / `sub_zones` / `payload_builder` / `payload_builder_options` to `step10_frame_contract.json` with `step_status="partial"`. No pattern-dict assertion runs against this payload yet.

Abstraction-mismatch table (Phase Q area-level vs Phase Z frame/slot-level):
| Axis | Phase Q (A1+A2) | Phase Z target (A3) |
|---|---|---|
| Key | area name (`body_bg`/`body_core`/`sidebar`/`footer`) | `(frame_id, sub_zone_id)` tuple — e.g. `(1171281190, "pillar_1")` |
| Cardinality of keys | 4 fixed area names | open over 11 contracts × N sub_zones (3+2+4+3+4+3+3+3+3+3+8 = 39 sub_zones in current catalog) |
| Value semantics | substring presence (HTML-string match) | candidates: substring presence and/or contract-field assertion (`cardinality.strict` / `accepts` membership / `partial_target_path` resolution) |
| Conditional branch input | `has_image` external flag | `accepted_content_types` per sub_zone (catalog-driven, not external flag) |
| Tolerance | 1px on font-size (single axis) | candidates: font-size 1px tolerance carried over **or** replaced by `visual_hints.min_height_px` envelope check |
| Validation timing | post-render HTML (`generated_html` string) | post Step 18 final.html (mirrors Phase Q timing) — Step 12 light_edit/restructure proposal is excluded (proposal is upstream of render) |
| Result lanes | `errors` (block) + `warnings` (advisory) | preserved as-is from Phase Q shape (continuous score; separate font-hierarchy warnings) |

Classification: the Phase Q area axis and the Phase Z frame/slot axis are **not** drop-in compatible. The shape (dict indirection + OR alternation + tolerance + conditional implicit requirement + continuous score) is the only portable element; every value (key strings, area names, literal patterns) is Phase Q-local.
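To make the key and conditional-branch rows concrete, the sketch below derives `(frame_id, sub_zone_id)` keys from catalog-shaped data and turns each sub_zone's `accepts` list into one OR-pattern, replacing Phase Q's external `has_image` flag with catalog-driven input. The in-memory `CATALOG` layout, its field names, the `"F29"` placeholder used as a frame_id, and the `data-content-type` marker convention are assumptions for illustration only; the real schema lives in `templates/phase_z2/catalog/frame_contracts.yaml` and is IMP-04-owned. `SUB_ZONE_COUNTS` reproduces the per-contract counts listed above so the 39-slot total can be checked.

```python
# Sub_zone counts per contract, as listed in A3 (F12 = 3 + 1 optional, F18 = 2 headers + 1 rows).
SUB_ZONE_COUNTS = {
    "three_parallel_requirements": 3, "process_product_two_way": 2,
    "bim_issues_quadrant_four": 4, "three_persona_benefits": 3,
    "construction_goals_three_circle_intersection": 4, "construction_bim_three_usage": 3,
    "bim_dx_comparison_table": 3, "dx_sw_necessity_three_perspectives": 3,
    "info_management_what_how_when": 3, "sw_reality_three_emphasis": 3,
    "bim_current_problems_paired": 8,
}

# Hypothetical catalog excerpt: field names and frame_id are placeholders, not the YAML schema.
CATALOG = {
    "process_product_two_way": {
        "frame_id": "F29",
        "sub_zones": [
            {"id": "process_column", "accepts": ["text_block", "transform_table"]},
            {"id": "product_column", "accepts": ["text_block"]},
        ],
    },
}


def build_slot_patterns(catalog: dict) -> dict[tuple[str, str], list[str]]:
    """Key on (frame_id, sub_zone_id); one OR-pattern per slot derived from `accepts`."""
    out: dict[tuple[str, str], list[str]] = {}
    for contract in catalog.values():
        for zone in contract["sub_zones"]:
            key = (contract["frame_id"], zone["id"])
            # Assumed marker convention: rendered slots carry data-content-type="<type>".
            alternatives = [f'data-content-type="{t}"' for t in zone["accepts"]]
            out[key] = ["|".join(alternatives)]
    return out
```

Because the two F29 sub_zones accept different content types, they receive different patterns, which is the slot-level differentiation that frame-only keys could not express.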
## A4 — IMP-04 soft-link boundary (catalog vs validation ownership)
IMP-20 is `soft link: IMP-04` per the backlog (`docs/architecture/PHASE-Z-IMPLEMENTATION-ISSUE-BACKLOG.md:71`). Ownership separation:
- **IMP-04 owns**: every `frame_contracts.yaml` entry — addition / removal / `accepted_content_types` change / `sub_zones` schema change / `cardinality` change / `visual_hints` change. `templates/phase_z2/catalog/frame_contracts.yaml` is the IMP-04 source of truth.
- **IMP-20 owns**: reference-only documentation of the Phase Q pattern-dict shape (A1 + A2) and the Phase Z target axis design narrative (A3). No catalog edits, no Step 10 promotion.
- **Coupling direction**: **one-way** read. A Phase Z pattern dict (if/when activated through the A5 gate) consumes `frame_contracts.yaml` as input. It does **not** publish back into the catalog. IMP-04 is unaware of IMP-20.
- **No bidirectional code flow**: IMP-20 does not move Phase Q `content_verifier.py` code into Phase Z, and IMP-04 does not consume `REQUIRED_PATTERNS`. The two surfaces remain isolated.
- **Reference direction is one-way**: this document points read-only at `src/content_verifier.py`, `src/phase_z2_mapper.py`, `src/phase_z2_pipeline.py`, and `templates/phase_z2/catalog/frame_contracts.yaml`. No reverse pointer is required in those source files.

If IMP-04 alters the catalog schema (e.g. adds `density_envelope` or renames `sub_zones`), A3 must be re-verified (key axis and conditional-branch row in particular). The boundary statement itself does not change.
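The one-way read can also be enforced on the consumer side. A minimal sketch, assuming the catalog has already been parsed into a plain dict (the real loader, `load_frame_contracts` in `src/phase_z2_mapper.py`, is not reproduced): wrapping it in the standard-library `types.MappingProxyType` lets a Phase Z pattern-dict builder read contracts while any write-back raises `TypeError`. The proxy is shallow, so the hypothetical `freeze` helper applies it recursively and turns lists into tuples.

```python
from types import MappingProxyType
from typing import Any


def freeze(value: Any) -> Any:
    """Recursively wrap dicts in read-only proxies and lists in tuples."""
    if isinstance(value, dict):
        return MappingProxyType({k: freeze(v) for k, v in value.items()})
    if isinstance(value, list):
        return tuple(freeze(v) for v in value)
    return value


catalog = freeze({
    "process_product_two_way": {"sub_zones": [{"id": "product_column", "accepts": ["text_block"]}]},
})

# Reading works exactly as with a dict ...
zone = catalog["process_product_two_way"]["sub_zones"][0]
print(zone["accepts"])  # ('text_block',)

# ... but publishing back into the catalog is rejected.
try:
    catalog["process_product_two_way"]["family"] = "edited"
except TypeError as exc:
    print("blocked:", exc)
```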
## A5 — Re-activation gate + guardrails
IMP-20 is `documented` (dormant). Re-activation requires **all three** of the following gate conditions to hold at once (logical AND):
1. **Trigger**: Phase Z Step 10 produces a verifiable case where the partial frame-contract emit alone is insufficient — i.e., a final.html regression that a frame_id × sub_zone pattern dict would have caught (missing slot marker, contract field violation, font-hierarchy breach against a sub_zone-resolved max). The trigger must be a regression that maps cleanly to the frame/slot axis, **not** to a higher layer (composition planning, content adapter, render-time CSS).
2. **Evidence requirement**: failing-case MDX + `step10_frame_contract.json` trace + final.html excerpt with the slot path that should have asserted, attached to a new issue or this issue's reopened state.
3. **IMP-04 sign-off**: the IMP-04 owner confirms the failing case is **not** addressable inside the catalog (e.g. tightening `cardinality` or `accepted_content_types` does not resolve it) — only then is a Phase Z-native pattern dict justified.
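The three gate conditions compose as a strict AND. One way to keep that explicit in any tooling is a small record whose check refuses to open unless every item is present; the `ReactivationRequest` type, its field names, and the `"frame_slot"` layer label are hypothetical, mapped one-to-one onto conditions 1-3 above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ReactivationRequest:
    # Condition 1: the regression must map to the frame/slot axis, not a higher layer.
    regression_layer: str                      # e.g. "frame_slot", "composition", "render_css"
    # Condition 2: evidence bundle.
    failing_mdx: Optional[str] = None
    step10_trace_path: Optional[str] = None    # step10_frame_contract.json
    final_html_excerpt: Optional[str] = None
    slot_path: Optional[str] = None
    # Condition 3: IMP-04 owner confirms the catalog cannot resolve it.
    imp04_signoff: bool = False


def gate_open(req: ReactivationRequest) -> bool:
    trigger = req.regression_layer == "frame_slot"
    evidence = all([req.failing_mdx, req.step10_trace_path,
                    req.final_html_excerpt, req.slot_path])
    return trigger and evidence and req.imp04_signoff
```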

Design questions resolved in this document (revisit if the gate fires):
- **Q1 — Key granularity**: `(frame_id, sub_zone_id)`. Frame-only granularity is insufficient because contracts with `sub_zones` of differing `accepts` (e.g. F29 `process_column` accepts `[text_block, transform_table]` vs `product_column` accepts `[text_block]`) require slot-level differentiation.
- **Q2 — Value type**: hybrid — substring patterns (Phase Q parity) **plus** contract-field assertions (`cardinality.strict` / `accepts` membership / `partial_target_path` resolved in DOM) **plus** numeric tolerance (carried from font-hierarchy 1px). Three lanes preserved separately so each can fail/pass independently.
- **Q3 — Validation timing**: post Step 18 final.html **only**. Step 12 light_edit/restructure proposal is upstream of render and exposes no HTML for substring assertion; running the dict there would either fire false negatives (no DOM yet) or duplicate Step 18 work.
- **Q4 — Font-hierarchy carry-over**: replaced — Phase Q's `role_font_map` fixed dict (area → max-font) is Phase Q-local. The Phase Z equivalent reads from `frame_contracts.yaml` `visual_hints` (`min_height_px` already present; a future `max_font_px` field would live in `visual_hints` and is IMP-04-owned). 1px tolerance shape is portable; the lookup source is replaced.
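Q1-Q4 together describe a slot-keyed validator with three independent lanes. A minimal sketch under stated assumptions: the `SlotContract` fields (`patterns`, `accepts`, `strict_count`, `max_font_px`) are hypothetical projections of catalog fields, and `max_font_px` in particular does not exist yet (per Q4 it would be an IMP-04-owned `visual_hints` field). The slot's rendered HTML fragment and its observed content types are taken as already-extracted inputs, since resolving `partial_target_path` in the DOM is out of scope here.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

FONT_SIZE_RE = re.compile(r"font-size:\s*(\d+(?:\.\d+)?)\s*px")


@dataclass
class SlotContract:  # hypothetical projection of one sub_zone
    patterns: list[str]
    accepts: list[str]
    strict_count: Optional[int] = None
    max_font_px: Optional[float] = None  # future visual_hints field (assumption)


@dataclass
class SlotReport:
    pattern_errors: list[str] = field(default_factory=list)
    contract_errors: list[str] = field(default_factory=list)
    tolerance_warnings: list[str] = field(default_factory=list)


def validate_slot(c: SlotContract, html: str, item_types: list[str]) -> SlotReport:
    r = SlotReport()
    # Lane 1: substring patterns with "|" alternation (Phase Q parity).
    for p in c.patterns:
        if not any(alt in html for alt in p.split("|")):
            r.pattern_errors.append(p)
    # Lane 2: contract-field assertions (strict cardinality, accepts membership).
    if c.strict_count is not None and len(item_types) != c.strict_count:
        r.contract_errors.append(f"cardinality {len(item_types)} != {c.strict_count}")
    for t in item_types:
        if t not in c.accepts:
            r.contract_errors.append(f"content type {t!r} not in accepts")
    # Lane 3: numeric tolerance; the 1px shape is carried over, the lookup source is not.
    if c.max_font_px is not None:
        for m in FONT_SIZE_RE.finditer(html):
            if float(m.group(1)) > c.max_font_px + 1:
                r.tolerance_warnings.append(f"font-size {m.group(1)}px")
    return r
```

Keeping the lanes as separate lists lets each fail or pass independently, as Q2 requires; a caller can still collapse them into Phase Q-style `errors` and `warnings` if needed.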

Guardrails (preserved from Stage 1 + Stage 2):
- **GR1 — Shape-only reference**: no Phase Q `REQUIRED_PATTERNS` value (`"key-msg"`, `"padding-left"`, `"text-indent"`, `"slide-img-"`) or area name (`body_bg`/`body_core`/`sidebar`/`footer`) may appear in any Phase Z pattern dict activation.
- **GR2 — Phase Q no-regression**: `src/content_verifier.py:382-392` `REQUIRED_PATTERNS` is no-touch. The Phase T `L379-381` comment (overflow:hidden removed) remains the no-regression boundary; any Phase Z dict design must not re-introduce removed patterns into Phase Q's surface.
- **GR3 — Phase Z dict is Phase Z-owned**: no `import` of `content_verifier.REQUIRED_PATTERNS` from Phase Z code. The two pattern dicts coexist without symbol sharing.
- **GR4 — IMP-04 soft-link one-way**: per § A4. Activating IMP-20 must not block on or modify IMP-04; the catalog is read-only input.
- **PZ-1 — AI isolation contract**: pattern dict is code/spec, not AI-generated content. No Kei rewrite, no LLM proposal of pattern values (`feedback_ai_isolation_contract`).
- **RULE 13 — Anchor sync**: any future activation must update backlog (`PHASE-Z-IMPLEMENTATION-ISSUE-BACKLOG.md`), status board (`PHASE-Z-PIPELINE-STATUS-BOARD.md`), and INSIGHT-MAP (`PHASE-Q-INSIGHT-TO-22STEP-MAP.md`) in the same commit.
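GR1 and GR3 are mechanically checkable. A sketch of a lint pass, assuming a Phase Z pattern dict is available as a Python mapping and Phase Z source is available as text: `gr1_violations` flags any Phase Q area name used as a key component and any Phase Q literal inside a pattern, and `gr3_violations` uses the standard-library `ast` module to find imports that could pull `REQUIRED_PATTERNS` out of `content_verifier`. Both function names are hypothetical, and how such a lint would be wired into CI is not specified by this document.

```python
import ast

PHASE_Q_LITERALS = ("key-msg", "padding-left", "text-indent", "slide-img-")
PHASE_Q_AREAS = {"body_bg", "body_core", "sidebar", "footer"}


def gr1_violations(pattern_dict: dict) -> list[str]:
    """GR1: no Phase Q area name in any key part, no Phase Q literal in any pattern."""
    found: list[str] = []
    for key, patterns in pattern_dict.items():
        parts = key if isinstance(key, tuple) else (key,)
        found += [f"area name {part!r} in key" for part in parts if part in PHASE_Q_AREAS]
        found += [f"literal {lit!r} in pattern {p!r}"
                  for p in patterns for lit in PHASE_Q_LITERALS if lit in p]
    return found


def gr3_violations(source: str) -> list[int]:
    """GR3: line numbers of imports that could pull REQUIRED_PATTERNS out of content_verifier.

    Conservative: importing the content_verifier module itself is flagged too.
    """
    bad: list[int] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ImportFrom):
            last = (node.module or "").split(".")[-1]
            names = {a.name for a in node.names}
            if (last == "content_verifier" and names & {"REQUIRED_PATTERNS", "*"}) \
                    or "content_verifier" in names:
                bad.append(node.lineno)
        elif isinstance(node, ast.Import):
            if any(a.name.split(".")[-1] == "content_verifier" for a in node.names):
                bad.append(node.lineno)
    return bad
```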

If IMP-04 alters the catalog schema or `src/content_verifier.py` is rewritten upstream, A1-A3 must be re-verified (file:line references); the A5 gate itself does not change.