From 8c60f7cc85b9f2722b97f49f3d0e0a9a35cc1b89 Mon Sep 17 00:00:00 2001
From: kyeongmin
Date: Wed, 20 May 2026 00:02:18 +0900
Subject: [PATCH] docs(IMP-20): frame contract validation reference + cross-link -- documented-axis close

---
 ...-20-FRAME-CONTRACT-VALIDATION-REFERENCE.md | 109 ++++++++++++++++++
 .../PHASE-Q-INSIGHT-TO-22STEP-MAP.md | 2 +-
 .../PHASE-Z-IMPLEMENTATION-ISSUE-BACKLOG.md | 2 +-
 .../PHASE-Z-PIPELINE-STATUS-BOARD.md | 2 +-
 4 files changed, 112 insertions(+), 3 deletions(-)
 create mode 100644 docs/architecture/IMP-20-FRAME-CONTRACT-VALIDATION-REFERENCE.md

diff --git a/docs/architecture/IMP-20-FRAME-CONTRACT-VALIDATION-REFERENCE.md b/docs/architecture/IMP-20-FRAME-CONTRACT-VALIDATION-REFERENCE.md
new file mode 100644
index 0000000..9ef7cb4
--- /dev/null
+++ b/docs/architecture/IMP-20-FRAME-CONTRACT-VALIDATION-REFERENCE.md
@@ -0,0 +1,109 @@
+# IMP-20 — Phase Q `content_verifier` Frame Contract Validation Pattern Reference
+
+**Status**: documented (reference-only, dormant)
+**Scope**: doc-only. No runtime surface modified.
+**Related issue**: https://gitea.hmac.kr/Kyeongmin/C.E.L_Slide_test2/issues/20
+**Soft dependency**: IMP-04 (extended catalog application) — IMP-20 stays dormant; activates only via the A5 gate.
+**Source axis**: INSIGHT-MAP §3 / §2.7 H2 — `content_verifier.verify_structure` pattern reference.
+
+---
+
+## A1 — Phase Q consumer pattern (read-only reference)
+
+Phase Q implements area-level required-pattern validation at the content-verifier layer. References (do **not** modify):
+
+- `src/content_verifier.py:382-392` — `REQUIRED_PATTERNS: dict[str, list[str]]` — top-level pattern dictionary keyed by area name (`body_bg`, `body_core`, `sidebar`, `footer`). Values verified: `body_bg=[]`, `body_core=["key-msg"]`, `sidebar=["padding-left", "text-indent"]`, `footer=[]`. Phase T (`L379-381` comment) removed the `overflow:hidden` requirement to reconcile with the Phase T prompt's "overflow:hidden 금지" directive — that no-regression boundary is preserved.
+- `src/content_verifier.py:395-448` — `verify_structure(generated_html, area_name, has_image=False, font_hierarchy=None) → VerificationResult` — the substring-check + OR + tolerance core logic.
+  - `:405-412` — substring presence loop. Each pattern string is split on `|` (`pattern.split("|")` at L410) and treated as an OR alternation: any alternative present passes the pattern. Missing alternatives are appended to a `missing` list.
+  - `:414-416` — `has_image` branch. When `has_image=True` and `area_name == "body_core"`, an additional implicit requirement is enforced: `"slide-img-"` must appear in `generated_html`. Missing image marker is reported as `"slide-img-* (이미지 태그)"` in `missing`.
+  - `:418-436` — `font_hierarchy` branch. When supplied, area-name → max-font lookup uses a fixed `role_font_map = {"body_bg":bg/11, "body_core":core/12, "sidebar":sidebar/10, "footer":core/12}`. HTML `font-size:\s*(\d+(?:\.\d+)?)\s*px` matches are extracted via regex (L430); each measured size > `max_font + 1` (1px tolerance at L433) emits a `font_warnings` entry. Warnings do **not** flip `passed`.
+  - `:438-447` — result construction. `passed = (len(missing) == 0)`. `score = 1.0` on pass else `1.0 - len(missing) / max(1, len(patterns))` (continuous degradation; `max(1, …)` guards empty-pattern division by zero). Errors prefixed `"필수 패턴 누락: "`. Warnings carry font hierarchy violations only.
+- `src/content_verifier.py:455-487` — `verify_area(original_text, generated_html, area_name, has_image=False) → VerificationResult` — composes L1 (`verify_text_preservation`) + L2 (`verify_no_forbidden_content`) + L3 (`verify_structure`) at L462-466. `verify_structure` call at L465 passes `has_image` but **not** `font_hierarchy` (font_hierarchy is unused inside `verify_area`).
+- `src/content_verifier.py:490-529` — `verify_all_areas(generated, area_texts, has_image_areas=None)` — area dispatch fan-out. `body_html` is split into `body_bg` + `body_core` (L510-519); `body_core` is the **only** branch that propagates `has_image=("body_core" in has_image_areas)` to `verify_area` (L518). `sidebar_html` (L521-525) and `footer_html` (L527-531) call `verify_area` with default `has_image=False`.
+
+Classification: area-level (Phase Q HTML area axis) required-pattern validation at content-verifier time. **Not** Phase Z frame_id × sub_zone contract validation.
+
+## A2 — Phase Q `REQUIRED_PATTERNS` shape (read-only reference)
+
+The Phase Q pattern-dict shape — **values are Phase Q-specific and excluded from reuse; only the shape is Phase Z design input.**
+
+| Axis | Phase Q shape | Where observed |
+|---|---|---|
+| Key axis | area name (string) | `src/content_verifier.py:382` keys: `body_bg` / `body_core` / `sidebar` / `footer` |
+| Value type | `list[str]` of substring patterns | `src/content_verifier.py:383-391` |
+| Alternation semantics | `"a\|b"` → OR (any alt passes) via `pattern.split("\|")` | `src/content_verifier.py:410` |
+| Image-conditional branch | `has_image=True` ∧ `area_name=="body_core"` → implicit `"slide-img-"` requirement | `src/content_verifier.py:414-416` |
+| Font hierarchy tolerance | 1px (`fs > max_font + 1`); area-name → max-font fixed lookup | `src/content_verifier.py:433`, `:421-426` |
+| Pass/score rule | `passed = (missing == [])`; score = continuous degradation `1.0 - len(missing)/max(1, len(patterns))` | `src/content_verifier.py:438`, `:445` |
+| Empty-pattern handling | `max(1, len(patterns))` guards divide-by-zero; empty pattern list always passes | `src/content_verifier.py:445`, `:382-383` (`body_bg=[]`) |
+
+Shape-only carry-over candidates for Phase Z design (see A3 in u2):
+
+- `dict[key]→list[pattern]` indirection.
+- OR via in-string `|` separator (low-ceremony alternation).
+- Conditional implicit requirement injected by external context flag (here `has_image`; in Phase Z potentially `accepted_content_types` per sub_zone).
+- Continuous score degradation rather than binary pass/fail (downstream consumers can threshold).
+- Separate `errors` (block) vs `warnings` (advisory) lanes — font hierarchy lives in warnings, not errors.
+
+Values that **must not** carry into Phase Z: the literal strings `"key-msg"`, `"padding-left"`, `"text-indent"`, `"slide-img-"`, and the area names `body_bg` / `body_core` / `sidebar` / `footer` themselves — these are Phase Q area-HTML idioms, not Phase Z frame/slot idioms.
+
+## A3 — Phase Z target pattern dict (design input, not yet active)
+
+The Phase Z-native target axis = **frame_id × sub_zone** pattern dict, aligned with `templates/phase_z2/catalog/frame_contracts.yaml`. References (do **not** modify):
+
+- `templates/phase_z2/catalog/frame_contracts.yaml:21` `three_parallel_requirements` (F13, 3 sub_zones), `:77` `process_product_two_way` (F29, 2 sub_zones × strict 3 cardinality), `:128` `bim_issues_quadrant_four` (F16, 4 sub_zones), `:189` `three_persona_benefits` (F14, 3 sub_zones), `:253` `construction_goals_three_circle_intersection` (F12, 3+1 sub_zones — `intersection` is `min:0,max:1`), `:323` `construction_bim_three_usage` (F11, 3 sub_zones), `:391` `bim_dx_comparison_table` (F18, 2 header + 1 `rows` with `min:1,max:12`), `:456` `dx_sw_necessity_three_perspectives` (F20, 3 sub_zones), `:520` `info_management_what_how_when` (F8, 3 sub_zones), `:580` `sw_reality_three_emphasis` (F28, 3 sub_zones), `:637` `bim_current_problems_paired` (F17, 8 sub_zones — row × side 2-axis).
+- All 11 contracts carry `accepted_content_types` + `sub_zones`; field `density_envelope` is absent across the catalog (verified `grep -c "density_envelope" templates/phase_z2/catalog/frame_contracts.yaml` = 0).
+- `src/phase_z2_mapper.py:49-57` `load_frame_contracts` / `get_contract` — direct dict lookup against the 11 entries above.
+- `src/phase_z2_pipeline.py:3776-3805` Step 10 emit — currently surfaces `frame_id` / `family` / `source_shape` / `cardinality` / `visual_hints` / `accepted_content_types` / `sub_zones` / `payload_builder` / `payload_builder_options` to `step10_frame_contract.json` with `step_status="partial"`. No pattern-dict assertion runs against this payload yet.
+
+Abstraction-mismatch table (Phase Q area-level vs Phase Z frame/slot-level):
+
+| Axis | Phase Q (A1+A2) | Phase Z target (A3) |
+|---|---|---|
+| Key | area name (`body_bg`/`body_core`/`sidebar`/`footer`) | `(frame_id, sub_zone_id)` tuple — e.g. `(1171281190, "pillar_1")` |
+| Cardinality of keys | 4 fixed area names | open over 11 contracts × N sub_zones (3+2+4+3+4+3+3+3+3+3+8 = 39 sub_zones in current catalog) |
+| Value semantics | substring presence (HTML-string match) | candidates: substring presence and/or contract-field assertion (`cardinality.strict` / `accepts` membership / `partial_target_path` resolution) |
+| Conditional branch input | `has_image` external flag | `accepted_content_types` per sub_zone (catalog-driven, not external flag) |
+| Tolerance | 1px on font-size (single axis) | candidates: font-size 1px tolerance carried over **or** replaced by `visual_hints.min_height_px` envelope check |
+| Validation timing | post-render HTML (`generated_html` string) | post Step 18 final.html (mirrors Phase Q timing) — Step 12 light_edit/restructure proposal is excluded (proposal is upstream of render) |
+| Result lanes | `errors` (block) + `warnings` (advisory) | preserved as-is from Phase Q shape (continuous score; separate font-hierarchy warnings) |
+
+Classification: Phase Q area axis ⇄ Phase Z frame/slot axis are **not** drop-in compatible. The shape (dict indirection + OR alternation + tolerance + conditional implicit-requirement + continuous score) is the only portable element; every value (key strings, area names, literal patterns) is Phase Q-local.
+
+## A4 — IMP-04 soft-link boundary (catalog vs validation ownership)
+
+IMP-20 is `soft link: IMP-04` per the backlog (`docs/architecture/PHASE-Z-IMPLEMENTATION-ISSUE-BACKLOG.md:71`). Ownership separation:
+
+- **IMP-04 owns**: every `frame_contracts.yaml` entry — addition / removal / `accepted_content_types` change / `sub_zones` schema change / `cardinality` change / `visual_hints` change. `templates/phase_z2/catalog/frame_contracts.yaml` is the IMP-04 source of truth.
+- **IMP-20 owns**: reference-only documentation of the Phase Q pattern-dict shape (A1 + A2) and the Phase Z target axis design narrative (A3). No catalog edits, no Step 10 promotion.
+- **Coupling direction**: **one-way** read. A Phase Z pattern dict (if/when activated through the A5 gate) consumes `frame_contracts.yaml` as input. It does **not** publish back into the catalog. IMP-04 is unaware of IMP-20.
+- **No bidirectional code flow**: IMP-20 does not move Phase Q `content_verifier.py` code into Phase Z, and IMP-04 does not consume `REQUIRED_PATTERNS`. The two surfaces remain isolated.
+- **Reference direction is one-way**: this document points read-only at `src/content_verifier.py`, `src/phase_z2_mapper.py`, `src/phase_z2_pipeline.py`, and `templates/phase_z2/catalog/frame_contracts.yaml`. No reverse pointer is required in those source files.
+
+If IMP-04 alters the catalog schema (e.g. adds `density_envelope` or renames `sub_zones`), A3 must be re-verified (key axis and conditional-branch row in particular). The boundary statement itself does not change.
+
+## A5 — Re-activation gate + guardrails
+
+IMP-20 is `documented` (dormant). Re-activation requires **all** of the following gate conditions (3-cond AND):
+
+1. **Trigger**: Phase Z Step 10 produces a verifiable case where the partial frame-contract emit alone is insufficient — i.e., a final.html regression that a frame_id × sub_zone pattern dict would have caught (missing slot marker, contract field violation, font-hierarchy breach against a sub_zone-resolved max). The trigger must be a regression that maps cleanly to the frame/slot axis, **not** to a higher layer (composition planning, content adapter, render-time CSS).
+2. **Evidence requirement**: failing-case MDX + `step10_frame_contract.json` trace + final.html excerpt with the slot path that should have asserted, attached to a new issue or this issue's reopened state.
+3. **IMP-04 sign-off**: the IMP-04 owner confirms the failing case is **not** addressable inside the catalog (e.g. tightening `cardinality` or `accepted_content_types` does not resolve it) — only then is a Phase Z-native pattern dict justified.
+
+Design questions resolved in this document (revisit if the gate fires):
+
+- **Q1 — Key granularity**: `(frame_id, sub_zone_id)`. Frame-only granularity is insufficient because contracts with `sub_zones` of differing `accepts` (e.g. F29 `process_column` accepts `[text_block, transform_table]` vs `product_column` accepts `[text_block]`) require slot-level differentiation.
+- **Q2 — Value type**: hybrid — substring patterns (Phase Q parity) **plus** contract-field assertions (`cardinality.strict` / `accepts` membership / `partial_target_path` resolved in DOM) **plus** numeric tolerance (carried from font-hierarchy 1px). Three lanes preserved separately so each can fail/pass independently.
+- **Q3 — Validation timing**: post Step 18 final.html **only**. Step 12 light_edit/restructure proposal is upstream of render and exposes no HTML for substring assertion; running the dict there would either fire false negatives (no DOM yet) or duplicate Step 18 work.
+- **Q4 — Font-hierarchy carry-over**: replaced — Phase Q's `role_font_map` fixed dict (area → max-font) is Phase Q-local. The Phase Z equivalent reads from `frame_contracts.yaml` `visual_hints` (`min_height_px` already present; a future `max_font_px` field would live in `visual_hints` and is IMP-04-owned). 1px tolerance shape is portable; the lookup source is replaced.
+
+Guardrails (preserved from Stage 1 + Stage 2):
+
+- **GR1 — Shape-only reference**: no Phase Q `REQUIRED_PATTERNS` value (`"key-msg"`, `"padding-left"`, `"text-indent"`, `"slide-img-"`) or area name (`body_bg`/`body_core`/`sidebar`/`footer`) may appear in any Phase Z pattern dict activation.
+- **GR2 — Phase Q no-regression**: `src/content_verifier.py:382-392` `REQUIRED_PATTERNS` is no-touch. The Phase T `L379-381` comment (overflow:hidden removed) remains the no-regression boundary; any Phase Z dict design must not re-introduce removed patterns into Phase Q's surface.
+- **GR3 — Phase Z dict is Phase Z-owned**: no `import` of `content_verifier.REQUIRED_PATTERNS` from Phase Z code. The two pattern dicts coexist without symbol sharing.
+- **GR4 — IMP-04 soft-link one-way**: per § A4. Activating IMP-20 must not block on or modify IMP-04; the catalog is read-only input.
+- **PZ-1 — AI isolation contract**: pattern dict is code/spec, not AI-generated content. No Kei rewrite, no LLM proposal of pattern values (`feedback_ai_isolation_contract`).
+- **RULE 13 — Anchor sync**: any future activation must update backlog (`PHASE-Z-IMPLEMENTATION-ISSUE-BACKLOG.md`), status board (`PHASE-Z-PIPELINE-STATUS-BOARD.md`), and INSIGHT-MAP (`PHASE-Q-INSIGHT-TO-22STEP-MAP.md`) in the same commit.
+
+If IMP-04 alters the catalog schema or `src/content_verifier.py` is rewritten upstream, A1–A3 must be re-verified (file:line refs); the A5 gate itself does not change.
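Review note on the reference doc above: the portable shape it describes (dict indirection, in-string `|` OR alternation, continuous score with a `max(1, …)` guard) may be easier to check against a concrete sketch. The following is a minimal illustration under stated assumptions, not the `content_verifier` implementation: `ShapeResult`, `check_patterns`, `PATTERNS_BY_SLOT`, and the `marker-*` strings are hypothetical placeholders, deliberately not Phase Q values and not proposed Phase Z pattern values. Only the `(1171281190, "pillar_1")` key form is taken from the A3 table.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ShapeResult:
    """Hypothetical result type with separate blocking and advisory lanes."""
    passed: bool
    score: float
    errors: list[str] = field(default_factory=list)
    warnings: list[str] = field(default_factory=list)


# Hypothetical dict keyed by (frame_id, sub_zone_id); values are placeholders.
PATTERNS_BY_SLOT: dict[tuple[int, str], list[str]] = {
    (1171281190, "pillar_1"): ["marker-a|marker-b", "marker-c"],
}


def check_patterns(html: str, patterns: list[str]) -> ShapeResult:
    """Substring check where `a|b` inside one pattern string means a OR b."""
    missing: list[str] = []
    for pattern in patterns:
        alternatives = pattern.split("|")
        if not any(alt in html for alt in alternatives):
            missing.append(pattern)

    passed = len(missing) == 0
    # Continuous degradation; max(1, ...) avoids dividing by zero when the
    # pattern list is empty, in which case the check always passes.
    score = 1.0 if passed else 1.0 - len(missing) / max(1, len(patterns))
    errors = [f"missing required pattern: {', '.join(missing)}"] if missing else []
    return ShapeResult(passed=passed, score=score, errors=errors)


if __name__ == "__main__":
    key = (1171281190, "pillar_1")
    result = check_patterns("<div class='marker-a'></div>", PATTERNS_BY_SLOT[key])
    print(result.passed, result.score, result.errors)
```

In this sketch one of two patterns is missing, so the slot fails with a score of 0.5 rather than 0, which is the thresholdable behaviour the A2 table attributes to the Phase Q score rule.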
diff --git a/docs/architecture/PHASE-Q-INSIGHT-TO-22STEP-MAP.md b/docs/architecture/PHASE-Q-INSIGHT-TO-22STEP-MAP.md
index 293dd1a..14ac90f 100644
--- a/docs/architecture/PHASE-Q-INSIGHT-TO-22STEP-MAP.md
+++ b/docs/architecture/PHASE-Q-INSIGHT-TO-22STEP-MAP.md
@@ -123,7 +123,7 @@
 | IMP-17 AI repair fallback infra (carve-out — see [`IMP-17-CARVE-OUT.md`](IMP-17-CARVE-OUT.md)) | Step 12, 16, 17 | §2.6 G3 (`httpx` + SSE streaming + retry + JSON parse pattern) | pending | no (AI fallback only) |
 | I3 SVG 좌표 보강 | Step 0, 9 | §2.8 I3 (`renderer._preprocess_svg_data`) | pending | yes (deterministic) |
 | IMP-19 I4 zone 비중 분배 (reference — see [`IMP-19-ZONE-RATIO-REFERENCE.md`](IMP-19-ZONE-RATIO-REFERENCE.md)) | Step 8 | §2.8 I4 (`renderer._group_blocks_by_area`) | pending | yes (deterministic) |
-| H2 frame contract validation | Step 10 | §2.7 H2 (`content_verifier.verify_structure` pattern) | pending | yes (deterministic) |
+| IMP-20 H2 frame contract validation (reference — see [`IMP-20-FRAME-CONTRACT-VALIDATION-REFERENCE.md`](IMP-20-FRAME-CONTRACT-VALIDATION-REFERENCE.md)) | Step 10 | §2.7 H2 (`content_verifier.verify_structure` pattern) | pending | yes (deterministic) |
 
 ---
 
diff --git a/docs/architecture/PHASE-Z-IMPLEMENTATION-ISSUE-BACKLOG.md b/docs/architecture/PHASE-Z-IMPLEMENTATION-ISSUE-BACKLOG.md
index 567bd40..c48ad67 100644
--- a/docs/architecture/PHASE-Z-IMPLEMENTATION-ISSUE-BACKLOG.md
+++ b/docs/architecture/PHASE-Z-IMPLEMENTATION-ISSUE-BACKLOG.md
@@ -68,7 +68,7 @@
 | **IMP-17** | **AI repair fallback infra** (**carve-out — normal path 밖**) | Step 12, 16, 17 | §3 G3 | (별 axis priority — pending) | [carve-out boundary + activation gate](IMP-17-CARVE-OUT.md) (3-cond AND: User GO ∧ B4 frame_selection evidence ∧ IMP-04/05 live — full def in u2 doc) — `httpx` + SSE streaming + retry + JSON parse pattern reference — light_edit / restructure proposal | **normal path AI 호출 0 — 본 axis = fallback only, normal path 와 분리 설계** / Kei persona 단절 (Phase Q 자산과 단절) | soft link: IMP-04 + IMP-05 (catalog 확장 + V4 fallback 활성 시 의미) | documented (deferred) |
 | IMP-18 | I3 SVG 좌표 보강 | Step 0, 9 | §3 Reference Only | ↓ low | `renderer._preprocess_svg_data` 패턴 reference — frame_partials SVG 좌표 사전 박힘 — [gap report](IMP-18-SVG-GAP-REPORT.md) | Phase R' (renderer.py) 회귀 X | soft link: IMP-04 (frame_partials 등록 후 의미 ↑) | documented |
 | IMP-19 | I4 zone 비중 분배 | Step 8 | §3 Reference Only | ↓ low | `renderer._group_blocks_by_area` 패턴 reference — zone-level ratio 분배 — [reference doc](IMP-19-ZONE-RATIO-REFERENCE.md) | Phase O 컨테이너 회귀 X / 직접 통합 X | soft link: IMP-09 (zone 비중 분배 영역 공유) | documented |
-| IMP-20 | H2 frame contract validation | Step 10 | §3 Reference Only | ↓ low | `content_verifier.verify_structure` pattern reference — Phase Z frame contract 검증 pattern | Phase Q `REQUIRED_PATTERNS` 값 회귀 X / Phase Z 자체 pattern dict 설계 | soft link: IMP-04 (확장 catalog 적용 시 검증 범위 확대) | pending |
+| IMP-20 | H2 frame contract validation | Step 10 | §3 Reference Only | ↓ low | `content_verifier.verify_structure` pattern reference — Phase Z frame contract 검증 pattern — [reference doc](IMP-20-FRAME-CONTRACT-VALIDATION-REFERENCE.md) | Phase Q `REQUIRED_PATTERNS` 값 회귀 X / Phase Z 자체 pattern dict 설계 | soft link: IMP-04 (확장 catalog 적용 시 검증 범위 확대) | documented |
 
 > **IMP-15 child issues note (#45–#49)** — IMP-15 (Step 14 visual_check 보강) is the parent row; child sub-axes were tracked as separate Gitea issues and are not given standalone backlog rows. Children: #45 (e9b3d2e), #46 (2827622), #47 (535c484), #48 (614c533), #49 (verification-only). Per INTEGRATION-AUDIT-01 §10.3 footnote option to avoid double-counting under IMP-15.
 
diff --git a/docs/architecture/PHASE-Z-PIPELINE-STATUS-BOARD.md b/docs/architecture/PHASE-Z-PIPELINE-STATUS-BOARD.md
index b90f0ce..dd11a3c 100644
--- a/docs/architecture/PHASE-Z-PIPELINE-STATUS-BOARD.md
+++ b/docs/architecture/PHASE-Z-PIPELINE-STATUS-BOARD.md
@@ -46,7 +46,7 @@ Step 0 은 본체가 아닌 *준비 조건*. Step 1 (MDX 업로드) 부터가 ru
 | A | 7 | Slide-Level Layout Planning | ⚠ partial (count-based / 7-A catalog + 7-B candidate fn 추가, runtime 호출처 X) |
 | A | 8 | Zone + Internal Region Ratio Planning | ⚠ partial (zone-level horizontal-2 만 dynamic / 8-A region+display catalog + 8-B-1/2 candidate fn 추가, runtime 호출처 X / region-level 은 B2 안 partial) |
 | A | 9 | Region-Level Frame / Display Selection | ⚠ partial (B4 가 catalog cover + declaration order 로 frame 선택 분담 / V4 evidence 미통합 / Step 5 와 conflate 잔존) |
-| A | 10 | Frame Contract 확인 | ⚠ partial (B3 의 accepted_content_types + sub_zones 선언 추가 — B4 만 읽음, mapper 미읽음 / density envelope 별 axis) |
+| A | 10 | Frame Contract 확인 | ⚠ partial (B3 의 accepted_content_types + sub_zones 선언 추가 — B4 만 읽음, mapper 미읽음 / density envelope 별 axis) — IMP-20 ref: [reference doc](IMP-20-FRAME-CONTRACT-VALIDATION-REFERENCE.md) |
 | A | 11 | Content Unit / Child Group → Internal Region → Frame Slot Mapping | ⚠ partial (B4 v0 dormant 2-stage + region 1:1 sub_zone + narrowest first + trace-only runtime 호출, render path 미연결) |
 | A | 12 | Slot Payload 생성 | ✅ (deterministic) |
 | B | 13 | Render | ✅ |