C.E.L_Slide_test2/docs/architecture/IMP-16-U2-WIRING-DESIGN.md
kyeongmin 23ba8b68cd feat(IMP-16): U1 H3 verification utility port + U2 wiring design
U1 (runtime, u1-u10): new Phase Z-owned deterministic verification module
src/phase_z2_verification_utils.py (335 LOC, stdlib only) porting H3 utility
surface — VerificationResult, extract_text_from_html, normalize_for_comparison,
extract_keywords, strip_meta_lines, split_into_sentences, verify_text_preservation,
detect_invented_text. 10 unit test files under tests/phase_z2/test_pz2_vu_*.py (56 tests).

u11 (design-only): docs/architecture/IMP-16-U2-WIRING-DESIGN.md fixes the Step
1/2/14/21/22 reverse-path contract, redesigned frame-contract pattern
reservation (IMP-20), and IMP-07 hard-gate criteria. No runtime wiring lands
in this commit — U2 stays blocked until IMP-07 reverse path is implemented +
verified + runtime-hit.

Guardrails: no src.content_verifier import; no FORBIDDEN_KEI_MEMOS /
generate_with_retry / REQUIRED_PATTERNS / verify_structure / verify_area /
verify_all_areas usage; no AI / Kei / httpx / SSE path; AI-isolation contract
upheld (utility is deterministic).

Tests: 56 targeted PASS (0.19s), 15 regression baseline PASS (7.59s).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 04:42:35 +09:00


IMP-16-U2 — Phase Z verification wiring design (design-only)

Status: design-only contract. No runtime wiring lands in this issue. All wiring is gated behind IMP-07 reverse-path activation (B-2 main). When IMP-07 lands, this doc becomes the binding contract for the Step 1 / 2 / 14 / 21 / 22 changes that consume the IMP-16-U1 surface in src/phase_z2_verification_utils.py.

Source anchors

  • IMP-16 backlog row — docs/architecture/PHASE-Z-IMPLEMENTATION-ISSUE-BACKLOG.md:67 (priority ↓ low, hard link IMP-07, source §3 H3 Reference Only).
  • IMP-07 backlog row — same doc line 51 (status pending).
  • 22-step pipeline anchor — PHASE-Z-PIPELINE-OVERVIEW.md Steps 1 / 2 / 14 / 21 / 22.
  • U1 module — src/phase_z2_verification_utils.py (u1~u10 ports).
  • Phase Q reference H3 (Reference Only — do not import) — src/content_verifier.py.

Gate (hard block — do not merge wiring before this clears)

  • IMP-07 status MUST be implemented and verified before any code change listed below lands.
  • A repo-wide grep for html_to_slide_mdx | edited_html_to_mdx | reverse_path MUST return at least one runtime hit in a non-test module under src/.
  • The reverse-path entry point MUST emit (a) a normalized re-entry MDX string and (b) the upstream generated HTML string, both as deterministic outputs accessible to Step 2 and Step 14 callers.
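
The grep criterion above could be checked mechanically. The following is a minimal sketch, not part of the contract: find_reverse_path_hits and gate_is_open are hypothetical names, and "non-test module" is assumed to mean a .py file whose name does not start with test_ and that does not sit under a tests directory. It covers only the grep criterion; it says nothing about whether IMP-07 is implemented and verified, and a match inside a comment would still count as a hit.

```python
import re
from pathlib import Path

# Pattern taken from the gate criterion above.
REVERSE_PATH_PATTERN = re.compile(r"html_to_slide_mdx|edited_html_to_mdx|reverse_path")


def find_reverse_path_hits(src_root: Path) -> list[tuple[str, int]]:
    """Return (relative path, line number) pairs for matches in non-test modules."""
    hits = []
    for path in sorted(src_root.rglob("*.py")):
        rel = path.relative_to(src_root)
        # Assumption: test modules are identified by file name or a tests/ directory.
        if path.name.startswith("test_") or "tests" in rel.parts:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if REVERSE_PATH_PATTERN.search(line):
                hits.append((rel.as_posix(), lineno))
    return hits


def gate_is_open(src_root: Path) -> bool:
    """The grep criterion is met when at least one non-test hit exists."""
    return bool(find_reverse_path_hits(src_root))
```

Called as gate_is_open(Path("src")) from the repo root, this returns False until some non-test module mentions one of the three names.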

Per-step wiring contract

Step 1 — MDX upload (re-entered MDX validation)

  • Caller : the reverse-path adapter introduced by IMP-07, immediately after it produces a re-entry MDX.
  • Surface used : u6 split_into_sentences (validate that the reverse-path MDX yields at least one sentence after meta-strip + bullet-marker strip).
  • Behavior : if split_into_sentences(reentry_mdx) returns an empty list, the reverse-path adapter MUST raise a deterministic input error before Step 2 starts. No silent fallback. No AI call. No content rewrite.
  • Trace : debug.json["step01"]["reentry_sentence_count"] (additive integer field).
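
A minimal sketch of the Step 1 behavior, under stated assumptions: ReentryInputError and validate_reentry_mdx are hypothetical names, and split_into_sentences here is a simplified stand-in so the example runs on its own. The real u6 function in src/phase_z2_verification_utils.py also strips meta lines and bullet markers, which this stand-in does not reproduce.

```python
import re


class ReentryInputError(ValueError):
    """Hypothetical deterministic input error raised before Step 2 starts."""


def split_into_sentences(text: str) -> list[str]:
    # Stand-in only: the real u6 function is not reproduced here.
    parts = re.split(r"(?<=[.!?])\s+|\n+", text)
    return [p.strip() for p in parts if p.strip()]


def validate_reentry_mdx(reentry_mdx: str, debug: dict) -> int:
    """Record the sentence count, then fail loudly on an empty re-entry MDX."""
    sentences = split_into_sentences(reentry_mdx)
    # Additive integer field; existing step01 keys are left untouched.
    debug.setdefault("step01", {})["reentry_sentence_count"] = len(sentences)
    if not sentences:
        # No silent fallback, no AI call, no content rewrite.
        raise ReentryInputError("reverse-path MDX yielded no sentences")
    return len(sentences)
```

Writing the count before raising is a design choice of this sketch: the trace then shows a count of 0 for the aborted run.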

Step 2 — MDX normalize (text preservation cross-check)

  • Caller : parse_mdx / align_sections_to_v4_granularity post-normalize hook (added only when the input came through the IMP-07 reverse path; original-upload path is unchanged).
  • Surface used : u8 verify_text_preservation(reentry_mdx, upstream_generated_html, area_name="reentry_mdx_vs_upstream_html").
  • Threshold : the U1 module default (_TEXT_PRESERVATION_DEFAULT_THRESHOLD = 0.70, ported verbatim from Phase Q). Do not redesign in U2.
  • Behavior : VerificationResult.passed == False → adapter aborts the re-entry with the result's errors list surfaced; auto pipeline does NOT silently continue. Per feedback_auto_pipeline_first, no review_required / review_queue is inserted — adapter abort is the deterministic outcome.
  • Trace : debug.json["step02"]["reentry_text_preservation"] = {passed, score, area_name, missing_count} (additive; missing sentences themselves NOT serialised, per privacy-by-default).
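
The Step 2 trace and abort rules might look like the sketch below. Everything named here is an assumption: the VerificationResult dataclass is a stand-in whose field names (passed, score, area_name, errors, missing) are guessed from this contract, and ReentryAborted and record_text_preservation are hypothetical. The real class lives in src/phase_z2_verification_utils.py and may differ.

```python
from dataclasses import dataclass, field


@dataclass
class VerificationResult:
    # Stand-in with assumed field names; not the real U1 class.
    passed: bool
    score: float
    area_name: str
    errors: list[str] = field(default_factory=list)
    missing: list[str] = field(default_factory=list)


class ReentryAborted(RuntimeError):
    """Hypothetical abort raised by the reverse-path adapter."""


def record_text_preservation(result: VerificationResult, debug: dict) -> None:
    """Write the additive Step 2 trace, then abort if preservation failed."""
    # Only scalars and a count are serialised; missing sentences stay out of the trace.
    debug.setdefault("step02", {})["reentry_text_preservation"] = {
        "passed": result.passed,
        "score": result.score,
        "area_name": result.area_name,
        "missing_count": len(result.missing),
    }
    if not result.passed:
        # Deterministic abort: no review queue, no silent continue.
        raise ReentryAborted("; ".join(result.errors) or "text preservation failed")
```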

Step 14 — Selenium visual runtime check (invented-text guard)

  • Caller : the run_overflow_check post-render path, ONLY when the run was triggered from the reverse-path re-entry. Original-upload path keeps current Step 14 behavior unchanged (this is NOT an enhancement of Step 14 image/table coverage — that axis belongs to IMP-15).
  • Surface used : u9 detect_invented_text(reentry_mdx, final_html) against the just-rendered final.html.
  • Behavior : the returned list[str] is purely telemetry. It does NOT change render outcome and does NOT change compute_slide_status (Step 20). The reverse-path may consult the list to decide whether to surface a warning at Step 22 — but auto pipeline does not gate on it (per feedback_auto_pipeline_first + AI-isolation contract).
  • Trace : debug.json["step14"]["reentry_invented_text_fragments"] = list[str] (additive; already truncated by u9's _INVENTED_TEXT_TRUNCATE_LEN = 80).
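
To illustrate the telemetry-only rule, here is a small sketch. record_invented_text and its is_reentry parameter are hypothetical; the fragments argument is assumed to be the list[str] already returned (and already truncated) by u9 detect_invented_text.

```python
def record_invented_text(debug: dict, fragments: list[str], is_reentry: bool) -> None:
    """Store u9 output as telemetry; never touch render outcome or slide status."""
    if not is_reentry:
        # Original-upload path: the Step 14 trace stays exactly as it is today.
        return
    # Copy the list so later mutation by the caller cannot alter the trace.
    debug.setdefault("step14", {})["reentry_invented_text_fragments"] = list(fragments)
```

The function returns nothing and raises nothing, which is the point: no caller can gate on it.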

Step 21 — Debug / trace recording (additive only)

  • Surface used : none (Step 21 consumes the additive fields written by Step 1 / 2 / 14 above).
  • Behavior : write_debug_json MUST treat the new fields as additive — no rename, no removal, no schema regression of existing keys. Missing fields (original-upload path) MUST be absent rather than null, so downstream consumers can distinguish "original upload" from "reverse-path re-entry".
  • Trace contract : the three additive fields above + a single new flag debug.json["pipeline"]["reverse_path_reentry"] = bool (the only schema field that gates the existence of the other three).
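
One way to enforce "additive only" and "absent rather than null" at write time is a merge helper that refuses to overwrite. This is a sketch under assumptions: merge_additive is a hypothetical helper, not a function of write_debug_json, and it treats a None value as "do not emit this key".

```python
def merge_additive(existing: dict, additions: dict) -> dict:
    """Return a copy of existing with additions merged in, refusing to overwrite."""
    merged = dict(existing)
    for key, value in additions.items():
        if value is None:
            continue  # absent rather than null
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Descend into nested sections such as "step02" or "pipeline".
            merged[key] = merge_additive(merged[key], value)
        elif key in merged:
            raise KeyError(f"additive-only violation: {key!r} already exists")
        else:
            merged[key] = value
    return merged
```

For example, merging {"step14": None, "pipeline": {"reverse_path_reentry": True}} into an existing trace adds the flag and leaves no step14 entry behind, so a consumer can still tell the two paths apart.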

Step 22 — User confirmation / export (surface, no AI)

  • Surface used : none directly (Step 22 is UI scope, currently CLI-only — see PHASE-Z-PIPELINE-OVERVIEW Step 22).
  • Behavior contract for whoever lands Step 22 UI : Step 22 MAY render the additive Step 2 / Step 14 fields read-only. No write-back. No AI call. No content rewrite.

Redesigned frame-contract pattern dict (reserved, NOT delivered in U2)

  • Phase Q REQUIRED_PATTERNS (Phase Q reference: src/content_verifier.py:382) is body_bg / core / sidebar / footer — these are Phase Q area names, not Phase Z entities. Values are NOT reused.
  • Phase Z replacement will be keyed on (frame_id, frame_slot_id) per the canonical hierarchy Slide → Zone → Internal Region → Frame → Frame Slot → Content (PHASE-Z-PIPELINE-OVERVIEW.md §Operating Principles), and will be sourced from templates/phase_z2/catalog/frame_contracts.yaml (Step 0 / Step 10).
  • Out of scope for IMP-16-U2. This belongs to IMP-20 (H2 frame contract validation — same backlog doc line 71). U2 must not ship a pattern dict; U2 must not import or wrap Phase Q verify_structure / verify_area / verify_all_areas.

Guardrails (binding)

  • AI isolation contract — all wiring above is deterministic. No LLM / Kei / httpx / SSE call on any path. (per feedback_ai_isolation_contract + PZ-1: AI=0 normal.)
  • No-hardcoding — U2 ports the algorithm. The only literal values reused are the Phase Q H3 thresholds already lifted to named constants in u7 / u8 / u9. No sample-specific value (MDX 03 / 04 / 05) enters U2.
  • No src.content_verifier import — under any condition. The U1 module is the sole Phase Z surface.
  • No FORBIDDEN_KEI_MEMOS / generate_with_retry port — these are H4 / H5 archive markers and remain out of scope.
  • Schema additive only — debug.json keys listed above are new; no existing key is renamed, removed, or repurposed. (per feedback_artifact_status_naming — final.html is not the same axis as preservation / invented-text telemetry.)
  • Spacing direction — N/A for this axis (this is verification, not layout). No common CSS / padding / tolerance shrinking is introduced.
  • Status semantics — Step 20 compute_slide_status is NOT changed by U2. Preservation / invented-text fields are telemetry; they do not flip a slide between PASS and RENDERED_WITH_VISUAL_REGRESSION on their own.
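
The import and usage guardrails above lend themselves to a static check. The sketch below is an assumption about how such a check could be written, not an existing test in the repo: guardrail_violations is a hypothetical name, and it inspects one module's source text with the standard ast module.

```python
import ast

FORBIDDEN_MODULE = "src.content_verifier"
FORBIDDEN_NAMES = {
    "FORBIDDEN_KEI_MEMOS", "generate_with_retry", "REQUIRED_PATTERNS",
    "verify_structure", "verify_area", "verify_all_areas",
}


def guardrail_violations(source: str) -> list[str]:
    """List forbidden imports and name references found in one module's source."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found += [a.name for a in node.names if a.name == FORBIDDEN_MODULE]
        elif isinstance(node, ast.ImportFrom):
            if node.module == FORBIDDEN_MODULE:
                found.append(FORBIDDEN_MODULE)
            elif node.module == "src":
                # Catches "from src import content_verifier".
                found += [f"src.{a.name}" for a in node.names
                          if a.name == "content_verifier"]
        elif isinstance(node, ast.Name) and node.id in FORBIDDEN_NAMES:
            found.append(node.id)
        elif isinstance(node, ast.Attribute) and node.attr in FORBIDDEN_NAMES:
            found.append(node.attr)
    return sorted(set(found))
```

A test could read each wiring module with Path.read_text and assert that guardrail_violations returns an empty list. Dynamic imports via importlib would slip past this check.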

Rollback

  • All changes are additive: the Step 1 input-error path, the Step 2 post-normalize hook, the Step 14 telemetry call, the four new debug.json keys.
  • Rollback = revert the IMP-07 reverse-path entry's call sites; no schema migration is needed because the three telemetry keys exist only when pipeline.reverse_path_reentry is set, and the flag itself is additive.

Open items deferred until IMP-07 lands

  • Exact module path of the IMP-07 reverse-path adapter (TBD by IMP-07).
  • Whether Step 2's preservation cross-check needs a per-section variant or only a whole-MDX variant — depends on whether IMP-07 emits a single re-entry MDX or per-section MDX fragments.
  • Whether Step 14's invented-text telemetry should be emitted per area_name or only once globally — depends on whether IMP-07's reverse-path produces area-tagged HTML.

These are NOT resolved here. They are resolved at IMP-07 land time, in a follow-up update to this doc.