70 Commits

Author SHA1 Message Date
97b7833a1b docs(#95): IMP-95 u11 status-board markers + idempotence/regex tests (docs+test only)
Some checks failed
Multi-MDX Regression (IMP-91) / multi-mdx-regression (push) Failing after 20s
- Add section 9 to PHASE-Z-PIPELINE-STATUS-BOARD.md carving section 3 item (j)
  into 8 IMP-95 sub-axes (j1-j8). j1-j5 = trace-only, j6-j8 = guarded.
- Marker grammar: <!-- IMP-95:<axis> -->VALUE<!-- /IMP-95 --> (distinct from
  IMP-91 grammar so scripts/update_status_board.py MARKER_RE cannot rewrite
  IMP-95 cells).
- Allowed value enum: {pending, trace-only, guarded, active}.
- tests/scripts/test_update_status_board.py: +1 import, +4 module-level
  constants, +3 test functions verifying marker presence/count (8), value
  domain enum, IMP-91 updater isolation against IMP-95 cells, and IMP-95
  regex rewrite idempotence. IMP-91 tests untouched.
- No production-code touched. Default-OFF flag posture preserved; all cells
  trace-only or guarded.
2026-05-27 18:18:53 +09:00
6e9e3ee1fb fix(#94): IMP-94 u7 regression-harness SHA parity normalization for additive Layer A markers
Strip the two additive IMP-94 attributes (data-region-id,
data-content-unit-id) symmetrically at both the 89-a fixture capture
script and the b4 mapper source SHA parity test before SHA-256 hashing,
honoring the issue body guardrail "mdx 01-05 의 final.html SHA =
byte-equivalent except for new data-* attrs" without recapturing the
pre-89-a baseline. The strip regex is anchored on the leading-space +
attr-token shape emitted by src/region_marker_stamper.py:131-135 so the
#96 data-frame-slot-id axis stays disjoint.

The marker-parity cross-axis tests for emergency_p4b_verbatim_code and
emergency_p4_ai_inline append sites are converted from pytest.skip to
vacuous-truth early return when the Emergency P4/P4b anchors are absent
in HEAD — the assertion target does not exist in IMP-94 scope, but the
contract still locks placement_markers=[] when the Emergency axis lands
later. Refreshed 89a_pre_baseline_sha.json (2026-05-27T04:19:30Z) holds
the normalized sizes/SHAs for mdx 01-05 post-stamper.

Scope: regression harness + fixture only; zero src/ edits. Verified
35/35 marker-parity + 18/18 SHA parity in a clean detached worktree at
HEAD 2afedfc with these four files applied.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 14:09:26 +09:00
5484077a53 feat(#94): IMP-94 u1~u6 Layer A region/content marker injection (stamper + render_slide chain + 4 zones_data.append placement_markers + 35 parity tests)
Some checks failed
Multi-MDX Regression (IMP-91) / multi-mdx-regression (push) Failing after 21s
u1 (src/region_marker_stamper.py): deterministic root-div stamper injecting data-region-id + data-content-unit-id onto each family-partial root div anchored by data-template-id. Idempotent (re-stamp = no-op), AI=0, additive only, empty/None markers no-op, F9/F29 frame-slot axis preserved.

u2 (src/phase_z2_pipeline.py render_slide chain): _stamp_region_markers chained after IMP-56 u9 _stamp_zone_html. Marker source = zone.get("placement_markers") or [] — Codex #16 P4b crash risk closed via the or-[] call-site fallback.

u3 (_derive_placement_markers helper): projects PlacementPlan.slot_assignments[] → list[dict] carrying region_id + content_unit_id + frame_slot_id (frame_slot_id reserved for #96 89-d). Live B4 path emits at primary zones_data.append.

u4 (3 non-live zones_data.append defaults): placement_markers: [] at IMP-30 u4 empty-shell, IMP-86 u1 adapter_needed, post-loop unrenderable plan-record paths — uniform zone shape, stamper no-op surface.

u5/u6 (tests/test_phase_z2_imp94_marker_parity.py): 33 hard tests + 2 cross-axis skip-if-anchor-absent (Emergency P4/P4b future axis). Coverage: 13 family-partial root anchors, F29 + F9 frame-slot preservation, idempotence, live render_slide stamping, P4b empty-marker no-crash, MDX 01 strip-attr parity, trace-to-DOM parity.

Disjoint from #96 (data-frame-slot-id) by attribute name. SPEC anchor: docs/architecture/PHASE-Z-CONTENT-OBJECT-SUBZONE-SPEC.md §6.4 + §7.2 (Layer A read targets + render-path activation).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 08:15:08 +09:00
b9747c2f4a feat(#84): IMP-84 u1~u3 silent automation policy enforcement (FramePanel reject confirm + slide_base provisional badge/outline + IMP-30 visual assertions inverted)
Some checks failed
Multi-MDX Regression (IMP-91) / multi-mdx-regression (push) Failing after 21s
- u1 FramePanel.tsx: extract `applyFrameSelection(candidate, onFrameSelect)`
  pure helper; collapse `handleFrameSelect` to direct onFrameSelect for every
  V4 label; drop `window.confirm` reject popup (IMP-47B u11 regression noise
  per `feedback_auto_pipeline_first`). New vitest pin `imp84_framepanel_reject_silent.test.ts`
  covers helper invocation across all 4 V4 labels + source-presence pins.
- u2 templates/phase_z2/slide_base.html: delete `.zone--provisional` CSS,
  `.zone__needs-adaptation-badge` CSS, the zone--provisional class fragment
  in the zone div, and the badge `<span>` render at the provisional zone.
  Preserve `data-provisional="1"` attribute as silent telemetry. New pytest
  `tests/phase_z2/test_imp84_provisional_silent_render.py` pins the silent
  contract independently of the IMP-30 first-render file.
- u3 tests/test_phase_z2_imp30_first_render.py: invert the three IMP-30 u5
  positive provisional-visual assertions to IMP-84 silent-contract negatives
  (no class, no badge, no CSS selectors); preserve positive `data-provisional`
  telemetry assertions. Docstrings updated to IMP-84 silent contract.

Out of scope (Round #4 + #92 contract): Home.tsx `toast.error(aiReviewMsg)`
call line, designAgentApi.ts `api_error_kinds`/`api_error_kind` schema and
operational-only formatter, FramePanel reject badge/tooltip read-only labels
(L102/L147/L156), and backend `zone.provisional` flag emission.

Stage 4 PASS: u1 vitest 10/10, u2 pytest 5/5, u3 pytest 29/29 (incl. 3
IMP-84 inverted assertions: `test_imp84_provisional_zone_silent_no_class_no_badge`,
`test_imp84_provisional_badge_never_rendered_in_mixed_zones`,
`test_imp84_slide_base_css_strips_provisional_visual_selectors`).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-26 14:15:02 +09:00
4da22adb43 feat(#90): IMP-56 u1-u19 catch-up before final close (post-u20 push fix)
Some checks failed
Multi-MDX Regression (IMP-91) / multi-mdx-regression (push) Failing after 20s
u1: text_overrides axis in user_overrides_io
u2: structure_overrides axis in user_overrides_io
u3: vite allowlist for new endpoints
u4: text_override_resolver
u5: Step 12 text_overrides apply in phase_z2_pipeline
u6: structure_override_resolver
u7: text_path_stamper
u8: SlideCanvas text-edit capture
u9: SlideCanvas structure-edit overlay
u10: userOverridesApi service extension
u11: designAgent types extension
u12: slidePlanUtils restore
u13: user_overrides endpoint tests
u14: user_overrides restore tests
u15: pipeline fallback tests
u16: edit-mode state + gating tests
u17: slide_base print mode CSS
u18: /api/connect endpoint (vite)
u19: /api/export endpoint (vite)

Recovery scope: 29 files (12 modified + 17 new). u20 already pushed in
9439575; this commit lands u1-u19 that were authored but not committed
before #90 was externally closed.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-26 06:12:13 +09:00
ec7471ed59 docs(#1): IMP-01 A-6 u1~u5 zone_geometries_px runtime verification log (driver chain + 4-topology runs + schema lock + no-drift guardrail + pytest baseline gate; production source untouched, impl at 1dc81e0)
Some checks failed
Multi-MDX Regression (IMP-91) / multi-mdx-regression (push) Failing after 20s
2026-05-25 15:49:23 +09:00
4e281a20d8 feat(#93): IMP-55 u1~u12 frontend manual section swap detection (manual_section_assignment bool axis + drag-only marker gate + dual-axis persistence + backend manual-true gate)
Some checks failed
Multi-MDX Regression (IMP-91) / multi-mdx-regression (push) Failing after 9s
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 08:27:09 +09:00
9062931863 feat(#74): IMP-45 u1~u8 slide-level CSS override (frontmatter slide_overrides.css + --override-slide-css/--slide-css-file + idempotent Step 13 injector)
Some checks failed
Multi-MDX Regression (IMP-91) / multi-mdx-regression (push) Failing after 22s
u1 KNOWN_AXES tuple gains slide_css entry in src/user_overrides_io.py
(snake_case parity with image_overrides); round-trip test extends
to 6 axes.
u2 src/mdx_normalizer.py surfaces nested slide_overrides.css from the
MDX frontmatter into the normalize_mdx_content return dict; absent
key -> {}, non-string css drops. 4 unit cases in tests/test_mdx_normalizer.py
(present / absent / non-string / title-only).
u3 src/slide_css_injector.py NEW (88 lines) mirrors the
inject_image_overrides_style contract from src/image_id_stamper.py:
marker pair <!--IMP45-SLIDE-CSS:OPEN--> / <!--IMP45-SLIDE-CSS:CLOSE-->,
idempotent re-injection, </head> > <body> > document-start three-tier
fallback, empty/None -> unchanged. 8 fixtures in
tests/test_slide_css_injector.py mirror test_image_id_stamper.py.
u4 run_phase_z2_mvp1 accepts override_slide_css: Optional[str] = None;
None -> frontmatter slide_overrides.css fallback. Step 13 calls
inject_slide_css after image override injection and before the
final.html disk write, so CLI/CI/regression renders observe the same
backend artifact.
u5 argparse adds mutually-exclusive --override-slide-css TEXT (inline
CSS, <style> wrapper optional) and --slide-css-file PATH (UTF-8 read,
fail-closed sys.exit(2) on missing path / decode error / both flags
present). Resolved string is forwarded as override_slide_css kwarg.
6 cases in tests/test_phase_z2_cli_overrides.py (inline / file / both
/ missing / non-utf8 / neither).
u6 samples/mdx_batch/04.mdx frontmatter gains slide_overrides.css
block (verbatim of the former MDX04_DEFAULT_OVERRIDE_CSS constant,
no sample/frame gate). Subprocess smoke in
tests/test_phase_z2_slide_css_smoke.py verifies the marker pair and
CSS substring land in final.html.
u7 Front/client removes the sample/frame-gated frontend-only injection:
Home.tsx drops the MDX04_DEFAULT_OVERRIDE_CSS constant and the
sample==="04"+frame==="process_product_two_way" branch (-28 lines);
SlideCanvas.tsx drops the iframe contentDocument.head injection of
that prop (-14 lines). Live preview now reads backend final.html only.
u8 tests/regression/fixtures/89a_pre_baseline_sha.json 04.mdx entry
resyncs to the live SHA ddb6bf2f... / 28042 bytes (overwrites the
earlier 5-byte-drift d02c76fd... / 28047). Other entries untouched.
Note: 01.mdx baseline drift (ad6f16a3... / 29089 -> live f26a7fac...
/ 29084) predates this branch and is split to a follow-up issue per
the closed-issue fresh validation rule.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 03:26:03 +09:00
b4be6c1cd0 feat(#72): IMP-43 u1~u8 --reuse-from incremental rerun (Step 0/1/2/5/6 reuse + Step 7+ re-execute)
Some checks failed
Multi-MDX Regression (IMP-91) / multi-mdx-regression (push) Failing after 25s
u1 argparse --reuse-from PREV_RUN_ID + post-merge fail-closed guard (rejects
layout/zone_geometry/zone_section/image override axes by name; only
--override-frame is preserved).
u2 src/phase_z2_reuse_snapshot.py — JSON-only Step 6 snapshot with mdx_sha256
integrity key and {value, source_path, upstream_step} provenance per axis
(pickle forbidden per Stage 2 guardrail).
u3 _write_reuse_snapshot at the Step 6 boundary; soft-fails to stderr without
aborting the seed run.
u4 prev_run_dir RO copy of step00/01/02/05/06 + _reuse_snapshot.json into
new run_dir, state rehydration, reuse marker, frame-override application on
restored units, Step 7+ resume.
u4b fail-closed for missing prev_run_dir / missing/corrupt/invalid snapshot /
mdx_sha256 mismatch / accidental new==prev write, with value+path+upstream
diagnostics per axis.
u5 reuse_from Optional[str] threaded through run_phase_z2_mvp1 signature and
CLI dispatch; default None preserves byte-identical pre-IMP-43 behavior.
u6 Front /api/run optional reuseFromRunId forwarding (vite.config.ts +
designAgentApi.ts + run_pipeline_reuse_from.test.ts).
u7a fast CI equivalence (1 mdx × 1 layout × 2 frames); step13 whitelist =
run_id/timestamps/prev_run_id only. u7b 3 layouts × 3 mdx × 32 frames
sweep gated by pytest.mark.sweep (registered in pyproject.toml; default CI
must use -m 'not sweep').
u8 scripts/measure_reuse_savings.py argv-driven A/B/C harness with frame
pin self-discovery + seed-time exclusion; status board §8 TBD anchor
(issue-body 50-70% / 10-20s→3-8s claim explicitly unverified, not mirrored).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 22:44:27 +09:00
8648a468d9 feat(#69): IMP-40 u1~u6 frame contract label_default placeholder/fallback role discriminator (BIM/DX leak fix)
Some checks failed
Multi-MDX Regression (IMP-91) / multi-mdx-regression (push) Failing after 26s
- catalog (frame_contracts.yaml): F18 bim_dx_comparison_table col_a/col_b
  label_default_role=placeholder; F30 industry_current_status_three_col +
  F31 industry_characteristics_three_col col_a/col_b/col_c forward-compat
  placeholder; F33 engn_sw_three_types untouched (no label_default).
- mapper (_build_compare_table_2col): generic _resolve_label_default(col_key)
  branches on <col>_label_default_role — placeholder -> '' (Figma placeholder
  suppressed at runtime), fallback -> catalog literal (legacy default), unknown
  -> ValueError with template_id + role_key + value. Absent role defaults to
  fallback (backward compat for contracts without discriminator).
- tests (tests/phase_z2/test_imp40_label_default_role.py): u4 generic matrix
  (placeholder / fallback / absent / unknown / 3-col axis) + u5 F18-reuse
  non-BIM/DX synthetic rows asserting placeholder labels emit '' and BIM/DX
  literal tokens do not leak.
- snapshot (tests/integration/__snapshots__/slot_payload.json): mdx 01 F18
  string_slot_nonempty.col_a_label/col_b_label True -> False (u6 expected
  drift from u3 placeholder -> empty string flip). slot_names + rows + title
  preserved.

Verification:
- imp40_label_default_role: 6/6 PASSED
- phase_z2 sweep: 608/608 PASSED
- multi_mdx_regression: 50/50 PASSED
- cross-suite sweep: 662/662 PASSED
- BIM/DX literal grep on mapper + new test: 0 hits
- No mdx-specific branches (mdx 03/04/05 grep on mapper: 0 hits)

Guardrails: no MDX 03/04/05 hardcoding (catalog policy only); no spacing
shrink; no auto frame swap on reject; no AI call at Step 12; F33 untouched.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 18:53:20 +09:00
028042aaa9 feat(#68): IMP-39 u1~u8 ranking_sort_policy single-source + backend↔frontend label-priority mirror
Some checks failed
Multi-MDX Regression (IMP-91) / multi-mdx-regression (push) Failing after 23s
u1: templates/phase_z2/catalog/ranking_sort_policy.yaml — single-source policy
    (label_priority asc {use_as_is:0, light_edit:1, restructure:2, reject:3}
    + confidence desc + v4_rank asc tie-break).
u2: src/phase_z2_pipeline.py — apply_ranking_sort helper + lookup_v4_match_with_fallback
    applies policy AFTER IMP-38 raw-window selection (raw default_window + usable_count
    preserved on RAW all_judgments).
u3: src/phase_z2_pipeline.py — _build_application_plan_unit forwards ranking_sort_policy
    + sorted_candidate_evidence into Step 9 payload.
u4: Front/client/src/services/designAgentApi.ts — frame_candidates builder reads
    unit.sorted_candidate_evidence + unit.ranking_sort_policy first; local LABEL_PRIORITY
    retained only on warn-fallback path.
u5: tests/test_ranking_sort_policy.py — pure permutation coverage (sample-agnostic).
u6: tests/phase_z2/test_label_priority_synthetic.py + fixtures/ranking_sort_policy/
    synthetic_divergence.yaml — low-conf use_as_is behind high-conf restructure.
u7: tests/phase_z2/test_imp39_mdx04_env_toggle_e2e.py — samples/mdx_batch/04.mdx with
    AI_FALLBACK_ENABLED=off; backend selected_v4_rank == frontend frame_candidates[0].
u8: tests/phase_z2/test_imp39_corpus_audit.py — real corpus sweep over
    tests/matching/v4_full32_result.yaml (10 MDX sections); section IDs loaded
    dynamically (RULE 0 / RULE 7 sample-agnostic).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 17:12:07 +09:00
2e3747c5ab feat(#88): IMP-88 u1~u7 Step 17 retry chain — layout_adjust + image_fit + frame_internal_fit_candidate executors + dispatcher + entry
Some checks failed
Multi-MDX Regression (IMP-91) / multi-mdx-regression (push) Failing after 23s
Step 17 salvage dispatcher previously only ran the 3 actions in
_SALVAGE_FAIL_BY_ACTION (cross_zone_redistribute / glue_compression /
font_step_compression). Any next_proposed_action outside that set hit
salvage_terminal_action and dropped through, so visual_check aborted on
layout_adjust / image_fit / frame_internal_fit_candidate cascades.

u1 — router data surface (src/phase_z2_router.py)
  - ACTION_BY_CATEGORY: image_aspect_mismatch -> image_fit (new row),
    frame_capacity_mismatch -> frame_internal_fit_candidate (was
    frame_reselect).
  - ACTION_IMPLEMENTATION_STATUS: layout_adjust / image_fit /
    frame_internal_fit_candidate flipped MISSING -> IMPLEMENTED with
    inline IMP-88 rationale.

u2 — failure_router cascade surface (src/phase_z2_failure_router.py)
  - FAILURE_TYPE_DESCRIPTIONS + SALVAGE_FAILURE_TYPE_BY_ACTION extended
    with layout_adjust_insufficient / image_fit_insufficient /
    frame_internal_fit_insufficient producers.
  - NEXT_ACTION_BY_FAILURE + NEXT_ACTION_RATIONALE +
    NEXT_ACTION_IMPLEMENTATION_STATUS rows added; cascade chain becomes
    font_step_compression -> layout_adjust -> frame_internal_fit_candidate
    -> frame_reselect -> details_popup_escalation (#64 terminal).

u3~u5 — planners + apply helpers (src/phase_z2_retry.py)
  - plan_layout_adjust / apply_layout_adjust_layout_css with
    _layout_swap_priority across 8-preset LAYOUT_PRESETS (preset switch,
    no shared-margin shrink per Phase Z spacing direction).
  - plan_image_fit / apply_image_fit_css scoped to frame slot using
    existing classifier image_event payload (object-fit + max-w/h
    derivation).
  - plan_frame_internal_fit_candidate / apply_frame_internal_fit_candidate_css
    stays inside declared frame contract envelope; emits infeasible path
    when envelope is absent.

u6~u7 — pipeline wiring (src/phase_z2_pipeline.py)
  - _SALVAGE_FAIL_BY_ACTION extended; _attempt_salvage_chain gains
    layout_adjust distinct-render branch + frame_internal_fit_candidate
    CSS-overlay branch + loop cap.
  - _attempt_step17_image_fit_single_pass added for image_fit entry.
  - §11.7.1 / §11.7.2 entry triggers wired; Step 17/18/19 artifact
    refresh + note logging closes the salvage_terminal_action fall-through
    for the 3 IMP-88 actions.

Tests
  - New: test_router_actions_imp88.py (12),
    test_failure_router_imp88_cascade.py (12),
    test_phase_z2_retry_layout_adjust.py (10),
    test_phase_z2_retry_image_fit.py (13),
    test_phase_z2_retry_frame_internal_fit.py (13),
    test_phase_z2_pipeline_salvage_imp88.py (8),
    test_phase_z2_pipeline_step17_entry_imp88.py.
  - Regression-aligned: test_phase_z2_failure_router_cascade.py,
    test_phase_z2_step17_salvage_chain.py — pre-existing cascade +
    salvage-chain assertions updated to the IMPLEMENTED surface.

Out of scope (separate axes / issues)
  - details_popup_escalation terminal body (#64).
  - frame_reselect MISSING flip (different axis).
  - Step 14/16 detection refinement.
  - Stage 0 mdx_normalizer integration (locked 2026-05-08).
  - AI fallback activation.

Guardrails respected
  - Phase Z spacing direction: layout_adjust switches preset; no shared
    margin shrink.
  - AI isolation contract: planners + dispatcher are deterministic; zero
    AI calls in u1~u7.
  - No hardcoding: routing + cascade live in router/failure_router data
    rows, not inline conditionals.
  - IMP-46 (#62) cache carve-out: untouched.
  - 1 commit = 1 decision unit: u1~u7 grouped as a single IMP-88 unit.

Stage 4 verification: 7 IMP-88 test files + 2 modified regression files
PASS (Claude #12 + Codex #12 consensus YES). Full-suite sweep deferred to
a separate step.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 15:01:55 +09:00
e0c39f1bc1 feat(#73): IMP-44 u1~u5 layout override unknown-key guard + frontend zone_geometries validation
Some checks failed
Multi-MDX Regression (IMP-91) / multi-mdx-regression (push) Failing after 23s
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 12:12:24 +09:00
5deeb97cf6 feat(#71): IMP-42 u1~u5 silent fail chain diagnostics (assert + invalid-char detector + DIAG log)
Some checks failed
Multi-MDX Regression (IMP-91) / multi-mdx-regression (push) Failing after 24s
Stage 4 binding scope — diagnostic-only, fail-loud, sample-agnostic
(RULE 0 / AI-isolation contract). No production behavior change beyond
fail-loud raises on previously-silent failure classes.

u1 src/phase_z2_pipeline.py:2747-2772 — render_slide precondition assert
   (template_id non-empty str + slot_payload dict), placed after the
   `__empty__` short-circuit at 2740 to preserve empty-zone grid behavior.
u2 src/phase_z2_pipeline.py:2681-2710 — _scan_rendered_html_for_invalid_path_chars
   helper covering src / href / url(...) values for backslash, &amp;, &#39;.
   Invoked on partial render (2778) and slide_base assembly (2798).
u3 src/phase_z2_pipeline.py:2638-2676,2733,5509 — _emit_diag_zones_shape
   shape-only [DIAG] JSON at Step 12 slot_payload emit and Step 13
   render_slide entry. No env gate — silence is the bug.
u4 Front/client/src/pages/Home.tsx:388-392 — unconditional [DIAG raw overrides]
   console.log on handleGenerate boundary, after flushUserOverrides() and
   immediately before runPipeline.
u5 tests/phase_z2/test_phase_z2_diag_smoke_general.py — 32-frame general
   smoke driven by load_frame_contracts() registry (not literal MDX 03/04/05),
   parametrizes u1/u2/u3 across the full frame_contracts.yaml top-level.

Tests (Stage 4 verification PASS):
- u1 8 passed, u2 14 passed, u3 12 passed, u4 5 passed, u5 97 passed.
- Backend full regression tests/phase_z2/ 499 passed in 110.84s.
- Frontend full regression 182 passed in 1.10s.

Out of scope (separate axes):
- Path normalization / as_posix migration.
- Autoescape policy change.
- build_layout_css refactor (Stage 1 category-error rejection).
- Recovery / auto-fix on detected invalid path.
- MDX content / frame-selection / zone-composition change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 08:28:54 +09:00
c59864eb9a feat(#91): IMP-91 u2~u15 multi-mdx regression CI suite + status-board auto-update
Some checks failed
Multi-MDX Regression (IMP-91) / multi-mdx-regression (push) Failing after 31s
- u2~u5: tests/integration/test_multi_mdx_regression.py — MDX_SET=(01..05)
  cached integration runs + status/structural/visual snapshots +
  full_mdx_coverage assertion (9 snapshots populated for 01-05).
- u6~u11: F0 normalize / F1 V4 ranking / F2 slot_payload /
  F3 classifier-only AI / F4 layout / F5 final.html axis per MDX_SET.
- u12: pyproject.toml — pytest-json-report>=1.5 in dev extras.
- u13: .github/workflows/multi-mdx-regression.yml — pytest+artifact CI.
- u14: scripts/update_status_board.py + tests/scripts/test_update_status_board.py
  — idempotent JSON marker updater (3 unit tests pass).
- u15: PHASE-Z-PIPELINE-STATUS-BOARD.md — 30 F0-F5 × mdx01-05 markers
  initialized `?` + workflow wiring.

Stage 4 verify: 59/59 PASS targeted (smoke 6 + updater 3 + integration 50),
386/386 PASS regression umbrella, 0 failures.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 07:01:58 +09:00
6aa7564509 feat(#91): IMP-91 u1 non-VP subprocess smoke mdx01/02 parametrize
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 02:18:17 +09:00
b1bbe27c38 feat(#89): IMP-89 89-a u1~u5 Layer A render path activation (B4→mapper source-of-truth switch, default-OFF flag)
PHASE_Z_B4_MAPPER_SOURCE env flag (default OFF) switches slot_payload
source-of-truth from legacy mapper-only / V4 rank-1 to B4 PlacementPlan
.selected_template_id at the single switch site in the runtime loop.
OFF preserves final.html SHA byte-equivalence (u4 parity guard, mdx 01-05).
ON requires Layer A render-active path; BLOCKED exits on B4 no-cover
and on B4-selected FitError (IMP-87 honesty gate pattern — NO silent
fallback). Distinct from PHASE_Z_B4_GATEKEEPER (mismatch render-skip).

Units (1 commit = 1 axis per Stage 1 scope_lock):
  u1 — _b4_mapper_source_enabled() flag reader (default OFF)
  u2 — _select_mapper_template_id() selector wired at the switch site
  u3 — _b4_mapper_source_blocked_exit() for b4_no_cover / b4_selected_fit_error
  u4 — render SHA parity regression (tests/regression/ baseline mdx 01-05)
  u5 — slot_payload byte-equivalence (matches_mapper=True axis, mdx 01-05)

Targeted 89-a suite 63 PASS; Phase Z regression 323 PASS; IMP-87 mirror
20 PASS. Demo activation via .env only (no vite.config hardcoding).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 00:33:28 +09:00
896f273ffa feat(#92): IMP-92 u1~u5 AI fallback config validation (model ping + operational error classification)
Replaces #84 UI-noise removal plan with positive operational-alert contract.
Five-axis stack lands together: (1) default model literal moved to current
Opus-family ID, (2) Anthropic SDK error classifier mapping exceptions to
quota/billing/auth/other, (3) api_error_kind plumbed through ai_repair_status
summary + per-record retention, (4) Step 0 preflight ping gated under
ai_fallback_enabled (default OFF preserved) with fail-fast on invalid
model/key, (5) frontend formatter rewritten to surface only operational
quota/billing/auth toasts (non-operational paths return null per
feedback_auto_pipeline_first silent-pipeline policy).

u1 - default model literal claude-opus-4-6-20250415 -> claude-opus-4-7
     (src/config.py + tests/test_phase_z2_ai_fallback_config.py lock mirror)
u2 - classify_operational_error type+status_code dispatch + Step 12
     api_error_kind stamp on except path (src/phase_z2_ai_fallback/client.py
     + src/phase_z2_ai_fallback/step12.py + tests/phase_z2_ai_fallback/test_step12.py)
u3 - _summarize_ai_repair_status aggregates api_error_kinds {quota,billing,
     auth,other}; error_records[i].api_error_kind retained per-record
     (src/phase_z2_pipeline.py + tests/test_imp47b_failure_surface.py)
u4 - _run_step0_ai_preflight + Step0PreflightError; preflight only fires
     when ai_fallback_enabled=true; one-token ping; invalid key/model =>
     setup failure before Step 1 (src/phase_z2_pipeline.py +
     tests/phase_z2/test_pipeline_step0_preflight.py NEW)
u5 - AiRepairStatus.api_error_kinds? interface + formatAiRepairHumanReview
     Message rewritten: operational quota/billing/auth -> Korean copy
     verbatim from issue body (tie-break quota -> billing -> auth);
     validation/coverage_violated/unsupported_kind/generic-other/legacy
     payload -> null (Front/client/src/services/designAgentApi.ts +
     Front/client/tests/imp47b_human_review_toast.test.tsx)

Guardrails respected:
- feedback_demo_env_toggle_policy: default OFF preserved; preflight skipped
  when ai_fallback_enabled=false (test_preflight_skipped_when_disabled
  asserts anthropic.Anthropic() not called).
- feedback_auto_pipeline_first: non-operational AI failures stay silent;
  only quota/billing/auth reach user toast.
- feedback_ai_isolation_contract: AI remains fallback-only; no normal-path
  migration; MDX preserved.
- project_imp46_carveout_caveat: cache_key/fingerprints fields untouched on
  every record; no overlap with #62 cache region.
- feedback_no_hardcoding: zero MDX-sample-specific literals; classifier
  dispatch by SDK type, not by string parsing.
- feedback_artifact_status_naming: operational toast scoped to alert axis,
  not overall PASS signal.

Tests:
- Targeted u1+u2+u3+u4: 63 passed
- u5 vitest (Front/): 10/10 passed
- tests/phase_z2_ai_fallback dir regression: 240 passed
- tests/phase_z2 dir regression: 323 passed
- IMP-92-adjacent (-k "imp47b or ai_fallback or preflight or step12 or step0"): 299 passed (808 deselected)
- u1 baseline lock (test_client_mock.py): 8 passed
Zero failures, zero regressions outside scope.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 22:07:25 +09:00
842a46144c feat(#87): IMP-87 u1~u5 empty_shell honesty gate + BLOCKED exit
EMPTY_SHELL_NO_CONTENT overall enum + 3-marker detection (frame_template_id="__empty__"
OR label="empty_shell" OR merge_type="empty_shell") routes empty-placeholder-only
slides to BLOCKED CLI exit 1 + red final_status.html, blocking fake PASS reports
(feedback_artifact_status_naming). Coverage accounting split: legacy covered_section_ids
preserved + new content_rendered_section_ids / empty_shell_section_ids. mdx05 Case B
(zero V4 evidence) honestly classified instead of synthesizing fabricated rank-1 reject
frames. IMP-30 u6/u7 stale empty-shell PASS assertions inverted (29 tests). IMP-85 smoke
parametrize: mdx05 removed from exit-0 list + dedicated BLOCKED exit test added (4 tests).
No production behavior change for chain_exhausted Case A; no AI route activation; no
mdx-id hardcoding. 53 targeted + 76 adjacent Phase Z tests PASS.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 20:40:54 +09:00
c53722ad0b feat(#86): IMP-86 u1~u5 placeholder zones_data + invariant guard
Mapper FitError handler now appends a __empty__ placeholder to zones_data
and a matching debug_zone so the surviving cardinality stays in sync with
the active layout preset's grid rows. A pre-build_layout_css invariant
guard fails fast with preset/positions/count diagnostics if drift recurs.
Per-record telemetry (adapter_needed, mapper_fit_error, provisional) is
exposed on both placeholder records; authoritative slide_status.adapter_
needed_units schema is unchanged.

Closes mdx03 reject override regression: Step 12 AI router now reachable
without heights_px ValueError; default-path behavior unaffected.

u1 — FitError placeholder zones_data + debug_zone (src/phase_z2_pipeline.py)
u2 — pre-build_layout_css invariant guard (src/phase_z2_pipeline.py)
u3 — horizontal-2 normal+placeholder helper unit (test_compute_per_zone_geometry.py)
u4 — mdx03 reject override → Step 12 integration + default regression
u5 — placeholder telemetry surface (adapter_needed/mapper_fit_error/provisional)

Tests:
- u3 helper: 7 passed (0.06s)
- u4+u5 integration: 2 passed (7.87s)
- Phase Z2 + AI fallback regression: 544 passed (66.28s)
- Broader sweep (excl. matching/pipeline heavy): 1066 passed (96.12s)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 18:25:14 +09:00
cacc5b30db feat(#85): IMP catalog builder invariant + VP runtime gate (u1~u7)
- u1: BuilderMissingError(FitError) — narrow exception aligned with pipeline catch
- u2: load_frame_contracts catalog invariant + VP skip + CatalogInvariantError
- u3a: audit CLI I1~I3 (partial existence / declared builder / registry membership)
- u3b: audit CLI I4 (slot_payload refs vs declared/generated payload keys)
- u4: lookup_v4_candidates VP filter (lookup_v4_all_judgments raw telemetry untouched)
- u5: catalog invariant regression coverage + temp non-VP failure fixtures
- u6: mdx04 VP routing fixture tests (sw_dependency_four_problems excluded from live)
- u7: tests/conftest.py env isolation + mdx03/mdx04/mdx05 subprocess smoke

Targeted 74 PASS (12.31s). Full regression 1063 PASS (87.70s). Audit CLI clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 16:56:38 +09:00
d9d338416a feat(#62): IMP-46 cache fingerprint forwarding u1~u4 (router kwarg + step12 forward + 8 scenarios) 2026-05-23 08:53:22 +09:00
f3ef4d917c feat(#64): IMP-35 details_popup_escalation u1~u10 + Stage 3 R7 anchor re-pin
Land the production + test surface for the Step 17 cascade POPUP terminal
(DETERMINISTIC -> POPUP -> AI_REPAIR -> USER_OVERRIDE) per Stage 2 plan R2.
u11 (baseline-red invariance gate) was already landed in 7c93031 ahead of
this commit; this commit completes u1~u10 plus the Stage 3 R7 follow-up
anchor re-pin for test_imp17_comment_anchor.py.

Implementation units (Stage 2 R2 contract):
  u1  frame_reselect_insufficient failure_type + post-frame remeasure (q4)
        - src/phase_z2_failure_router.py, src/phase_z2_pipeline.py
  u2  NEXT_ACTION_BY_FAILURE row + impl_status flip
        - src/phase_z2_failure_router.py
  u3  Router details_popup_escalation MISSING->IMPLEMENTED + executor stub
        - src/phase_z2_router.py
  u4  step17.py AI split-decision contract (POPUP cascade_stage +
      route_for_label + skip_reason); API gated
        - src/phase_z2_ai_fallback/step17.py
  u5  Step 17 POPUP gate executor; popup_escalation_plan + has_popup marker
        - src/phase_z2_pipeline.py, src/phase_z2_ai_fallback/step17.py
  u6  Composition popup binding -- yaml strategy -> zone payload
        - src/phase_z2_composition.py
  u7  Pipeline composer -> render_slide wiring
      (popup_html / preview_text / has_popup)
        - src/phase_z2_pipeline.py
  u8  slide_base.html <details>/<summary> popup wrapper
        - templates/phase_z2/slide_base.html
  u9  display_strategies.yaml inline_preview + popup metadata
        - templates/phase_z2/regions/display_strategies.yaml
  u10 MDX preservation invariant: popup=full source / body=summary or subset
        (asserted by tests/phase_z2/test_popup_mdx_preservation.py)
  u11 (already in 7c93031) -- baseline-red invariance gate

Stage 3 R7 follow-up (anchor re-pin, test-only):
  - tests/orchestrator_unit/test_imp17_comment_anchor.py
    Pre-anchor additions in src/phase_z2_pipeline.py (u1 / u5 / u7) shifted
    the restructure/reject route-hint comments 578/579 -> 586/587. Re-pinned
    the two guard tests (and docstring re-pin lineage 564 -> 570 -> 578 ->
    586). Production code untouched.

Verification (Stage 4 R1):
  pytest -q tests/orchestrator_unit/test_imp17_comment_anchor.py
    -> 2 passed / 0.02s
  pytest -q <10 IMP-35 unit files in tests/phase_z2 + tests/phase_z2_ai_fallback>
    -> 136 passed / 15.94s
  Baseline-red invariance gate
    (tests/test_imp47b_step12_ai_wiring.py +
     tests/test_phase_z2_ai_fallback_config.py)
    -> 4 failed / 6 passed; FAILED set === IMP35_BASELINE_RED_NODE_IDS
    (frozen registry from 7c93031). Contract holds.
  Codex Stage 4 R1 = YES (independent verify).

Guardrails honored:
  - MDX content preservation: popup carries full source, body holds
    summary or subset only (CLAUDE.md 자세히보기 원칙;
    feedback_phase_z_spacing_direction -- capacity expanded, no margin shrink).
  - AI isolation contract: Step 17 POPUP gate is deterministic; AI hook
    surface is split-decision contract only, API call gated.
  - No hardcoding: escalation thresholds derived from existing overflow
    detector outputs; preview_chars deterministic from container px.
  - 1 commit = 1 decision unit: u1~u10 land together as the planned
    production surface; u11 was deliberately split into 7c93031 as Stage 3
    R7 carve-out, and the R7 anchor re-pin rides with this commit because
    it is the direct shift consequence of the u1/u5/u7 pre-anchor additions.
  - Scope-locked: .claude/settings.json explicitly excluded
    (Stage 4 exit report contract).

Out of scope (per Stage 1 + Stage 2):
  - AI_REPAIR API activation (post IMP-35 axis).
  - IMP-34 zone resize, IMP-36 responsive fit (chain partners,
    separate issues).
  - Print-time auto-expand JavaScript for <details>.
  - Popup escalation in stages other than Step 17.
  - Baseline-red body repair (4 frozen failures) -- separate follow-up
    issue; u11 only guards the count.
  - frame_reselect algorithm changes (entry point only).
  - templates/phase_z2/slide_base.html path rename.

source_comment_ids:
  Stage 1: claude_stage1_problem_review_imp35, codex_stage1_verification_imp35_yes
  Stage 2: Claude #4 R2 plan, Codex #5 R2 YES
  Stage 3: Claude #86 (R7 anchor re-pin), Codex #87 YES
  Stage 4: Claude #88 R1, Codex #89 R1 YES

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 07:36:57 +09:00
7c93031f9b feat(#64): IMP-35 details_popup_escalation u11 baseline-red invariance gate
Add a test-only invariance gate that locks the pre-existing four-test red
baseline so IMP-35 cannot silently grow the red surface while in-flight.
u11 does NOT fix the four reds — Stage 2 follow_up_candidates tracks the
actual repair as a separate issue. u1~u10 production work remains in the
worktree and is explicitly out of this commit per Stage 3 R7 carve-out.

Frozen registry (IMP35_BASELINE_RED_NODE_IDS, set semantics):
  1. tests/test_imp47b_step12_ai_wiring.py
       ::test_mixed_units_classified_by_route_and_provisional_flag
  2. tests/test_imp47b_step12_ai_wiring.py
       ::test_reject_provisional_unit_reaches_router_short_circuit
  3. tests/test_imp47b_step12_ai_wiring.py
       ::test_step12_ai_repair_artifact_writes_json_serialisable_records
  4. tests/test_phase_z2_ai_fallback_config.py
       ::test_ai_fallback_master_flag_default_off

Gate semantics (subprocess pytest, set comparison):
  - All 4 node ids resolve to collectible pytest items
    (rename / delete is caught up front).
  - Broader baseline-area sweep across the two registry files yields
    EXACTLY 4 FAILED and 0 ERROR, with FAILED set ≡ registry.
  - A new red in the baseline area flips count above 4 OR introduces a
    FAILED id outside the registry; either branch fails the gate.
  - Cross-lock test ensures registry node ids cannot point outside the
    declared area-files inventory.

AI isolation contract (feedback_ai_isolation_contract):
  Gate body uses stdlib only (subprocess + re + ast). An AST self-verify
  test rejects `anthropic` imports and `route_ai_fallback` references in
  this file, structurally preventing AI routing inside the gate.

Stage 4 verification (HEAD c1df656 pre-commit):
  pytest -q tests/phase_z2/test_imp35_baseline_red_invariance.py
    → 7 passed in 15.26s.
  Baseline area sweep
    (tests/test_imp47b_step12_ai_wiring.py +
     tests/test_phase_z2_ai_fallback_config.py)
    → 4 failed / 6 passed / 0 errors; FAILED set ≡ registry (identity).
  pytest --collect-only on the 4 registered node ids → all 4 resolve.
  py_compile clean. Codex R1 = YES (independent verify).

Guardrails honored:
  - Scope-locked: test-only file; zero production code in this commit.
  - 1 commit = 1 decision unit (u11 only).
  - No hardcoding: registry = Stage 2 contract frozen tuple, not
    sample-specific literal; gate body has zero magic constants.
  - AI isolation: stdlib-only gate, AST self-verify locks isolation.
  - baseline-red 4 body repair = separate follow-up issue, not u11 scope.

source_comment_ids: Stage 1 problem-review; Stage 2 plan R2 + Codex R2
YES; Stage 3 Claude #30 + Codex #31 R7 YES; Stage 4 Claude #32 + Codex
#33 R1 YES.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 04:13:54 +09:00
c1df656312 feat(#65): IMP-36 fit/rotation generalization (u1~u8)
Generalize Phase Z frame partial responsive fit / rotation to four canonical
F13/F14/F20/F8 family partials. Surface = 13 canonical partials; 19
builder-only contracts remain explicitly out of scope.

u1  test_imp17_comment_anchor: re-pin L570->L578 (restructure+IMP-17),
    L571->L579 (IMP-29 -> IMP-47B supersession). Stage 1 red baseline gate.
u2  frame_contracts.yaml: add rotation_eligible (P1) + body_fit_pattern2 (P2)
    bool axes on 13 partial-backed contracts. P1 True: F13/F14/F20/F8 (4).
    P2 True: F23 + P1_set (5). F29 columns[1].body_parser column_plain ->
    column_with_transform (P3 parity).
u3  test_imp36_fit_rotation_generalization (NEW, 166 lines): static
    parametrized assertions for P1 metadata + CQ presence, P1 opt-out
    absence, P2 --max-body-lines + clamp + cqh, P2 opt-out absence, 19
    builder-only exclusion.
u4  three_parallel_requirements (F13): introduce f13b-root container-name +
    container-type:size + @container (aspect-ratio<1.5) rotation;
    add inline --max-body-lines + body line-height clamp/cqh/calc.
u5  three_persona_benefits (F14): f14b-root P1 + P2 cqh/jinja body fit.
    Persona colors (#285b4a/#445a2f/#743002) and circle SVG aspect 1/1
    preserved.
u6  dx_sw_necessity_three_perspectives (F20): f20b-root P1 + P2 cqh/jinja
    body fit under IMP-49 partial-fidelity lock.
u7  info_management_what_how_when (F8): f8b-root P1 + P2 cqh/jinja body fit.
u8  test_imp36_overflow_chain_self_fire (NEW, 299 lines): Selenium self-fire
    harness for F13/F14/F20/F8 at aspect 1.78 vs 1.0. Asserts line-height
    changes, font-size invariance across all 4 frames (no per-frame exempt),
    grid columns rotate 3 -> 1, OVERFLOW_CASCADE_ORDER remains 4-tuple.

Stage 4 verification (HEAD 6f1c736 pre-commit baseline):
  u1 2/2 PASS, u3 33/33 PASS, u8 9/9 PASS (live Chrome).
  Regression sweep tests/phase_z2 + tests/orchestrator_unit 335/335 PASS.
  font-size mutations introduced: 0.
  Pre-existing red (test_imp47b_step12_ai_wiring x3, ai_fallback_master_flag
  default_off x1) verified unchanged via stash swap -> not introduced.

Guardrails honored:
  - cqh / clamp / container query only (no shared margin/padding/gap shrink).
  - font-size invariant under aspect change (P2 mutates line-height +
    --max-body-lines only).
  - No cross-frame .fNb__ class borrowing (IMP-49 partial-fidelity lock).
  - F14 circle SVG aspect 1/1 untouched; persona colors preserved.
  - AI isolation: no HTML structure generation; AI calls remain zone-content.
  - 1 turn = 1 step; commit excludes .claude/settings.json and all
    out-of-scope untracked worktree per Stage 4 binding contract.

source_comment_ids: Stage 1 #13/#14; Stage 2 #21/#22; Stage 3 #4 + Codex #4
YES; Stage 4 Claude #1 + Codex #3 PASS.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 01:18:20 +09:00
6f1c7367e0 feat(#79): IMP-51 image_overrides axis (u1~u11 backend stamp+CLI+CSS inject + frontend drag/resize+persistence + tests) 2026-05-22 21:54:38 +09:00
9388e25e76 feat(#80): IMP-52 user_overrides.json persistence (u1~u10 backend + frontend + tests)
4-axis MDX-stem keyed persistence so layout / zone_geometries / zone_sections / frames
survive across `/api/run` sessions. Auto-restore on MDX reopen; CLI > file precedence
on backend pipeline entry; 300ms-debounced PUT flushed before Generate.

u1 src/user_overrides_io.py — load/save/validate_key (MDX-stem regex), 4-axis schema,
  miss={}, corrupt warning+{}, atomic tmp+rename, foreign-key preserve.
u2 src/phase_z2_pipeline.py — post-argparse fallback fills only missing axes.
u3 Front/vite.config.ts — GET /api/user-overrides/:key (200 {} on miss, 400 traversal).
u4 Front/vite.config.ts — PUT /api/user-overrides/:key, 4-axis allowlist, partial merge.
u5 Front/client/src/services/userOverridesApi.ts — typed get/save + flushUserOverrides
  with 300ms debounce and mutated-axis partial payloads.
u6 Front/client/src/pages/Home.tsx + slidePlanUtils.ts — restore on MDX upload (non-frame
  axes immediately, frames remapped post-loadRun unit_id → region.id).
u7 Home.tsx — persist on 4 mutation handlers (section drop, layout select, zone resize,
  frame select); zone_sizes and Generate excluded.
u8 tests/test_user_overrides_io.py — round-trip, unknown-key passthrough, missing/corrupt,
  invalid keys (26 tests).
u9 tests/test_user_overrides_pipeline_fallback.py — per-axis fill, CLI-wins, no-file noop,
  corrupt warning+skip (16 tests).
u10 Home.tsx + user_overrides_write.test.ts — await flushUserOverrides() before runPipeline
  in handleGenerate try-block head; source-pattern regression assertions (20 → 22 tests).

Backend pytest 42/42 green. Frontend vitest 113/113 green (endpoint 42 / restore 21 /
service 28 / write 22). HEAD baseline ee97f4f; no spillover to phase_z2 templates /
families / frames / pipeline orchestration outside the IMP-52 surface.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 11:47:11 +09:00
ee97f4fc78 feat(#77): IMP-48 composition planner re-split on all-reject (u1~u9)
Add resplit_all_reject_merges() helper in phase_z2_composition.py that
detects parent_merged / parent_merged_inferred units with label=reject
and rebuilds them as per-section single units using each section's own
rank-1 V4 evidence (no frame swap, MDX raw_content preserved).

Pipeline hook fires once after Step 6 settling chain (u12/u4/empty-shell)
and section_assignment_plan resolution, before Step 6 artifact write.
Guards: beneficial-split rule (>=1 non-reject), coverage equality, layout
cap (>4 abort), max_retry=1, section_assignment_override short-circuit.

Audit: comp_debug["imp48_resplit"] additive payload (applied, split_units,
skipped_units, post_split_unit_count, post_split_layout_preset);
selection_path="resplit_from_merge" telemetry on rebuilt singles;
layout_preset re-derived via select_layout_preset(new_units).

Tests: 39/39 PASS (composition u1~u6: 14 cases; pipeline u7~u9: 25 cases).
Scoped regression 720/6 with 6 failures isolated as pre-existing on
baseline 79f9ea5 (independent of IMP-48). mdx03 golden lock preserved.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 05:00:07 +09:00
79f9ea5c92 feat(#78): IMP-49 dx_sw_necessity partial Figma provenance fix (u1~u3)
Replace eyeballed PROMOTED green hex (#296B55, #123328) with verbatim
upstream values from figma_to_html_agent/blocks/1171281198/index.html:
- border + check mark: #1d4d3e (upstream :208 -webkit-text-stroke)
- header gradient: rgb(15, 50, 30) / rgb(60, 52, 34) (upstream :54, :64)

Document .f20b__* as authoring-ordinal namespace (NOT Figma frame_id
1171281198); structural link via data-frame-id attribute. No selector
rename, no catalog edit.

Add focused regression test (tests/test_imp49_partial_figma_provenance.py)
extracting <style>-block hex/rgb/rgba literals and asserting non-whitelisted
literals exist byte-identically in upstream source. Whitelist limited to
neutrals (#fff, #1a1a1a) + shared zone-title token (#000, #883700,
rgba(50,44,30,0.4)).

Scope: dx_sw_necessity_three_perspectives.html only. 19 missing partials,
.fNb__ rename, full 32-contract audit deferred to follow-up axes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 02:49:43 +09:00
1186ad8ae2 feat(#76): IMP-47B reject-as-AI-adaptation activation (u1~u13 backend + tests)
- u1~u9: AI fallback infrastructure (router/prompts/schema/validator) + Step 12 hook
- u10: e2e reject chain (writes final.html with AI-repaired slot, full coverage)
- u11: frontend wiring deferred to follow-up commit (split from IMP-41 hunks)
- u12: coverage_invariant guard
- u13: cache save gate (visual_check PASS + user_approved/auto_cache) — Codex #22 verified

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-22 00:19:10 +09:00
90503cadd6 feat(#67): IMP-38 V4 max_rank policy formalization (u1~u3, 4 round consensus)
- u1: separate templates/phase_z2/catalog/v4_fallback_policy.yaml + load_v4_fallback_policy() loader
  (catalog pollution prevention — Codex #1 correction)
- u2: dynamic effective max_rank in lookup_v4_match_with_fallback (3-variable ceiling min,
  Codex #2 correction: min(configured, len(judgments_full32))) + 3-tier usable predicate
  (status + catalog + optional capacity) + trace 8 fields (requested/default/configured_extended/
  judgments_count/effective_extended_ceiling/effective_max_rank/usable_count/policy_applied)
- u3: 2 production call site cleanup (max_rank=3 removed, HEAD baseline) + tracked
  Front/vite.config.ts PHASE_Z_MAX_RANK env retired + 4 regression scenarios

verified: 32 passed (IMP-38 focused scope) — IMP-05 L4 dedup / L2 schema preserved,
IMP-30 allow_provisional byte-identical, caller_override backward compat (tests)

Stage cycle (#67, 7 round Claude + 5 round Codex):
- Stage 1: Claude #1 -> Codex #1 YES + 5 corrections
- Stage 2 r1+r2: Claude #2-#4 -> Codex #2 Q2 -> Codex #3 YES (4 round consensus LOCK 23195)
- Stage 3 U1+U2+U3: Claude #5-#9 -> Codex #6 NO 4to3 correction -> Codex #7 YES -> Codex #8 YES
- Stage 4: Claude #11 -> Codex #9 (anchor attribution nuance) -> Codex #10 readiness -> Codex #11

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-21 22:14:05 +09:00
dceb10129f feat(#63): IMP-34 R1 donor capacity measured bound (u1+u2)
Bound donor capacity in plan_zone_ratio_retry by min(static_slack,
max(0, clientHeight-scrollHeight)) when both Step 14 measured fields
are present; fall back to static contract slack when absent. Prevents
the donor from being over-allocated when full-but-not-overflowing,
avoiding a wasted Selenium rerender before cascade falls to
cross_zone_redistribute.

- src/phase_z2_retry.py: planner block L122-157 only; donor filter
  (L107-112), slack<=0 gate, base_plan, greedy aggregation untouched.
  Adds measured_empty_px + slack_bound_source telemetry to
  donor_candidates_considered (additive only).
- tests/phase_z2/test_phase_z2_retry_measured_bound.py: 5-axis
  regression (static_fallback / measured<static / measured>=static /
  measured==0 excludes / filter+bool guard).

Guardrails honored: V4 rank-1 frame lock preserved, no frame_swap,
no spacing/padding/gap/line-height/font shrink, no content drop,
no MDX 03/04/05 branching, no Step 14 schema mutation. Static
fallback idempotent when measured fields absent.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 21:37:41 +09:00
a06dd3d4b0 feat(#42): IMP-04b catalog extension to 32 frames (u1~u24)
Extends frame_contracts.yaml from 11 to 32 contracts to match V4 evidence
(tests/matching/v4_full32_result.yaml unique template_ids), closing the
IMP-04b gap surfaced in IMP-04 (#4) Track A milestone.

Scope (Stage 2 24-unit plan):
- u3/u4: WIP partial absorb — app_sw_package_vs_solution (F23),
  pre_construction_model_info_stacked (F9). Both promoted from
  _WIP_FILES.md to frame_contracts.yaml. WIP allowlist now empty.
- u5~u11: Track A 7 frames (index.html present, contract missing).
- u12~u23: Track B 12 frames (visual_pending: true; family partial
  authoring deferred — contract-first per Stage 2 plan).
- u24: BT closure gate. Adds
  test_imp04b_closure_gate_v4_coverage_and_wip_empty (catalog ↔ V4
  set-equal + WIP==0) and test_vp_exempt_keys_are_contracted_and_disk_absent
  (vp ∩ disk == ∅). Relaxes test_contracts_set_equals_disk_families_minus_wip
  to (disk - wip) ∪ vp. 32 derived from V4 evidence YAML (no hardcoding).

Closure facts (locked):
  contracts = 32, v4_unique = 32, missing = [], extra = [],
  wip_count = 0, vp_count = 19, vp ∩ disk = [].

Guardrails honored:
- No calculate_fit migration.
- No AI/Kei API call in per-frame work.
- No 1-2 sample hardcoding (Codex #7 generalization guardrail).
- No production refactor for tests (IMP-32 owns helper extract).
- figma_to_html / V4 / Phase Z 3-layer separation preserved.
- 1 commit = 1 IMP-04b decision unit (bundled u1~u24 per Stage 2
  plan; CAT+WIP atomicity for u3/u4 preserved).

Tests: tests/test_family_contract_baseline.py 4/4 PASS.
Cross-ref: IMP-04 (#4), IMP-29 (#38), IMP-30 (#39), IMP-31 (#40),
IMP-32 (#41), IMP-33 (#61), IMP-47A (#75).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 19:39:16 +09:00
15ef7c65e9 fix(#75): IMP-47A mdx03 frontend execution stabilization (u1~u4)
u1: SlideCanvas iframe sandbox += allow-scripts (allow-same-origin preserved)
    → embedded-mode script in slide_base.html now applies html.embedded
    → standalone CSS reset deactivates inside iframe; no clipping
u2: designAgentApi.loadRun merges candidate_evidence + v4_all_judgments
    + v4_candidates via Map<template_id|id|frame_id> dedup,
    LABEL_PRIORITY (use_as_is<light_edit<restructure<reject) then
    confidence desc, capped TOP_N_FRAMES=6
u3: Home.handleGenerate useCallback deps = [uploadedFile, slidePlan,
    userSelection, pendingZones, pendingLayout] (5-tuple, stale-closure fix)
u4: tests/manual/imp47a_e2e.md — mdx03 manual e2e spec (5 axes)

Frontend-only. Backend src/ untouched. No template/catalog edits.
Determinism preserved (no LLM in frontend merge logic).
Baseline: pytest -q tests → 623 passed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 14:56:56 +09:00
c864fe0479 feat(#61): IMP-33 AI fallback scaffolding (u1~u11, flag default OFF)
Frame-aware AI fallback module scaffolded under src/phase_z2_ai_fallback/
with master flag ai_fallback_enabled=False; normal-path AI call count
remains 0. AI output constrained to builder_options_patch /
partial_overrides / slot_mapping_proposal; MDX / frame_id / raw HTML /
raw CSS mutations rejected at schema layer. IMP-46 cache gate (cache.py)
raises AiFallbackCacheGateError unless visual_check_passed AND
user_approved. Step 12 wires AI repair after IMP-30 provisional payload
only; Step 17 stays blocked behind IMP-34 / IMP-35 prerequisites.
AST isolation guard forbids fallback package from importing Phase Q /
Kei / pipeline runtime symbols. Docs IMP-17 / IMP-31 bound to runtime
module surface via 11-row structural test pin (test_docs_sync.py) so
drift fails CI.

Tests: 116 fallback / 161 phase_z2 regression / 526 scoped full sweep
all passing. Existing pre-IMP-33 fixture issue in scripts/test_phase_t_*
remains untouched (out of scope).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 12:46:49 +09:00
c412f1ea75 refactor(#41): IMP-32 Step 9 application_plan helper extraction (u1~u5)
Pure refactor — extract inline Step 9 per-unit application_plan dict
assembly into module-level private helpers for testability. Replaces
IMP-05 Case 7 inspect.getsource() literal guard with direct helper-call
shape test. Behavior preserved: key set/order, candidate_evidence +
fallback_chain compat alias identity, IMP-06 additive plan fields,
IMP-11 D-2 markers (single _contract = get_contract(c.template_id)
bind + catalog_registered + min_height_px chain).

- u1 _application_candidates_for_unit(unit) at src/phase_z2_pipeline.py
  :2829-2853 — APPLICATION_MODE_BY_V4_LABEL mapping (pure extraction)
- u2 _v4_all_judgments_for_unit(v4_all_for_unit) at :2855-2882 —
  IMP-11 D-2 chain preserved literally
- u3 _build_application_plan_unit(unit, zone_plan, selection_trace,
  plan_record, v4_all_for_unit, layout_preset, layout_candidates_list)
  at :2885-2995 — byte-identical per-unit dict (key set + order +
  value identity), candidate_evidence / fallback_chain compat alias,
  v4_candidates list, v4_all_judgments, application_candidates, IMP-06
  additive plan fields
- u4 Step 9 inline loop body at :4620-4658 replaced with helper call;
  per-index/per-id lookups (zone_region_plans[i], v4_fallback_traces
  .get(...), plan_record_by_unit_id.get(id(unit)), section_alias_by_id,
  lookup_v4_all_judgments(...)) stay at call-site
- u5 tests/test_phase_z2_v4_fallback.py Case 7 rewritten to
  test_build_application_plan_unit_emits_candidate_evidence_and_alias
  — direct helper call with SimpleNamespace duck-typed input; asserts
  candidate_evidence list identity (is), fallback_chain compat-alias
  identity (is), key order (candidate_evidence before fallback_chain),
  and compat-alias comment scoped to inspect.getsource(_build_
  application_plan_unit)

Verification: targeted 22 passed, full pytest 408 passed (0 fail/skip),
smoke 11/11 PASS (2 pre-existing baseline SKIPs unchanged).

Cross-ref: IMP-05 (#5) commit 23d1b25 Case 7 temporary source guard
(replaced) / Codex #20 + #21 / IMP-11 D-2 marker preserved.
2026-05-21 03:17:27 +09:00
1efbf672bd feat(#39): IMP-30 first-render invariant + abort bypass (2 paths)
Restore first-render invariant: final.html + Step 20 slide_status MUST be
written for every input where Step 0~5 succeed. Two abort paths replaced
with provisional/empty-shell synthesis; MDX content preserved, AI-free.

- u1 V4Match.provisional + lookup_v4_match_with_fallback(allow_provisional)
  chain_exhausted -> synthesize rank-1 provisional (opt-in, default-off)
- u2 CompositionUnit.provisional propagation (single / parent_merged /
  parent_merged_inferred constructors)
- u3 select_composition_units(allow_provisional_fill=True) last-resort
  fill + _candidate_state="selected_provisional"
- u4 pipeline.py path-(a) abort guard replaced with provisional retry +
  terminal __empty__ shell (no sys.exit(1))
- u5 zones_data.provisional -> slide_base.html zone--provisional class +
  data-provisional + needs-adaptation badge (template-only)
- u6 compute_slide_status additive provisional_first_render_count/_units
  (overall enum unchanged per IMP-05 Codex #10 D4)
- u7 regression: tests/test_phase_z2_imp30_first_render.py (28 tests) +
  tests/test_phase_z2_v4_fallback.py (+5 cases)

Guardrails verified: MVP1_ALLOWED_STATUSES unchanged, no calculate_fit,
no LLM in fallback path, no MDX 03/04/05 hardcoding.

Anchor sync (Rule 13): tests/orchestrator_unit/test_imp17_comment_anchor.py
re-pinned 564/565 -> 570/571 to track V4Match.provisional shift at
src/phase_z2_pipeline.py:179-184.

Cross-ref: IMP-05 (#5) §5 defer + Codex #2 first-render invariant.
2026-05-21 00:40:58 +09:00
265d70ed91 refactor(#28): IMP-28 L4 _parse_json dedup (4 modules -> src/json_utils)
Consolidate duplicate _parse_json helpers from content_editor.py /
design_director.py / kei_client.py (fuller form) and pipeline.py (simple form)
into shared src/json_utils.parse_json (strict superset). All 18 call-sites
preserved via `parse_json as _parse_json` alias import; no behavior change.

- src/json_utils.py (new): shared helper, fenced/plain-fence/bare-brace patterns
  + list-prefix cleanup fallback.
- tests/test_json_utils.py (new): 9 unit tests pinning parser semantics.
- src/content_editor.py / design_director.py: remove local helper +
  unused `import json` / `import re`.
- src/kei_client.py / pipeline.py: remove local helper; `json` / `re` retained
  (used elsewhere).

Targeted tests 9 passed; full pytest 374 passed (3 pre-existing scripts/
collection errors reproduce on baseline 909bf75, IMP-28 unrelated).
2026-05-20 20:44:19 +09:00
909bf75edc refactor(#27): IMP-27 K5 catalog loader + _get_block_by_id cleanup
Consolidate three duplicated catalog readers and two _get_block_by_id
implementations behind a single shared module (src/catalog.py) that owns
file-read + mtime cache. All caller signatures and return contracts
remain byte-identical.

Units:
- u1 NEW src/catalog.py (76 lines): load_root_catalog / load_blocks /
  get_block_by_id / get_catalog_mtime as the sole file-read +
  mtime-cache owner.
- u2 src/block_reference.py: _load_catalog delegates to load_blocks
  (list[dict] preserved); _get_block_by_id (no-arg) delegates to
  catalog.get_block_by_id. Module-level _catalog_cache removed.
- u3 src/block_selector.py: load_catalog delegates to load_root_catalog
  (root dict preserved); _get_block_by_id (catalog-injected sig
  preserved) delegates to catalog.get_block_by_id. Module-level
  _catalog_cache / _catalog_mtime / CATALOG_PATH removed.
- u4 src/renderer.py: _load_catalog_map and
  _load_catalog_map_with_variants consume catalog.load_blocks; renderer
  projection caches kept local but keyed via
  catalog.get_catalog_mtime(). Per-projection invalidation keys
  (_CATALOG_MAP_MTIME / _CATALOG_VARIANT_MAP_MTIME) introduced. import
  yaml, CATALOG_PATH, legacy _CATALOG_MTIME removed.
- tests NEW tests/test_catalog_shared_loader.py (421 lines, 23 cases):
  shared loader + 3 wrappers covering single file-read, contract
  preservation, signature preservation, shared cache, private state
  absence, mtime invalidation propagation to renderer projections.

Verification:
- pytest tests/test_catalog_shared_loader.py -v: 23/23 PASS in 0.13s.
- pytest tests/ -q --ignore=tests/matching: 365/365 PASS in 38.10s.
- src/fit_verifier.py, src/space_allocator.py, src/pipeline.py and
  templates/catalog.yaml unchanged (git diff empty).

Out of scope:
- catalog.yaml schema/path unchanged.
- Catalog direct-read call sites in fit_verifier / space_allocator /
  pipeline left for a separate follow-up axis.
- Phase Z 22-step runtime, frame_selection, light_edit/restructure
  flows untouched.

Refs: IMP-27 (gitea #27), INSIGHT-MAP §5 K5, PHASE-Q-AUDIT §2.10
2026-05-20 19:31:26 +09:00
5d23b747ff fix(orchestrator): P5b first-line agent header strict + supplement throttle
Bug discovered during #24 IMP-24 K6 Stage 2 (2026-05-20):
- Codex r1, r2, r3 started with '=== IMPLEMENTATION_UNITS ===' on first line
  (not '[Codex #N] ...'), so detect_agent (P0-1 strict, first-line only)
  returned None.
- For non-audit issues, the P5 supplement guard was audit-only gated → silent
  loop until Codex r4 happened to use correct format. 4 rounds wasted.

Verified that #21 Stage 4 had the same latent silent loop pattern
('## [Codex #1]' first line) — orchestrator looped through ~10 Claude rounds
before random recovery. P5b fix addresses this long-standing bug.

Patch (defensive parser-contract hardening; does not assume single root cause):

1. RULES global gets explicit "FIRST non-empty line MUST be [Claude #N] /
   [Codex #N]" rule that OVERRIDES any stage-specific "body MUST contain"
   constraint.

2. COMPACT_PLAN_RULE wording clarified: "body" begins AFTER the first-line
   agent header. The 'body MUST contain ONLY' set no longer accidentally
   permits '=== IMPLEMENTATION_UNITS ===' on line 1.

3. is_codex None supplement guard:
   - audit-only gate REMOVED → fires for all issues (#24 latent loop fixed)
   - Throttle: max 2 supplements per stage; on 3rd violation, orchestrator
     hard-stops the issue with explicit "user action required" message
     and exits run_stage cleanly
   - Supplement message names both Claude AND Codex (Claude's first-line
     violation also breaks downstream via Codex mimicry)
   - Body-head 80 chars logged on detection failure (debugging aid)

4. Regression tests (+5 cases in test_orchestrator_core.py):
   - TestDetectAgent: '=== IMPLEMENTATION_UNITS ===' first line → None
   - TestDetectAgent: [Codex #N] first line + units after → 'codex' OK
   - TestDetectAgent: '## ', '📌 **', '**' prefix all → None
   - TestRulesAndCompactPlanFirstLineContract: RULES wording has FIRST/OVERRIDES
   - TestRulesAndCompactPlanFirstLineContract: COMPACT_PLAN_RULE has carve-out

Cosmetic side effect (accepted): Claude's '📌 **[Claude #N] ...**' or
'## [Codex #N] ...' decoration prefixes will fail detect_agent. Agents
will drop decorations from line 1; line 2+ can still use them.

Out of scope (NOT included to keep regression risk low):
- detect_agent function logic UNCHANGED (P0-1 strict preserved)
- consensus parser UNCHANGED
- stage loop structure UNCHANGED
- git/Gitea retrieval logic UNCHANGED
- audit-only mode P4/P4a guards UNCHANGED
- pre-post comment validation (future axis, larger refactor)

Total: 131/131 pytest pass (126 prior + 5 new).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 17:01:24 +09:00
134f52d3d3 feat(#58): L3 dormant trigger guard -- DORMANT-TRIGGERS.yaml + checker + orchestrator hook
P5-1 docs/architecture/DORMANT-TRIGGERS.yaml -- 5 entries (IMP-16/17/18/19 active + IMP-20 followup-linked #55).
P5-2 scripts/check_dormant_triggers.py -- standalone, reads registry, scans tree + diff, writes .orchestrator/dormant_alerts.json, exit 0 always.
P5-3 orchestrator.py -- _check_dormant_triggers() helper + Stage 4->5 informational alert branch (skips audit-only, never blocks).
P5-4 tests/orchestrator_unit/test_dormant_triggers.py -- 30 cases (yaml schema, registry contents, checker matching, false-positive guards, manual-evidence skip, orchestrator branch, audit bypass, governance ref).
P5-5 PROJECT-INTENT-AND-GOVERNANCE.md -- single anti-patterns row referencing the L3 registry as binding contract surface.

Tests: pytest -q tests = 337 passed (baseline 307 + 30 new).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 09:43:14 +09:00
9389b8425b fix(orchestrator): P5 audit-anchor-first-line regression guard
Bug discovered during #56 INTEGRATION-AUDIT-02 execution (2026-05-20):
- Both Claude and Codex put "Audit anchor: ..." as the FIRST line of every
  Gitea comment per the #56 issue body instruction "cite anchor at start
  of every stage".
- detect_agent (P0-1 strict, first-line only) then returns None for these
  comments because the first line is "Audit anchor:..." not "[Codex #N]"
  or "[Claude #N]".
- Result: orchestrator's "is_codex" check (line ~1288) flips false →
  "Codex 응답 미감지 — continuing" → infinite Stage 4 loop. #56 reached
  Round #14 (>300 comments, ~2 hours wasted token).

Fix path (NOT relaxing detect_agent — that would revive the original #45
pre-P0-1 bug where [Claude #N] citations inside Codex bodies caused
mis-detection):

1. AUDIT_ONLY_NOTE updated to enforce comment format:
   - FIRST non-empty line MUST be `[Claude #N] <stage>` or `[Codex #N] <stage>`
   - Audit anchor / banners / prefaces MUST appear line 2 or later
   - Concrete CORRECT example included
   - Explicit warning that violation breaks stage advance

2. is_codex None guard auto-supplements:
   - When _audit_mode(title) AND detect_agent returns None, orchestrator
     posts a Gitea supplement comment requesting the correct format
   - Next round's Claude/Codex see the supplement and correct
   - Breaks the infinite loop automatically (no manual ctrl-C needed)

3. Regression tests in TestDetectAgent (test_orchestrator_core.py):
   - test_audit_anchor_preface_breaks_detection: confirms P0-1 strict
     correctly returns None when anchor is first line
   - test_audit_anchor_after_header_works: correct format passes

Total: 96/96 pytest pass (94 prior + 2 P5 regression).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 07:03:12 +09:00
02e2ae0afb docs(#54): F-4 legacy annotation + F-5 fixture convention -- AUDIT-01 housekeeping
INTEGRATION-AUDIT-01 (#50) §10.4 / §10.5 housekeeping carry-over.

F-4: annotate 14 remaining legacy Phase R'/Q sample-text hits across 10
src/ files with inline marker `# [legacy Phase R'/Q example -- INTEGRATION-AUDIT-01 §10.4]`.
Comment-only. No string-literal / regex / sample dict value mutated.
fit_verifier.py L612 marker keeps Phase Z partial-live import graph
(FitAnalysis / RoleFit / redistribute / salvage) byte-precise.

F-5: docs-only addendum -- §10.5.1 in INTEGRATION-AUDIT-01-REPORT.md +
tests/CLAUDE.md fixture convention note. No root tests/fixtures/ dir
created; existing tests/phase_z2/fixtures/ convention preserved. Documents
test-only sample-reference allowance vs src/** runtime prohibition.

Out of scope: Phase Z source 11 hits (phase_z2_content_extractor /
failure_router / mapper / retry), production behavior change, #19 work.

Verified: pytest -q tests/phase_z2/ = 157 PASS. git diff +210/-0
(35 src/docs lines + 175 new tests/CLAUDE.md). No behavioral delta.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 20:23:36 +09:00
8f06a4c99f docs(IMP-52): reconcile Phase Z family count drift -- F-2 option (c)
Audit follow-up F-2 (INTEGRATION-AUDIT-01 §10.2). Phase Z families surface
showed 11 tracked / 11 contracted / 13 on disk. The 2 untracked WIP files
(app_sw_package_vs_solution.html, pre_construction_model_info_stacked.html)
are now declared in _WIP_FILES.md as uncontracted and out-of-scope for the
runtime matcher; promote/remove is gated on #42. The 11/11 tracked +
contracted baseline is unchanged. A new pytest enforces tracked families ↔
frame_contracts.yaml set-equality modulo the WIP allowlist parsed from
_WIP_FILES.md, so future drift fails fast in CI before #42 expands to 32
frames.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 19:15:04 +09:00
e32f632464 fix(orchestrator): P4a baseline-diff guard + Stage 5 commit scope
P4 had two production issues blocking #50 integration audit deployment:

1. Stage 3 guard had no baseline awareness — flagged ALL forbidden-path
   changes including pre-existing dirty WIP. Empirical: 328 such files
   already in current working tree (tests/matching/ artifacts etc).
   #50 would have hit reject loops immediately without Claude doing
   anything wrong.

2. Stage 5 had no commit-scope guard — if Claude ran `git add -A` and
   committed user's existing WIP, audit commit would be polluted with
   unrelated production changes.

P4a additions:
- _audit_baseline_path / _ensure_audit_baseline / _load_audit_baseline:
  snapshot working-tree dirty paths at run_issue entry for audit issues.
  Resumed runs preserve existing baseline (no overwrite).
- _check_audit_only_violations(baseline=None): accept baseline set,
  subtract from violations — only flags NEW forbidden changes introduced
  after audit start.
- _check_audit_commit_scope: verify HEAD commit's file list matches
  AUDIT_ALLOWED_COMMIT_GLOBS (INTEGRATION-AUDIT-*.md, BACKLOG.md).
- run_issue: save baseline on audit-mode entry only — no impact on
  normal issues.
- Stage 5 (commit-push) YES gate: new guard rejects on out-of-scope
  files with remediation prompt (git reset --soft + force-with-lease).

19 new tests:
- baseline subtraction (5): pre-existing removed, None=keep-all,
  empty-set=catch-all, full-coverage filter, Windows path normalize.
- baseline persist (5): roundtrip, no-overwrite on resume, missing
  fallback, corrupt JSON fallback, non-list fallback.
- commit scope detection (7): report-only allowed, backlog allowed,
  src/ rejected, unrelated docs rejected, git error fail-open,
  Windows backslash, empty commit pass.
- allowed globs sanity (2): every glob has audit marker, all under
  docs/architecture/.

Total: 94/94 pytest pass (75 prior + 19 new).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 10:29:15 +09:00
4289a500b6 feat(orchestrator): P3 wrapper input/encoding fix + P4 audit-only mode
P3 hotfix (2026-05-18 — verified during #46 retry attempt):
- _run_with_tree_kill: encode input only when Popen is in binary mode.
  Previously force-encoded str→bytes even with encoding= set, breaking
  text-mode stdin pipes with: write() argument must be str, not bytes.
- run_claude path was the only affected call site.
- 3 new C7 regression tests (input+encoding / bytes+binary / auto-encode).
- C3/C6 test fixtures hardened with DEVNULL stdio isolation.

P4 audit-only mode (2026-05-19, prep for #50 integration audit):
- _is_audit_issue: title-based detection for [INTEGRATION-AUDIT*],
  [AUDIT-ONLY], or "integration audit" phrase.
- _audit_mode + --audit-only CLI flag: manual override regardless of title.
- AUDIT_ONLY_NOTE injected into context pack across all stages/rounds.
- Stage 3 (code-edit) YES gate: deterministic git status check.
  Changes touching src/**, templates/**, tests/** auto-reject Stage 3 YES
  and post a supplement-request comment. LLM-independent enforcement.
- 26 new audit-mode tests (title detection, CLI override, forbidden
  prefix detection, allowed paths pass, Windows backslash normalization,
  quoted paths with spaces, git error fail-open, constants sanity).

Total: 75/75 pytest pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 10:18:28 +09:00
e10ec36617 feat(IMP-17): AI repair fallback infra carve-out — design-only boundary + 3-cond AND gate
u1 — src/phase_z2_pipeline.py:564 route hint comment corrected from
non-existent IMP-31 to IMP-17 (carve-out, AI fallback only, normal path 밖).
Line 565 IMP-29 frontend override reference untouched.

u2 — docs/architecture/IMP-17-CARVE-OUT.md (new) defines:
- allowed scope (Step 12 restructure proposal, Step 16/17 retry fallback)
- forbidden scope (normal-path AI calls, MDX compression, HTML structure)
- 3-condition AND activation gate (User GO ∧ B4 frame_selection evidence
  ∧ IMP-04 catalog + IMP-05 V4 fallback live)
- pattern shape reference (link-only): content_editor.py:21,318 +
  sse_utils.py:16-50 (Phase Q Archive Candidate, no port)
- AI 격리 contract + Kei persona 단절 (permanent)

u3 — PHASE-Z-IMPLEMENTATION-ISSUE-BACKLOG.md:68 IMP-17 row gains
carve-out doc link + 3-cond AND gate pointer.

u4 — PHASE-Q-INSIGHT-TO-22STEP-MAP.md AI repair fallback infra registry
row prefixed with IMP-17 + carve-out link; normal_path=no preserved.

Anchor test: tests/orchestrator_unit/test_imp17_comment_anchor.py asserts
line 564 IMP-17 wording AND line 565 IMP-29 preservation (2 tests pass).

Runtime behavior change: 0. Only delta in executable file is one comment
line. Normal-path AI invocation count remains 0.

Refs: gitea #17

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 08:12:43 +09:00
23ba8b68cd feat(IMP-16): U1 H3 verification utility port + U2 wiring design
U1 (runtime, u1-u10): new Phase Z-owned deterministic verification module
src/phase_z2_verification_utils.py (335 LOC, stdlib only) porting H3 utility
surface — VerificationResult, extract_text_from_html, normalize_for_comparison,
extract_keywords, strip_meta_lines, split_into_sentences, verify_text_preservation,
detect_invented_text. 10 unit tests under tests/phase_z2/test_pz2_vu_*.py (56 tests).

u11 (design-only): docs/architecture/IMP-16-U2-WIRING-DESIGN.md fixes the Step
1/2/14/21/22 reverse-path contract, redesigned frame-contract pattern
reservation (IMP-20), and IMP-07 hard-gate criteria. No runtime wiring lands
in this commit — U2 stays blocked until IMP-07 reverse path is implemented +
verified + runtime-hit.

Guardrails: no src.content_verifier import; no FORBIDDEN_KEI_MEMOS /
generate_with_retry / REQUIRED_PATTERNS / verify_structure / verify_area /
verify_all_areas usage; no AI / Kei / httpx / SSE path; AI-isolation contract
upheld (utility is deterministic).

Tests: 56 targeted PASS (0.19s), 15 regression baseline PASS (7.59s).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 04:42:35 +09:00
614c53358e feat(IMP-15): 실행-4 — debug.json event surfacing + spec taxonomy row
Issue: #48 (IMP-15 실행-4, axis 4: debug.json + spec doc trace).
Parent: #15. Depends on 실행-1/2/3 (events + classifier outputs).

Surfaces the image/table event streams that 실행-1/2/3 already produced
and consumed, mirroring the existing `zone_geometries_px` top-level
precedent (no new pattern introduced). Adds the matching taxonomy row
to the Phase Z fit-classifier/router spec.

src/phase_z2_pipeline.py (+3):
- write_debug_json now lifts `image_events` and `table_events` to
  top-level of `debug.json` via `(visual_runtime_check or {}).get(<k>, [])`,
  exactly mirroring the immediately preceding `zone_geometries_px`
  surfacing line. Defaults to `[]` when `visual_runtime_check` is None
  — additive, no consumer-visible breakage.

docs/architecture/PHASE-Z-FIT-CLASSIFIER-ROUTER-SPEC.md (+1):
- §3.1 taxonomy adds `image_aspect_mismatch` row. Row text explicitly
  marks the signal as post-render `fail_reasons` from Step 14
  visual_runtime_check (rendered vs declared aspect ratio mismatch),
  NOT a router-routed fit_classifier output, and notes the separate
  `image_events` stream surface. Prevents future readers from wiring
  this taxonomy into §3.2 priority list or §4 router action map.

tests/phase_z2/test_debug_json_event_surfacing.py (new, 2 tests):
- `test_write_debug_json_surfaces_image_and_table_events` invokes
  write_debug_json with synthetic visual_runtime_check containing
  both event lists; reads back the on-disk debug.json and asserts
  both keys are present at top level with the exact payloads.
- `test_write_debug_json_defaults_when_visual_runtime_check_none`
  asserts both new keys default to `[]` when visual_runtime_check
  is None — guards the defensive `(… or {})` pattern.

tests/phase_z2/test_spec_taxonomy_image_aspect_mismatch.py (new, 2 tests):
- `test_spec_has_image_aspect_mismatch_row` opens the spec file and
  asserts exactly one `^\| image_aspect_mismatch \|` row exists
  inside the §3.1 table block (no markdown-parser dependency).
- `test_spec_row_marks_post_render_fail_reasons_semantic` asserts the
  row text carries both "Post-render" and "fail_reasons" tokens —
  enforces the Stage 1 guardrail wording.

Verification (Stage 4 PASS, Claude + Codex independent):
- pytest -q tests/phase_z2/test_debug_json_event_surfacing.py \
              tests/phase_z2/test_spec_taxonomy_image_aspect_mismatch.py
  → 4 passed in 0.07s.
- git diff scope: 4 files, +148 insertions / 0 deletions.

Scope-locked: no edits to classifier (실행-3), event generation
(실행-1/2), Step 21 viewer, §3.2 priority list, §4 router action
mapping, or `table_self_overflow` taxonomy row. Pre-existing
dirty/untracked working-tree files left untouched.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 22:25:41 +09:00
535c4848fd feat(IMP-15): 실행-3 — classifier consumes image+table events
Issue #47 (IMP-15 실행-3 axis 3): extend `classify_visual_runtime_check`
to consume the `image_events[]` and `table_events[]` arrays produced by
`run_overflow_check` (실행-1/2) and widen `visual_check_passed`.

Changes (src/phase_z2_classifier.py):
- Remove `overflow.passed=True` early-return so image/table event scans
  always run, even when zone-level overflow was clean.
- Deferred import of `IMAGE_ASPECT_DELTA_TOL` and `TABLE_SCROLL_TOL_PX`
  from `phase_z2_pipeline` (circular-safe SSoT; no duplicate literals).
- New `image_events` scan emits `image_aspect_mismatch` when
  `delta is not None AND |delta| > IMAGE_ASPECT_DELTA_TOL`
  (delta=None ⇒ skip, image not loaded).
- New `table_events` scan emits `tabular_overflow` when
  `wrapper_clipped_index is None AND (excess_x or excess_y > TABLE_SCROLL_TOL_PX)`
  (wrapper-clipped tables deduped against the existing zone cascade).
- `visual_check_passed = overflow.passed AND not classifications` —
  any image/table classification now flips the gate.

Guardrails preserved:
- §3.2 8-rule zone cascade (clipped_inner / zone-self) untouched —
  the new emitters are ADDITIONAL.
- `placement_diagnostics`, `categories_seen`, `unclassified_signals`
  return-shape preserved.
- No `pipeline.py` production changes; no router action or
  `debug.json` passthrough changes.

Tests (tests/phase_z2/test_phase_z2_visual_classifier.py — new):
- `test_image_aspect_mismatch_emits_classification` (|delta|>TOL fires)
- `test_image_aspect_delta_below_tol_no_classification` (≤TOL skipped)
- `test_standalone_table_overflow_emits_classification`
  (wrapper_clipped_index=None, excess>TOL fires)
- `test_table_dedup_when_wrapper_clipped`
  (wrapper_clipped_index set ⇒ no `tabular_overflow` emit)

All 4 pure-dict (no Selenium / chromedriver / pipeline execution).
Tolerances imported from `phase_z2_pipeline` (SSoT enforced via test
import — no classifier-local literals).

Verification (Stage 4):
- New classifier tests: 4/4 PASS.
- Regression `tests/phase_z2/` excluding new file: 93/93 PASS.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 21:45:06 +09:00