Bug discovered during #56 INTEGRATION-AUDIT-02 execution (2026-05-20):
- Both Claude and Codex put "Audit anchor: ..." as the FIRST line of every
Gitea comment per the #56 issue body instruction "cite anchor at start
of every stage".
- detect_agent (P0-1 strict, first-line only) then returns None for these
comments because the first line is "Audit anchor:..." not "[Codex #N]"
or "[Claude #N]".
- Result: orchestrator's "is_codex" check (line ~1288) flips false →
"Codex 응답 미감지 — continuing" → infinite Stage 4 loop. #56 reached
Round #14 (>300 comments, ~2 hours wasted token).
Fix path (NOT relaxing detect_agent — that would revive the original #45
pre-P0-1 bug where [Claude #N] citations inside Codex bodies caused
mis-detection):
1. AUDIT_ONLY_NOTE updated to enforce comment format:
- FIRST non-empty line MUST be `[Claude #N] <stage>` or `[Codex #N] <stage>`
- Audit anchor / banners / prefaces MUST appear line 2 or later
- Concrete CORRECT example included
- Explicit warning that violation breaks stage advance
2. is_codex None guard auto-supplements:
- When _audit_mode(title) AND detect_agent returns None, orchestrator
posts a Gitea supplement comment requesting the correct format
- Next round's Claude/Codex see the supplement and correct
- Breaks the infinite loop automatically (no manual ctrl-C needed)
3. Regression tests in TestDetectAgent (test_orchestrator_core.py):
- test_audit_anchor_preface_breaks_detection: confirms P0-1 strict
correctly returns None when anchor is first line
- test_audit_anchor_after_header_works: correct format passes
Total: 96/96 pytest pass (94 prior + 2 P5 regression).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Audit follow-up F-2 (INTEGRATION-AUDIT-01 §10.2). Phase Z families surface
showed 11 tracked / 11 contracted / 13 on disk. The 2 untracked WIP files
(app_sw_package_vs_solution.html, pre_construction_model_info_stacked.html)
are now declared in _WIP_FILES.md as uncontracted and out-of-scope for the
runtime matcher; promote/remove is gated on #42. The 11/11 tracked +
contracted baseline is unchanged. A new pytest enforces tracked families ↔
frame_contracts.yaml set-equality modulo the WIP allowlist parsed from
_WIP_FILES.md, so future drift fails fast in CI before #42 expands to 32
frames.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
P4 had two production issues blocking #50 integration audit deployment:
1. Stage 3 guard had no baseline awareness — flagged ALL forbidden-path
changes including pre-existing dirty WIP. Empirical: 328 such files
already in current working tree (tests/matching/ artifacts etc).
#50 would have hit reject loops immediately without Claude doing
anything wrong.
2. Stage 5 had no commit-scope guard — if Claude ran `git add -A` and
committed user's existing WIP, audit commit would be polluted with
unrelated production changes.
P4a additions:
- _audit_baseline_path / _ensure_audit_baseline / _load_audit_baseline:
snapshot working-tree dirty paths at run_issue entry for audit issues.
Resumed runs preserve existing baseline (no overwrite).
- _check_audit_only_violations(baseline=None): accept baseline set,
subtract from violations — only flags NEW forbidden changes introduced
after audit start.
- _check_audit_commit_scope: verify HEAD commit's file list matches
AUDIT_ALLOWED_COMMIT_GLOBS (INTEGRATION-AUDIT-*.md, BACKLOG.md).
- run_issue: save baseline on audit-mode entry only — no impact on
normal issues.
- Stage 5 (commit-push) YES gate: new guard rejects on out-of-scope
files with remediation prompt (git reset --soft + force-with-lease).
19 new tests:
- baseline subtraction (5): pre-existing removed, None=keep-all,
empty-set=catch-all, full-coverage filter, Windows path normalize.
- baseline persist (5): roundtrip, no-overwrite on resume, missing
fallback, corrupt JSON fallback, non-list fallback.
- commit scope detection (7): report-only allowed, backlog allowed,
src/ rejected, unrelated docs rejected, git error fail-open,
Windows backslash, empty commit pass.
- allowed globs sanity (2): every glob has audit marker, all under
docs/architecture/.
Total: 94/94 pytest pass (75 prior + 19 new).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Issue: #48 (IMP-15 실행-4, axis 4: debug.json + spec doc trace).
Parent: #15. Depends on 실행-1/2/3 (events + classifier outputs).
Surfaces the image/table event streams that 실행-1/2/3 already produced
and consumed, mirroring the existing `zone_geometries_px` top-level
precedent (no new pattern introduced). Adds the matching taxonomy row
to the Phase Z fit-classifier/router spec.
src/phase_z2_pipeline.py (+3):
- write_debug_json now lifts `image_events` and `table_events` to
top-level of `debug.json` via `(visual_runtime_check or {}).get(<k>, [])`,
exactly mirroring the immediately preceding `zone_geometries_px`
surfacing line. Defaults to `[]` when `visual_runtime_check` is None
— additive, no consumer-visible breakage.
docs/architecture/PHASE-Z-FIT-CLASSIFIER-ROUTER-SPEC.md (+1):
- §3.1 taxonomy adds `image_aspect_mismatch` row. Row text explicitly
marks the signal as post-render `fail_reasons` from Step 14
visual_runtime_check (rendered vs declared aspect ratio mismatch),
NOT a router-routed fit_classifier output, and notes the separate
`image_events` stream surface. Prevents future readers from wiring
this taxonomy into §3.2 priority list or §4 router action map.
tests/phase_z2/test_debug_json_event_surfacing.py (new, 2 tests):
- `test_write_debug_json_surfaces_image_and_table_events` invokes
write_debug_json with synthetic visual_runtime_check containing
both event lists; reads back the on-disk debug.json and asserts
both keys are present at top level with the exact payloads.
- `test_write_debug_json_defaults_when_visual_runtime_check_none`
asserts both new keys default to `[]` when visual_runtime_check
is None — guards the defensive `(… or {})` pattern.
tests/phase_z2/test_spec_taxonomy_image_aspect_mismatch.py (new, 2 tests):
- `test_spec_has_image_aspect_mismatch_row` opens the spec file and
asserts exactly one `^\| image_aspect_mismatch \|` row exists
inside the §3.1 table block (no markdown-parser dependency).
- `test_spec_row_marks_post_render_fail_reasons_semantic` asserts the
row text carries both "Post-render" and "fail_reasons" tokens —
enforces the Stage 1 guardrail wording.
Verification (Stage 4 PASS, Claude + Codex independent):
- pytest -q tests/phase_z2/test_debug_json_event_surfacing.py \
tests/phase_z2/test_spec_taxonomy_image_aspect_mismatch.py
→ 4 passed in 0.07s.
- git diff scope: 4 files, +148 insertions / 0 deletions.
Scope-locked: no edits to classifier (실행-3), event generation
(실행-1/2), Step 21 viewer, §3.2 priority list, §4 router action
mapping, or `table_self_overflow` taxonomy row. Pre-existing
dirty/untracked working-tree files left untouched.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Issue #47 (IMP-15 실행-3 axis 3): extend `classify_visual_runtime_check`
to consume the `image_events[]` and `table_events[]` arrays produced by
`run_overflow_check` (실행-1/2) and widen `visual_check_passed`.
Changes (src/phase_z2_classifier.py):
- Remove `overflow.passed=True` early-return so image/table event scans
always run, even when zone-level overflow was clean.
- Deferred import of `IMAGE_ASPECT_DELTA_TOL` and `TABLE_SCROLL_TOL_PX`
from `phase_z2_pipeline` (circular-safe SSoT; no duplicate literals).
- New `image_events` scan emits `image_aspect_mismatch` when
`delta is not None AND |delta| > IMAGE_ASPECT_DELTA_TOL`
(delta=None ⇒ skip, image not loaded).
- New `table_events` scan emits `tabular_overflow` when
`wrapper_clipped_index is None AND (excess_x or excess_y > TABLE_SCROLL_TOL_PX)`
(wrapper-clipped tables deduped against the existing zone cascade).
- `visual_check_passed = overflow.passed AND not classifications` —
any image/table classification now flips the gate.
Guardrails preserved:
- §3.2 8-rule zone cascade (clipped_inner / zone-self) untouched —
the new emitters are ADDITIONAL.
- `placement_diagnostics`, `categories_seen`, `unclassified_signals`
return-shape preserved.
- No `pipeline.py` production changes; no router action or
`debug.json` passthrough changes.
Tests (tests/phase_z2/test_phase_z2_visual_classifier.py — new):
- `test_image_aspect_mismatch_emits_classification` (|delta|>TOL fires)
- `test_image_aspect_delta_below_tol_no_classification` (≤TOL skipped)
- `test_standalone_table_overflow_emits_classification`
(wrapper_clipped_index=None, excess>TOL fires)
- `test_table_dedup_when_wrapper_clipped`
(wrapper_clipped_index set ⇒ no `tabular_overflow` emit)
All 4 pure-dict (no Selenium / chromedriver / pipeline execution).
Tolerances imported from `phase_z2_pipeline` (SSoT enforced via test
import — no classifier-local literals).
Verification (Stage 4):
- New classifier tests: 4/4 PASS.
- Regression `tests/phase_z2/` excluding new file: 93/93 PASS.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add table self-overflow detection with element-identity wrapper dedup,
mirroring the image_aspect_mismatch axis pattern (#45).
JS layer: TABLE_SCROLL_TOL_PX=5 module constant; clippedWrapperMap
built as Map<Element,int> keyed by DOM node reference (NOT className)
so two wrappers with identical class strings remain distinguishable;
table_events collected via querySelectorAll('table').forEach with
closest()-ancestor walk resolving wrapper_clipped_index = int|null.
Py layer: aggregate result['table_events'] and append fail_reason
'table_self_overflow' only when (excess_x>TOL OR excess_y>TOL)
AND wrapper_clipped_index is None; wrapper-clipped path continues
to fail via existing clipped_inner reporting.
Tests (Selenium, chromedriver guard mirrored from image_check):
- Fixture D: standalone <table> overflow → table_self_overflow fail
- Fixture E: <table> in clipped wrapper → dedup suppresses table fail
- Fixture F (F1 acceptance): two wrappers with identical className
f13b-cell, W1 clipped by non-table child, W2 hosts self-overflow
<table> with W2 itself NOT clipped → element-identity ensures W2's
table is not suppressed by W1's class; both fails emitted.
Out of scope: image_events behavior (intact from #45), classifier
pass/fail consumer (→실행-3), debug.json surfacing (→실행-4).
Refs: #46
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
scripts/generate_frame_previews.py iterates figma_to_html_agent/blocks/{frame_id}/index.html,
renders preview.png via Selenium headless (capture_slide_screenshot pattern reuse), and writes
_preview_manifest.json (schema v1) with idempotent stale-detect (mtime+sha256). Build-time only
— no runtime pipeline integration, no AI calls, no MDX/Jinja regen. Stage 2 baseline (commit
56619a0): total=33, renderable=20, missing_index_html=13, orphan=1 (1171281192).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Stage 3 lock implementation: extend build_layout_css dispatch beyond
the horizontal-2 / vertical-2 1-D dynamic paths. T / inverted-T /
side-T-left / side-T-right / 2x2 now flow through a 2-D track solver
instead of the fr_default sink, with length-locked heights_px (R) +
widths_px (C) on every return path (default and override).
PR 2 scope (u1~u5):
- u1: _aggregate_zone_signals_per_track — per-row + per-col virtual
zones via max(weight) + max(min_height_px) of single-span zones,
falling back to all-span when a track has none.
- u2: _build_grid_dynamic_2d default builder — feeds virtual zones
into compute_zone_layout + compute_zone_layout_cols; emits
computation="2d_dynamic_aggregated", dynamic_rows=True,
dynamic_cols=True.
- u3: _override_to_grid_tracks override builder — single-span
aggregation (max h per row, max w per col), normalize, multiply
by avail_h/avail_w, last-element diff absorb; emits
computation="user_override_geometry"; falls back to u2 when
total_h or total_w == 0.
- u4: build_layout_css dispatcher wiring — topology in
{T, inverted-T, side-T-left, side-T-right, 2x2} routes to
_build_grid_dynamic_2d (default) or _override_to_grid_tracks
(override); legacy [override-warning] stderr removed for the
5 presets; step08 trace gains a 2-D-aware print line that fires
before the dynamic_rows / dynamic_cols branches.
- u5: PR 1 lock test test_top_1_bottom_2_fr_default_populates_geometry
renamed to test_top_1_bottom_2_dynamic_2d_populates_geometry and
flipped to PR 2 reality (computation="2d_dynamic_aggregated",
dynamic_rows=True, dynamic_cols=True).
Fixtures: 10 build_layout_css (5 presets × {default, override}) +
5 retry_gate *_dynamic_2d.yaml locking the retry gate skip reason
"dynamic_cols (2-D topology) ... IMP-09 lock" for the 5 presets.
Tests: python -m pytest -q tests = 104 passed (Stage 2 baseline
10 RED → GREEN, 0 regressions). Kei archive
(build_containers_type_b / page_structure) untouched —
rg "build_containers_type_b|page_structure" src/phase_z2_pipeline.py
returns 0 hits.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codex #1 (Stage 5) reproduced a smoke regression on the actual checkout :
when V4 carries the parent exact key (e.g., `04-2`) AND the drag/drop
override targets a sub-id (`primary=04-2-sub-1`), the aligner kept the
parent at parent granularity and emit `['04-1', '04-2']`, so the override
flag failed with `unknown section_id(s) ['04-2-sub-1']`.
Fix : `align_sections_to_v4_granularity` gains an optional
`override_target_section_ids` keyword. From each canonical
`${parent}-sub-N` target it derives the parent id and adds it to a
`force_drill_parents` set. Sections in that set are drilled into
sub-sections regardless of whether V4 carries the parent exact key.
Top-level override targets (no derived parent) do not trigger
force-drill, so backward-compat is preserved for parent-granularity
overrides.
The call site in `run_phase_z2_mvp1` collects sub-ids from
`override_section_assignments` and forwards them to the aligner.
Generalization (RULE 0) :
- Trigger is the override schema (`X-sub-N`), not a specific MDX / section /
frame id. Applies to all 32-frame MDX uniformly.
- Decision is deterministic on the override target shape, independent of
V4 yaml content.
- Default (no override) path is unchanged byte-for-byte.
Side fixes (forward-only RULE 1 cleanup, no history rewrite) :
- `align_sections_to_v4_granularity` docstring rewritten in English
(overwrites the Korean docstring committed in 5191aca).
- Step 9 diagnostic comment quoted-string rewritten in English
(overwrites `"V4 entry 없음"` committed in a422d72).
Tests : 3 new cases in `test_phase_z2_subsection_schema.py` —
`test_align_parent_v4_exact_keeps_section_when_no_override_targets_sub`
(backward-compat axis), `test_align_force_drills_when_override_targets_sub_id_with_parent_in_v4`
(blocker regression), `test_align_top_level_override_target_does_not_force_drill_other_sections`
(force-drill scope guard). Pytest scope-qualified result :
`test_phase_z2_subsection_schema.py` + `_section_assignment_override.py` +
`_v4_fallback.py` = 40 / 40 PASS.
Smoke (axis = sub-id override -> aligner -> assignment plan, both V4 yaml
shapes) :
- HEAD V4 yaml (`04-1`, `04-2.1`, `04-2.2` only) :
`--override-section-assignment primary=04-2-sub-1` ->
`aligned_section_ids=['04-1', '04-2-sub-1', '04-2-sub-2']`,
`plan[0].assignment_source='cli_override'`,
`plan[0].source_section_ids=['04-2-sub-1']`.
- V4 yaml with `04-2` exact key (Codex's stress case) : identical
aligned output and identical assignment plan.
Downstream `composition_planner` abort
(`phase_z_status_not_allowed:extract_matched_zone`) is IMP-05 territory,
unchanged in both shapes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
align_sections_to_v4_granularity now emits canonical sub-section ids
of the form ${section_id}-sub-${ordinal} (e.g., "04-2-sub-1"), matching
the frontend drag/drop schema. Each drilled sub-section populates
heading_number (decimal "2.1" / integer "1" / None for undecorated)
and v4_alias_keys for legacy V4 keys.
N-R5 decimal-only alias guard : v4_alias_keys is populated only when
heading_number matches re.fullmatch(r"\d+\.\d+", ...). Integer-only
H3 headings (e.g., MDX 05's "### 1", "### 2") and bare H3 headings
produce no alias to avoid sibling-parent V4 collisions (RULE 0
generalization — applies to all 32-frame MDX, not MDX 05-specific).
The drill regex is broadened from r"^###\s+(\d+\.\d+)\s+..." to
r"^###\s+(?:(\d+(?:\.\d+)?)\s+)?(.+?)$" so integer-only and bare H3
headings are now recognised as sub-sections; they previously failed
the regex and were silently kept under the parent section.
Tests : 7 new cases (MdxSection default 4-positional callers, V4 exact
passthrough, decimal drill with alias, integer-only no-alias guard,
bare H3 no-alias, no-H3 passthrough, end-to-end aligner -> resolver
round-trip with legacy V4 alias). 15/15 in test_phase_z2_subsection_schema
+ 14 override + 8 fallback baseline = 37/37 PASS.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds sub-section schema fields (heading_number / v4_alias_keys /
sub_sections) to MdxSection with defaults so existing 4-positional
constructions remain valid. Introduces _resolve_v4_section_key helper
that resolves a V4 mdx_sections key in exact > alias > None order with
no parent/sibling promotion (axis 7 hybrid lock).
Rewires four runtime V4 lookup sites (lookup_v4_match,
lookup_v4_match_with_fallback, lookup_v4_all_judgments,
lookup_v4_candidates) to accept an optional alias_keys kwarg and go
through the resolver. U1 callers pass empty alias lists so behaviour
is byte-identical to the previous exact-match path; U2 will populate
aliases from MDX heading_number metadata.
Closure callers in run_phase_z2 build section_alias_by_id from
MdxSection.v4_alias_keys and forward into lookup_fn /
candidates_lookup_fn / lookup_v4_all_judgments (Step 7-A trace) and
into _select_template_for_overrides single-section selector.
Step 9 candidate report (post-decision diagnostic) is marked with an
inline English exemption comment per N-R6 — runtime selection goes
through _resolve_v4_section_key, the report path stays a direct
dict-shape lookup to avoid debug_zones schema plumbing.
derive_parent_id now recognises canonical ordinal ids
("03-1-sub-2" -> "03-1") first and keeps the legacy decimal fallback
("04-2.1" -> "04-2") for V4 alias compatibility.
Tests : 8 synthetic cases in tests/test_phase_z2_subsection_schema.py
covering derive_parent_id ordinal/decimal/none and the resolver
exact/alias/no-promote/miss cases. 30/30 PASS combined with the 14
override + 8 fallback baseline.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Refs #6
Stage 4 split per Codex #10 acceptance: this commit lands the schema +
trace refinements required before the render-path rewiring. The actual
units/zones_data/Step 9/Step 20 plan-driven materialization remains in
Stage 4 part 2 (follow-up commit) so each commit is reviewable on its
own and regression-safe.
- _build_position_assignment_plan: add replaced_auto_unit field. Populated
only when the explicitly overridden position already held an auto unit
AND that auto unit had different source_section_ids than the override.
Documents a same-position override replacement as a distinct audit fact,
separate from skipped_collided_auto_units which captures cross-position
whole-skips per the locked collision policy.
- Backfill replaced_auto_unit = None on the empty/collision/auto branches
for schema-stable consumers.
- Update the override-application comment near the helper invocation so it
no longer claims the helper "reorders units"; Stage 4 part 2 will be the
commit that wires the plan into the actual render path.
- Helper unit tests: assert replaced_auto_unit shape in the collision
scenario and add a dedicated case that distinguishes same-sections
(template swap via --override-frame -> None) from different-sections
same-position replacement (populated, reason="same_position_override_replacement").
No AI, no calculate_fit, no full planner rerun, no frontend, no sample
hardcoding. plan_composition() signature preserved. helper remains pure.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Refs #6
Backend / CLI / composition path only — frontend bridge remains #38.
- Add `--override-section-assignment ZONE_ID=section_id[,section_id]` to the
Phase Z entry parser. Parse-time hard errors for malformed payloads, empty
zone id, empty section list, duplicate zone id, and duplicate section across
zones (a section may belong to at most one zone).
- Add `_build_position_assignment_plan` helper (pure function, resolved
`positions` injected). Builds a per-position assignment plan with the
Codex-locked template_id ladder: (1) `--override-frame` exact unit_id wins,
(2) exact existing auto unit reuse, (3) single-section direct-executable V4
selector via `lookup_v4_match_with_fallback(..., raw_content=section.raw_content)`,
(4) ad-hoc multi-section override without exact auto + without explicit
override-frame yields `skipped_reason='ad_hoc_merged_no_template'`.
- Lock the collision policy: explicit override wins per position, sections
appear in at most one position, overlapping auto units are skipped whole
(no split, no cascade, no replan), uncovered sections from the previous
same-position auto unit are recorded in `uncovered_section_ids`.
- Additive trace fields on each plan entry: `previous_source_section_ids`,
`skipped_collided_auto_units`, `uncovered_section_ids`, `v4_selector_trace`,
`section_assignment_override`. Top-level `comp_debug["section_assignment_plan"]`
+ `comp_debug["section_assignment_summary"]` so Step 9 / debug artifacts can
derive from a single source of truth.
- Wire `run_phase_z2_mvp1(override_section_assignments=...)` after final layout
preset resolution: validate ZONE_IDs against active layout positions and
validate section_ids against aligned sections (fail-fast). The plan is
attached to `comp_debug` for downstream artifacts. Actual `zones_data` /
unit-list rewiring is deferred to a follow-up commit so this change stays
regression-safe; trace artifacts already surface override intent and
collision impact.
- Add 9 helper unit tests with fully synthetic MOCK_ ids (no real catalog
/ no v4_full32_result.yaml): non-conflicting auto retention, collision
whole-skip + uncovered tracing, template ladder steps 1/2/4, unit_id
naming convention, previous_source_section_ids position history,
empty-position case, summary aggregation invariants.
No AI, no `calculate_fit`, no full planner rerun, no frontend, no sample
hardcoding, no `restructure`/`reject` silent promotion. `plan_composition()`
signature is preserved.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Refs #5
Replace the hand-built Case 7 payload assertion with a temporary
production-source guard. The test now fails if Step 9 stops emitting
candidate_evidence, breaks the fallback_chain compat alias, or removes
the alias intent comment.
This is intentionally temporary because Step 9 application-plan unit
assembly is inline. Follow-up IMP-32 should extract a helper and replace
this source-string guard with a direct helper test.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Refs #5
- Add runtime template_id dedup in lookup_v4_match_with_fallback with
first-occurrence reservation; duplicate ranks become audit evidence,
not new fallback candidates.
- Add Step 9 candidate_evidence as the primary per-unit evidence field
while keeping fallback_chain as a compat alias for legacy readers.
- Add Step 20 fallback_selection_count and selection_paths derived from
comp_debug.v4_fallback_summary with defensive defaults; top-level
overall enum unchanged.
- Tighten synthetic fallback tests for duplicate handling (rank-1 reject A
+ rank-2 use_as_is A + rank-3 distinct B → rank-3 wins) and add tests
for candidate_evidence + alias equality and Step 20 qualifier presence
with defensive defaults.
- Verify with pytest (10 passed) and smoke_frame_render --self-check
(11/11 partials, IMP-04 F17 calibration intact).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>