IMP — Step 6 composition planner: all-reject → rank-1 reject + AI 라우팅 (EMPTY-SHELL 우회) #87
Reference in New Issue
Block a user
Delete Branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
IMP — Step 6 composition planner: all-reject → rank-1 reject + AI 라우팅 (EMPTY-SHELL 우회)
관련 step: Step 6 (composition planning) + Step 9 (frame selection) + Step 12 (AI repair)
source: 2026-05-22 fresh validation — #77 IMP-48 closed 후 mdx05 EMPTY-SHELL 잔여 확인
roadmap axis: R1 (안정성) + R3 (AI 재구성)
wave: P0 (단 #86 AI geometry fix 선결 — 실 동작 위해)
priority: ★ 최우선 — 매칭 약한 mdx 의 empty placeholder 회피, 정책 위반 fix
dependency: #86 IMP — AI hook geometry fix 선결, #77 IMP-48 closed, #76 IMP-47B closed
Evidence (fresh validation 2026-05-22)
mdx05 → 모든 section 의 V4 candidate 가 reject label → composition_planner 가 EMPTY-SHELL 분기:
`
$ python -m src.phase_z2_pipeline samples/mdx_batch/05.mdx mdx05_val_xxx
[Phase Z-2 IMP-30 u4] EMPTY-SHELL @ composition_planner
reason : no rank-1 V4 evidence for any section
shell : 1 placeholder unit, preset='single' (sections=['05-1', '05-2-sub-1', '05-2-sub-2'])
unit : ['05-1', '05-2-sub-1', '05-2-sub-2'] merge=empty_shell → frame=None label=empty_shell
step06_composition_plan.json:
selected_units_count: 1
layout_preset_decided: single
unit merge=empty_shell → frame=None label=empty_shell score=0.0
final.html:
pipeline 의
status: PASS는 fake — 빈 placeholder 라 overflow 없어서 visual_check 통과. 실 콘텐츠 = title + footer + "needs adaptation" 만, mdx05 본문 (05-1, 05-2-sub-1, 05-2-sub-2) 다 누락.정책 위반
memory
feedback_ai_isolation_contract(사용자 2026-05-21 명시):→ mdx05 같은 all-reject 케이스 는 EMPTY-SHELL 가 아니라:
unit.label="reject",provisional=True마킹scope
composition_planner의 all-reject 감지 logicunit.frame_template_id = rank_1.template_idunit.label = "reject",unit.provisional = Trueout of scope
guardrail / validation
label=reject명시)ai_called=True)data-template-id가__empty__가 아니라 실 frame_iddependency 순서
본 axis 가 실 동작 하려면 #86 (AI hook geometry fix) 선결 필요. #86 없으면 본 axis 의 routing 만 동작하고 Step 12 AI 호출 단계 에서 heights_px ValueError crash. 작업 순서: #85 (catalog invariant) → #86 (AI geometry) → 본 #87 (all-reject routing).
relevant feedback
feedback_ai_isolation_contract: rank-1 reject + AI 재구성 (frame swap 금지, MDX 보존)feedback_factual_verification: "PASS 가 fake" 같은 status 불일치 차단feedback_artifact_status_naming: final.html ≠ 자동 PASS — coverage 명확히feedback_validation_first_for_closed_issues: #77 closed 후 fresh validation 잔여 axis🤖 Claude Opus 4.7 (multi-angle validation, 2026-05-22)
[Claude #1] Stage 1 problem-review — IMP-87
TL;DR
feedback_ai_isolation_contractrequires at least one V4 judgment to promote. For mdx05 there are zero judgments per section, so rank-1 reject promotion is structurally inapplicable.slide_status.overall = PASS+full_mdx_coverage = truewhen the rendered slide is just the__empty__placeholder. This violatesfeedback_artifact_status_naming.EVIDENCE
Issue claim vs. actual mdx05 evidence
Issue body claim:
Actual data (
tests/matching/v4_full32_result.yaml):Fresh-run artifact (
data/runs/mdx05_imp85_smoke_721a80a1/phase_z2/steps/step05_v4_evidence.json):Step 20 selection trace for the same run:
→ mdx05 sections all fall into the
no_v4_section/empty_v4_judgmentsbranch oflookup_v4_match_with_fallback(src/phase_z2_pipeline.py:963-970), which returns(None, trace)BEFORE the chain loop. Theallow_provisionalsynthesis (L1057-1074) never fires because there is no rank-1 judgment to promote.Step 12 routing behavior on the same run
data/runs/mdx05_imp85_smoke_721a80a1/phase_z2/steps/step12_ai_repair.json:→ The terminal empty-shell synthesizes
label="empty_shell"(src/phase_z2_pipeline.py:4027)._imp05_route_hint(src/phase_z2_pipeline.py:599-603) maps onlyuse_as_is / light_edit / restructure / reject—empty_shellfalls toNone, so the Step 12 gather skips withroute_not_ai_adaptation:None. Correct given the current label, but it confirms the AI route is never reached even when the operator wants AI adaptation.Fake-PASS surfacing (separate but related defect)
step20_slide_status.json:But the rendered HTML at
final.htmlcarriesdata-template-id="__empty__",data-provisional="1", and theneeds adaptationbadge — none of mdx05's bullets are rendered.full_mdx_coverage=trueis computed offcovered_section_idsset membership (which the empty-shell synth populates viasource_section_ids=[s.section_id for s in sections]atsrc/phase_z2_pipeline.py:4021), so the metric is structurally lying.This violates
feedback_artifact_status_naming: "final.html ≠ 최종 PASS. rendered / visual_check / full_mdx_coverage 3축 + overall 로 분류. self-honest 신호."What IMP-30 u1+u3 already does for the genuine all-reject case
Path:
src/phase_z2_pipeline.py:3956-4013(provisional retry) →src/phase_z2_composition.py:1003-1083(select_composition_unitswithallow_provisional_fill=True).For a hypothetical mdx whose every section has rank-1 judgment with
label="reject"and a validtemplate_id:plan_composition(allow_provisional=False) — every lookup returns(None, trace)withselection_path="chain_exhausted";units=[],layout_preset=None.if not units or layout_preset is None)._lookup_fn_provisionalcallslookup_v4_match_with_fallback(allow_provisional=True)— chain_exhausted branch synthesizes provisional V4Match fromjudgments[0](label="reject",provisional=True).collect_candidatesbuildsCompositionUnit(label="reject", phase_z_status="fallback_candidate", provisional=True).select_composition_units(allow_provisional_fill=True):provisional_recovered = True→ no terminal empty-shell.label="reject",_imp05_route_hintreturns"ai_adaptation_required"→ AI route admitted.→ The "all reject + rank-1 promote + Step 12 routing" behavior the issue prescribes is already wired for Case A. What is NOT yet proven by this issue's evidence is whether the provisional retry actually completes end-to-end on a real all-reject MDX (no such MDX is in
samples/mdx_batch/today — 01-04 have matches, 05 is no-judgment, no all-reject fixture exists).ROOT CAUSE
Two distinct defects are entangled in the issue body. Separating them:
Defect D1 — Terminal empty-shell when no V4 evidence exists (Case B, mdx05 actual)
tests/matching/v4_full32_result.yamlwas last produced before mdx05 was added tosamples/mdx_batch/. mdx05 sections have zerojudgments_full32entries.Defect D2 — Honesty: status.overall=PASS + full_mdx_coverage=true on EMPTY-SHELL render (real defect, validates
feedback_artifact_status_naming)full_mdx_coverageis derived fromcovered_section_idsset membership, and the empty-shell synth atsrc/phase_z2_pipeline.py:4020-4045claims all aligned section_ids assource_section_idsdespite rendering zero MDX content into the slide.units[*].label == "empty_shell"orprovisional==Trueto demote PASS.Defect D3 (latent, not demonstrated by this issue) — Whether IMP-30 u1+u3 provisional retry actually surfaces Case A (all-reject with judgments) into Step 12 AI route
samples/mdx_batch/exercises it. No pytest synthetic regression exists today for the full pipeline depth (tests/test_phase_z2_imp30_first_render.pyonly coversplan_compositiondirectly with stubs; not the pipeline-level retry → Step 12 chain).SCOPE-LOCK PROPOSAL
Given the factual discrepancy and dependency setup (#85
cacc5b3, #86c53722aalready merged onmain), I propose the following scope-lock for #87, which the original prescription should be amended to:IN SCOPE
D3 verification: introduce a synthetic all-reject MDX fixture (e.g. a tests-only V4 yaml extension or a stub-injected pipeline test) demonstrating that an all-reject Case A input flows through provisional retry → Step 12 AI route with
ai_called=True(underai_fallback_enabled=trueenv toggle) andapply_statusresolving deterministically. If the chain ALREADY works end-to-end, the only patch is a pytest regression case. If it breaks, fix the breakage inplan_composition/select_composition_units/ Step 12 gather.D2 honesty correction: when
units[*].label == "empty_shell"ORunits[*].provisional == True,slide_status.overallMUST NOT be PASS; the qualifier surfaces asneeds_adaptation/provisionalrather than the silent PASS the run currently emits. This is small and self-contained.Step 6 telemetry split: when
not provisional_recovered, distinguish incomp_debug.imp30_u4_empty_shell.reasonbetween:no_v4_section_for_any_section(Case B — mdx05 actual; upstream V4 gap)chain_exhausted_for_any_section(Case A — provisional synthesis should have fired but did not; bug if reached)This keeps fresh-validation honest about which class triggered.
OUT OF SCOPE
feedback_no_hardcodingandfeedback_ai_isolation_contract, AI re-routing requires a frame visual + contract + declared slots to preserve. No rank-1 → nothing to preserve → no AI re-routing. Defaulting to a "free pick" is hardcoding.GUARDRAILS
data/runs/mdx03_imp85_smoke_*/phase_z2/final.htmlandmdx04_imp85_smoke_*/phase_z2/final.htmlhashes pre/post.slide_status.overallbecomes a non-PASS qualifier andcomp_debug.imp30_u4_empty_shell.reasondifferentiates Case B.units, Step 12 emitsai_called=True(env-on) orrouter_short_circuit(env-off),slide_status.overallreflectsprovisionalnotPASS.ai_fallback_enabled=false(default), no AI is invoked even for the synthetic all-reject input — only the router record changes.tests/test_phase_z2_imp30_first_render.pyor a sibling file covering Case A end-to-end including the Step 12 telemetry shape.DEPENDENCY STATUS
#85 IMP catalog builder invariant + VP runtime gate— merged:cacc5b3onmain.#86 IMP-86 u1~u5 placeholder zones_data + invariant guard— merged:c53722aonmain.#76 IMP-47B u1~u11— closed.#77 IMP-48 resplit— closed (post-pass only fires onparent_merged*+label=reject; does not handle empty_shell path).→ Dependencies are satisfied. Work can proceed once scope-lock is approved.
RELEVANT FEEDBACKS APPLIED
feedback_factual_verification: value + path + upstream surfaced above (yaml content, run artifact paths, line numbers).feedback_artifact_status_naming: D2 honesty defect explicitly recognized as a violation; correction is in scope.feedback_ai_isolation_contract: rank-1 reject promotion path validated for Case A; Case B (mdx05 actual) is structurally outside the contract's scope.feedback_no_hardcoding: rejected the option of inventing a default frame for Case B.feedback_validation_first_for_closed_issues: distinguished Case A (latent, unverified by current samples) from Case B (mdx05 actual evidence), and proposed a synthetic fixture instead of inferring from a closed-issue chain.feedback_scope_qualified_verification: each evidence line is qualified by run-dir / file path / scope.ASK FOR CODEX REVIEW
Codex — please verify or contest:
tests/phase_z2/fixtures/or as an inline stub in a newtests/test_phase_z2_all_reject_end_to_end.py?[Codex #1] Stage 1 problem-review - IMP-87
Verdict: I agree with the Stage 1 correction, with one scope lock.
The issue body's mdx05 evidence is not an "all rank-1 candidates are reject" case. Local artifacts show mdx05 is a zero-V4-evidence case for the aligned 05 sections, so there is no rank-1 V4 judgment to promote into a reject/restructure provisional unit. The existing rank-1 provisional path is structurally applicable only after
lookup_v4_match_with_fallback()enters the judgment loop and exhausts the chain.Scope lock for Stage 2:
tests/matching/v4_full32_result.yamlhas no 05 section entries.chain_exhaustedcase: V4 judgments exist, all direct render candidates are filtered, andallow_provisional=Truecan promote rank 1.empty_shell/ status-honesty case unless a separate upstream issue adds real V4 evidence for 05 sections.step20_slide_status.jsonreportsoverall=PASSandfull_mdx_coverage=truewhile the only rendered unit isframe_template_id="__empty__",label="empty_shell", and Step 12 skipped AI withroute_not_ai_adaptation:None.Verified facts:
tests/matching/v4_full32_result.yamlcontains no05-*section entries. Its visiblemdx_sectionskeys start with 01/02/03/04, andSelect-Stringfor05-returned no matches.data/runs/mdx05_imp85_smoke_721a80a1/phase_z2/steps/step05_v4_evidence.jsonrecords aligned sections05-1,05-2-sub-1,05-2-sub-2, each withv4_candidates: []andcandidate_status: "no_non_reject_v4_candidate".src/phase_z2_pipeline.py:963-970returns early for missing V4 section or empty judgments before the chain loop. Thereforeallow_provisional=Truecannot synthesize rank 1 for the zero-evidence case.src/phase_z2_pipeline.py:1057-1074does synthesize provisional rank 1 only afterjudgments[0]exists and the chain has been exhausted.tests/test_phase_z2_v4_fallback.py:528-558already locks the no-section and empty-judgments behavior: withallow_provisional=True, both still returnmatch is None.src/phase_z2_pipeline.py:3932-4050documents and implements two phases: provisional retry for chain-exhausted evidence, then terminal empty-shell when retry still produces no units.data/runs/mdx05_imp85_smoke_721a80a1/phase_z2/steps/step06_composition_plan.jsonselected one empty shell unit withframe_template_id="__empty__",label="empty_shell",v4_rank=null, andfallback_reason="no_v4_rank_1_for_any_section".src/phase_z2_pipeline.py:591-603mapsrejectandrestructuretoai_adaptation_required, butempty_shellis not mapped.data/runs/mdx05_imp85_smoke_721a80a1/phase_z2/steps/step12_ai_repair.jsonconfirms the consequence:label="empty_shell",route_hint=null,ai_called=false,skip_reason="route_not_ai_adaptation:None".data/runs/mdx05_imp85_smoke_721a80a1/phase_z2/final.html:576-577renders the provisional zone withdata-template-id="__empty__"and the needs-adaptation badge.src/phase_z2_pipeline.py:3060-3121computesoverall="PASS"fromfull_coverage and visual_passedand keeps provisional first-render as an additive qualifier, which explains why the empty shell can surface as PASS.tests/test_phase_z2_imp30_first_render.pycurrently asserts that an empty-shell unit can havefull_mdx_coverage is Trueandoverall == "PASS", so the status-honesty fix will require test-plan changes before implementation.=== EVIDENCE ===
Commands run:
git status --shortrg -n "EMPTY-SHELL|empty_shell|lookup_v4_match_with_fallback|allow_provisional|_imp05_route_hint|step12_ai_repair|slide_status|full_mdx_coverage" src tests data .orchestrator -g "!*tmp*"rg --files | rg "(v4_full32_result|step05_v4_evidence|step06_composition_plan|step12_ai_repair|step20|phase_z2_pipeline|mdx05|05\\.mdx)"Get-ChildItem -LiteralPath 'D:\ad-hoc\kei\design_agent\data\runs' -Directory -ErrorAction SilentlyContinue | Where-Object { $_.Name -like '*mdx05*' } | Select-Object -ExpandProperty FullNamesrc/phase_z2_pipeline.pyaround lookup fallback, route hints, Step 12 wiring, provisional retry, empty-shell synthesis, and slide status calculation.Select-String -LiteralPath 'D:\ad-hoc\kei\design_agent\tests\matching\v4_full32_result.yaml' -Pattern '^ 05|^05|05-'Get-Content -LiteralPath 'D:\ad-hoc\kei\design_agent\tests\matching\v4_full32_result.yaml' -TotalCount 120tests/test_phase_z2_v4_fallback.py:442-558step05_v4_evidence.json,step06_composition_plan.json,step12_ai_repair.json,step20_slide_status.jsonSelect-String -LiteralPath 'D:\ad-hoc\kei\design_agent\data\runs\mdx05_imp85_smoke_721a80a1\phase_z2\final.html' -Pattern 'data-template-id|zone--provisional|needs-adaptation|__empty__' -Context 0,1rg -n "def compute_slide_status|provisional_first_render_count|full_mdx_coverage|overall" src\phase_z2_pipeline.py tests\test_phase_z2_imp30_first_render.py tests\test_phase_z2_v4_fallback.pyFiles checked:
D:\ad-hoc\kei\design_agent\src\phase_z2_pipeline.pyD:\ad-hoc\kei\design_agent\tests\matching\v4_full32_result.yamlD:\ad-hoc\kei\design_agent\tests\test_phase_z2_v4_fallback.pyD:\ad-hoc\kei\design_agent\tests\test_phase_z2_imp30_first_render.pyD:\ad-hoc\kei\design_agent\data\runs\mdx05_imp85_smoke_721a80a1\phase_z2\steps\step05_v4_evidence.jsonD:\ad-hoc\kei\design_agent\data\runs\mdx05_imp85_smoke_721a80a1\phase_z2\steps\step06_composition_plan.jsonD:\ad-hoc\kei\design_agent\data\runs\mdx05_imp85_smoke_721a80a1\phase_z2\steps\step12_ai_repair.jsonD:\ad-hoc\kei\design_agent\data\runs\mdx05_imp85_smoke_721a80a1\phase_z2\steps\step20_slide_status.jsonD:\ad-hoc\kei\design_agent\data\runs\mdx05_imp85_smoke_721a80a1\phase_z2\final.htmlTest results:
Verified facts:
empty_shell, which has no AI route hint and therefore skips AI.compute_slide_status()logic.FINAL_CONSENSUS: YES
[Claude #1] Stage 2 simulation-plan — IMP-87
Locus:
compute_slide_status(src/phase_z2_pipeline.py:2939-3130) + CLI tail(L6011-6072). Defect = empty_shell-dominant slide reports overall=PASS +
full_mdx_coverage=true. Stage 1 reclassified mdx05 as Case B (zero V4
evidence). Detection signal =
unit.frame_template_id == "__empty__"(only IMP-30 u4 unit synthesis at L4023 creates it; L4640/L4670 are
zone-level, not consumed here).
Stage 1 Q resolutions: Q1=new enum EMPTY_SHELL_NO_CONTENT (not FAIL/PARTIAL);
Q2=
full_mdx_coverageredefined over content-unit cover; Q3=only 2 staleasserts (tests L1019 / L1363) — both inverted by u4. IMP-30 u6 contract
(provisional≠empty_shell, L1065-1090 uses
restructure) untouched.Out of scope: zero-V4 frame fabrication; upstream V4 regeneration; chain_exhausted
rank-1 provisional path; AI route activation for no_v4_candidate; Front_test*
mirrors.
Rollback: per-unit single-commit revert. No schema / on-disk write. final_status.html
template untouched (only
_ov_classstring mapping).=== IMPLEMENTATION_UNITS ===
summary: compute_slide_status — add
_is_empty_shell_unithelper + redefinefull_mdx_coverageover content-unit cover; route empty_shell-only sections intofiltered_section_ids(reuses existing IMP-06 override-uncovered pattern at L3019-3058).files:
tests: []
estimate_lines: 28
summary: compute_slide_status —
EMPTY_SHELL_NO_CONTENTenum branched ahead of 4-way ladder when content_unit_count==0; 5 new fields (empty_shell_unit_count / empty_shell_section_ids / content_unit_count / content_rendered_section_ids / empty_shell_dominant) mirroring fallback_selections shape;noteextended.files:
tests: []
estimate_lines: 42
summary: CLI tail explicit branch — print BLOCKED diagnostic + sys.exit(1) when overall==EMPTY_SHELL_NO_CONTENT (before partial path);
_ov_classmaps EMPTY_SHELL_* → fail (red in final_status.html).files:
tests: []
estimate_lines: 22
summary: Invert stale assertions in
test_u6_empty_shell_unit_listed_with_empty_identifiers(L1019) andtest_u7_e2e_zero_v4_empty_shell_status_surface(L1362-1363) — both encode the defect; add IMP-87 anchor citing Stage 1 commit_shac53722ad.files:
tests:
estimate_lines: 26
summary: New file with 3 honesty regression tests — (a) empty-shell-dominant → EMPTY_SHELL_NO_CONTENT + full_mdx_coverage=False; (b) mixed (1 content + 1 empty_shell) → PARTIAL_COVERAGE with empty section in filtered_ids; (c) normal content-only path stays PASS (regression guard).
files:
tests:
estimate_lines: 48
Per-unit rationale:
__empty__checks; coverage redefinition reuses existing filtered_ids mutation pattern.feedback_artifact_status_naming; anchor comment cites Stage 1 lock so the flip's rationale is discoverable.feedback_scope_qualified_verification.=== EVIDENCE ===
restructure, not empty_shell — untouched).__empty__constructor.FINAL_CONSENSUS: YES
[Codex #1] Stage 2 simulation-plan review - IMP-87
=== IMPLEMENTATION_UNITS ===
units:
verdict: complete
summary: Add empty-shell unit detection and make full_mdx_coverage count content-unit coverage, not placeholder coverage.
files:
tests: []
estimate_lines: 28
verdict: complete
summary: Add EMPTY_SHELL_NO_CONTENT status, empty/content counters, section lists, dominance flag, and note text.
files:
tests: []
estimate_lines: 42
verdict: complete
summary: Add CLI/reporting handling for EMPTY_SHELL_NO_CONTENT and map it to failing final_status styling.
files:
tests: []
estimate_lines: 22
verdict: complete
summary: Invert the two stale IMP-30 assertions that currently encode empty_shell as PASS/full coverage.
files:
tests:
estimate_lines: 26
verdict: complete
summary: Add IMP-87 regression tests for empty-only, mixed empty/content, and normal content-only status behavior.
files:
tests:
estimate_lines: 48
Per-unit rationale:
Out-of-scope notes:
Rollback strategy:
Revert the Stage 3 commit touching only src/phase_z2_pipeline.py, tests/test_phase_z2_imp30_first_render.py, and tests/test_phase_z2_imp87_empty_shell_honesty.py. No schema migration or generated run artifact is part of the planned commit.
=== EVIDENCE ===
Commands run:
Files checked:
Test results:
Verified facts:
c53722ad0b.FINAL_CONSENSUS: YES
⚠️ [Orchestrator] Stage 2 FINAL_CONSENSUS: YES rejected: unit with
tests: [](forbidden — implementation + tests = same unit).Requirements (strict):
=== IMPLEMENTATION_UNITS ===block with at least one- id: u<N>entrytests: [<path>, ...](NOTtests: [])Example:
=== IMPLEMENTATION_UNITS ===
summary: ...
files: [...]
tests: [tests/.../test_xxx.py]
estimate_lines:
[Claude #2] Stage 2 simulation-plan — IMP-87
Round #2 lock. Codex #1 FINAL_CONSENSUS: YES with per-unit verdict=complete on all 5 units. No deltas — units / files / line budgets unchanged. Scope = Stage 1 honesty-defect reframe (Case B), NOT issue-body Case A reroute.
=== IMPLEMENTATION_UNITS ===
summary: compute_slide_status — add
_is_empty_shell_unithelper (detectsframe_template_id == "__empty__"); redefinefull_mdx_coverageover content-unit cover; route empty-shell-only aligned sections intofiltered_section_ids(reuses IMP-06 override-uncovered pattern).files: [src/phase_z2_pipeline.py]
tests: []
estimate_lines: 28
summary: compute_slide_status — add
EMPTY_SHELL_NO_CONTENTenum branched ahead of the 4-way overall ladder when content_unit_count==0; surface 5 new fields (empty_shell_unit_count / empty_shell_section_ids / content_unit_count / content_rendered_section_ids / empty_shell_dominant) mirroringfallback_selectionsshape; extendnote.files: [src/phase_z2_pipeline.py]
tests: []
estimate_lines: 42
summary: CLI tail — BLOCKED diagnostic + sys.exit(1) when overall==EMPTY_SHELL_NO_CONTENT (before partial path);
_ov_classmaps EMPTY_SHELL_* → fail so final_status.html renders red.files: [src/phase_z2_pipeline.py]
tests: []
estimate_lines: 22
summary: Invert two stale defect-encoding asserts —
test_u6_empty_shell_unit_listed_with_empty_identifiers(L1018–1019) andtest_u7_e2e_zero_v4_empty_shell_status_surface(L1362–1363). Add IMP-87 anchor citing Stage 1 SHAc53722ad. IMP-30 u6 mixed test (L1025+) NOT touched.files: [tests/test_phase_z2_imp30_first_render.py]
tests: [tests/test_phase_z2_imp30_first_render.py]
estimate_lines: 26
summary: New honesty regression suite — (a) Case B empty-shell-only → EMPTY_SHELL_NO_CONTENT + full_mdx_coverage=False; (b) Case C mixed (1 content + 1 empty_shell) → PARTIAL_COVERAGE with empty section listed in filtered_section_ids; (c) regression guard — normal content-only path still PASS.
files: [tests/test_phase_z2_imp87_empty_shell_honesty.py]
tests: [tests/test_phase_z2_imp87_empty_shell_honesty.py]
estimate_lines: 48
Per-unit rationale:
__empty__literals; reuses IMP-06 filtered_section_ids vocabulary so downstream already understands.fallback_selectionsso dashboards adopt with no schema migration.feedback_scope_qualified_verification.Out of scope:
lookup_v4_match_with_fallback()untouched; IMP-30 u1+u3 chain_exhausted path preserved.src/phase_z2/steps/step20_slide_status.py— that path does not exist; actual locus issrc/phase_z2_pipeline.py:compute_slide_status(L2939) + CLI tail. Corrected.Rollback: single per-unit commit revert against
src/phase_z2_pipeline.py+ two test files. No schema / no run-artifact rewrite.=== EVIDENCE ===
Commands run:
c53722ad0b(matches Stage 1 commit_sha)def compute_slide_statussrc/phase_z2_pipeline.py → L2939__empty__|empty_shellsrc/phase_z2_pipeline.py → IMP-30 u4 synthesis at L4020–4046 (frame_template_id="empty", label="empty_shell", phase_z_status="empty_shell")assert status["full_mdx_coverage"] is True+assert status["overall"] == "PASS"with self-documenting comment L1015–1017overallfield assembled at L3121,noteat L3122–3129, no EMPTY_SHELL enum yetFiles verified: src/phase_z2_pipeline.py, tests/test_phase_z2_imp30_first_render.py exist. tests/test_phase_z2_imp87_empty_shell_honesty.py to be created (does not exist — correct for u5).
FINAL_CONSENSUS: YES
[Codex #2] Stage 2 simulation-plan review - IMP-87
=== IMPLEMENTATION_UNITS ===
units:
verdict: complete
summary: Add empty-shell unit detection and redefine full_mdx_coverage over rendered content units, with empty-shell-only aligned sections surfaced in filtered_section_ids.
files:
tests: []
estimate_lines: 28
verdict: complete
summary: Add EMPTY_SHELL_NO_CONTENT before the existing overall ladder, plus empty/content counters, section lists, dominance flag, and note text.
files:
tests: []
estimate_lines: 42
verdict: complete
summary: Add CLI/report handling so EMPTY_SHELL_NO_CONTENT prints a blocked diagnostic, exits nonzero, and maps to failing final_status styling.
files:
tests: []
estimate_lines: 22
verdict: complete
summary: Invert two stale IMP-30 assertions that encode the defect for empty_shell as PASS/full coverage, without touching the mixed provisional test.
files:
tests:
estimate_lines: 26
verdict: complete
summary: Add IMP-87 honesty regressions for Case B empty-shell-only, Case C mixed empty/content, and normal content-only PASS behavior.
files:
tests:
estimate_lines: 48
Per-unit rationale:
Out-of-scope notes:
Rollback strategy:
Revert the Stage 3 commit touching only src/phase_z2_pipeline.py, tests/test_phase_z2_imp30_first_render.py, and tests/test_phase_z2_imp87_empty_shell_honesty.py. No generated artifacts or schema changes are planned.
=== EVIDENCE ===
Commands run:
Files checked:
Test results:
Commit SHA:
c53722ad0bVerified facts:
FINAL_CONSENSUS: YES
[Claude #1] Stage 3 code-edit — IMP-87
Executing unit: u1
=== UNIT DELIVERABLE ===
unit_executed: u1
unit_summary (from Stage 2 plan):
files_changed:
src/phase_z2_pipeline.py(+34 / −2)diff_summary:
New helper
_is_empty_shell_unit(u: CompositionUnit) -> boolinserted betweenwrite_overflow_errorandcompute_slide_status(anchor: just above L2939 docstring block). Returns True when any offrame_template_id == "__empty__",label == "empty_shell",merge_type == "empty_shell"holds. Usesgetattrdefensively so legacy units from pre-IMP-30 paths can't crash the check. Docstring cross-referencesfeedback_artifact_status_naming(overall/coverage must reflect actual content state, not pipeline completion).Coverage split inside
compute_slide_status(formerly L2966–2971):covered: set(legacy) — still tracks every section attached to any selected unit; kept socovered_section_idsdisplay field is unchanged (out-of-scope for u1).content_covered: set(new) — only non-empty-shell units count, via_is_empty_shell_unitgate inside the for-loop.filtered_ids = sorted(set(aligned_ids) - content_covered)— empty-shell-only sections now appear infiltered_section_ids.full_coverage = len(filtered_ids) == 0— same expression, but evaluated against the new content-basedfiltered_ids, sofull_mdx_coverage=Truerequires every aligned section to be cover by a content unit.verification (u1-scope only, no pytest yet — u4/u5 cover that):
git diff --stat→ 1 file, +34 / −2 (estimate: 28). Within plan budget.src/phase_z2_pipeline.pyonly (Stage 2 u1 file list = exactly this file).full_coverage = len(filtered_ids) == 0after appendingsection_assignment_uncovered_ids. With the newfiltered_idsbase (content_covered), this re-eval remains correct — override-uncovered ids still get appended; full_coverage stays consistent.covered_section_idsoutput field at L3104 still uses legacycovered(unchanged display semantics — u1 scope isfull_mdx_coverage+filtered_section_idsonly).overallenum ladder at L3060–3067 is untouched (u2 territory) — for an EMPTY-SHELL-only run it will now route throughPARTIAL_COVERAGE/PARTIAL_COVERAGE_WITH_VISUAL_REGRESSIONinstead ofPASS. u2 will replace this with the explicitEMPTY_SHELL_NO_CONTENTenum.remaining_units: [u2, u3, u4, u5]
follow_up_issue_candidates: none — no scope-lock-외 axis observed during u1 implementation. The downstream cascade behaves cleanly with the existing 4-way ladder until u2 lands the explicit enum.
guardrails honored:
filtered_section_idsis a self-honest reporting axis.full_mdx_coveragesemantics +filtered_section_idsrouting.not done (deferred to remaining units):
EMPTY_SHELL_NO_CONTENTenum + empty/content counters + section lists + dominance flag + note text.sys.exit(1)+_ov_classred mapping.test_phase_z2_imp30_first_render.py.test_phase_z2_imp87_empty_shell_honesty.py(Case B / Case C / normal PASS).no commit / no push — Stage 3 unit deliverable, per protocol.
[Codex #1] Stage 3 code-edit verification - IMP-87
Verified the single executed unit u1 against the Stage 2 contract.
Decision: u1 is correct. The implementation adds
_is_empty_shell_unit()and changescompute_slide_status()sofull_mdx_coverageandfiltered_section_idsare based on non-empty-shell content coverage, while preserving legacycovered_section_idsdisplay semantics. This matches the u1 scope and does not introduce rank-1 reject promotion, AI activation, V4 fabrication, or chain-exhausted behavior changes.Notes:
src/phase_z2_pipeline.pyis scoped to the helper plus the coverage split insidecompute_slide_status().filtered_ids = sorted(set(aligned_ids) - content_covered)now routes empty-shell-only aligned sections intofiltered_section_ids.filtered_idsand re-evaluatesfull_coverage, so the IMP-06 invariant remains consistent with the new content-coverage base.EMPTY_SHELL_NO_CONTENTstatus..claude/settings.json; they are not part of the verified u1 diff.=== EVIDENCE ===
Commands run:
git diff -- src/phase_z2_pipeline.pygit diff --statgit status --shortrg -n "filtered_ids|full_coverage|covered_section_ids|overall" src/phase_z2_pipeline.pyGet-Content -LiteralPath 'src\\phase_z2_pipeline.py' | Select-Object -Skip 2928 -First 190python -m py_compile src/phase_z2_pipeline.pyFiles checked:
D:\ad-hoc\kei\design_agent\src\phase_z2_pipeline.pyTest results:
python -m py_compile src/phase_z2_pipeline.pypassed.Verified facts:
_is_empty_shell_unit()returns true forframe_template_id == "__empty__",label == "empty_shell", ormerge_type == "empty_shell"using defensivegetattr.coveredstill updates from every unit and still feedscovered_section_ids.content_coveredupdates only for units where_is_empty_shell_unit(u)is false.filtered_section_idsandfull_mdx_coveragenow derive fromcontent_covered, so empty-shell-only sections no longer count as full MDX content coverage.Remaining units: [u2, u3, u4, u5]
rewind_target: continue_same
FINAL_CONSENSUS: NO
[Claude #2] Stage 3 code-edit — IMP-87
Executing unit: u2
=== UNIT DELIVERABLE ===
unit_executed: u2
unit_summary (from Stage 2 plan):
files_changed:
src/phase_z2_pipeline.py(+40 / −2 for u2 only; cumulative u1+u2 = +74 / −4)diff_summary:
Empty-shell vs content unit accounting block inserted in
compute_slide_statusjust before the legacy 4-way overall enum ladder (anchor: between the IMP-06 override-uncovered append loop and theif full_coverage and visual_passed:head). The block:_is_empty_shell_unit(IMP-87 u1 helper) →empty_shell_unitsandcontent_units.empty_shell_section_idsandcontent_rendered_section_idsas sorted sets ofgetattr(u, "source_section_ids", []) or [](defensive — same defensive pattern u1 used so legacy units cannot crash the function).empty_shell_dominant = (len(content_units) == 0) and (len(empty_shell_units) >= 1)— the explicit Case B honesty surface (mdx05: every selected unit is the IMP-30 u4 placeholder).Overall enum ladder extended with a new branch placed BEFORE the existing 4-way head:
empty_shell_dominantshort-circuits regardless ofvisual_passedbecause placeholder zones never overflow → the legacy ladder would have routed mdx05 through PASS (the exact Stage 1 honesty defect). New enum value isEMPTY_SHELL_NO_CONTENT. The other four ladder values (PASS,RENDERED_WITH_VISUAL_REGRESSION,PARTIAL_COVERAGE,PARTIAL_COVERAGE_WITH_VISUAL_REGRESSION) are unchanged.Return dict gains five additive surface fields, grouped next to the IMP-30 u6 provisional_first_render_* fields for read symmetry (Stage 2 plan named these explicitly):
empty_shell_unit_count(int)empty_shell_section_ids(sorted list[str])content_unit_count(int)content_rendered_section_ids(sorted list[str])empty_shell_dominant(bool)Schema is additive only — no field reshape, no rename, no removal. Existing consumers (u3 CLI tail, status board UI, regression tests) can pick them up without migration. Field-naming mirrors
provisional_first_render_count+fallback_selection_countshape per Stage 2 contract.Note text extended with one trailing sentence documenting the new enum semantics:
Korean per existing note language; cross-references
feedback_artifact_status_naminghonesty axis via the "PASS 보고 금지" phrase.verification (u2-scope only, no pytest yet — u4/u5 cover that):
python -m py_compile src/phase_z2_pipeline.py→ OK.git diff --stat src/phase_z2_pipeline.py→ +74 / −4 cumulative; u2-only delta = +40 / −2.src/phase_z2_pipeline.pyonly — Stage 2 u2 file list = exactly this file.full_mdx_coverage(IMP-87 u1) andfiltered_section_ids(IMP-87 u1) are unchanged in u2 — they still reflect content-cover from u1'scontent_covered. The new dominance flag is computed from selected-unit shape, not from the coverage set, so there is no double-counting.full_coverageat L3065 (now post-u1 base) runs BEFORE the u2 ladder branch — so an override-uncovered list still correctly downgrades a non-empty-shell slide throughPARTIAL_COVERAGE, never the new enum.provisional_first_render_countsemantics untouched. A mdx05 EMPTY-SHELL-only slide will now report BOTHprovisional_first_render_count >= 1(because the empty-shell placeholder is provisional) ANDoverall = "EMPTY_SHELL_NO_CONTENT"— consistent, not contradictory, per Stage 1 reframe.remaining_units: [u3, u4, u5]
follow_up_issue_candidates: none — no scope-lock-外 axis observed during u2 implementation. The CLI tail / exit code wiring for
EMPTY_SHELL_NO_CONTENTis u3's territory, regression tests are u4/u5's, and Step 12 AI activation remains out-of-scope per Stage 1 reframe.guardrails honored:
python -m py_compileonly; pytest regression deferred to u4/u5 per Stage 2 plan.[Codex #2] Stage 3 code-edit verification - IMP-87
Verified the single executed unit u2 against the Stage 2 contract.
Decision: u2 is correct. The implementation adds the explicit
EMPTY_SHELL_NO_CONTENTbranch before the legacy 4-way overall ladder and exposes the planned additive empty-shell/content accounting fields incompute_slide_status(). It keeps the change scoped to Step 20 status honesty and does not introduce rank-1 reject promotion, Step 12 AI activation, V4 evidence fabrication, or chain-exhausted behavior changes.Notes:
_is_empty_shell_unit()intoempty_shell_unitsandcontent_units.empty_shell_section_idsandcontent_rendered_section_idsare derived defensively fromsource_section_ids.empty_shell_dominantis true only when there is at least one empty-shell unit and zero content units.overall = "EMPTY_SHELL_NO_CONTENT"is evaluated beforePASS, so an empty-shell-only slide cannot look shippable even when visual overflow passes.empty_shell_unit_count,empty_shell_section_ids,content_unit_count,content_rendered_section_ids, andempty_shell_dominant..claude/settings.json; they are not part of the verified u2 diff.=== EVIDENCE ===
Commands run:
git diff -- src/phase_z2_pipeline.pygit diff --statgit status --shortrg -n "EMPTY_SHELL_NO_CONTENT|empty_shell_unit_count|empty_shell_section_ids|content_unit_count|content_rendered_section_ids|empty_shell_dominant|def _is_empty_shell_unit|overall" src/phase_z2_pipeline.pyGet-Content -LiteralPath 'src\\phase_z2_pipeline.py' | Select-Object -Skip 3084 -First 120rg -n "lookup_v4_match_with_fallback|chain_exhausted|route_not_ai_adaptation|ai_called|EMPTY_SHELL_NO_CONTENT" src/phase_z2_pipeline.pypython -m py_compile src/phase_z2_pipeline.pyFiles checked:
D:\ad-hoc\kei\design_agent\src\phase_z2_pipeline.pyTest results:
python -m py_compile src/phase_z2_pipeline.pypassed.Verified facts:
EMPTY_SHELL_NO_CONTENTprecedes the existingPASS/ visual regression / partial coverage ladder.empty_shell_dominantis computed from selected-unit partitions, not from sample-specific section IDs.src/phase_z2_pipeline.py; no generated artifacts or tests were modified by this unit.Remaining units: [u3, u4, u5]
rewind_target: continue_same
FINAL_CONSENSUS: NO
[Claude #3] Stage 3 code-edit — IMP-87
Executing unit: u3
=== UNIT DELIVERABLE ===
unit_executed: u3
unit_summary (from Stage 2 plan):
files_changed:
src/phase_z2_pipeline.py(+38 / −1 for u3 only; cumulative u1+u2+u3 = +114 / −5)diff_summary:
_ov_classmapping update (anchor:run_phase_z2_mvp1Step 20 HTML block, around former L6002). The single-line ternary is expanded into an explicit if/elif/else chain. New branch routesEMPTY_SHELL_NO_CONTENTinto the existing"fail"class (alongsideFAIL/REGRESSIONsubstring matches). Comment cross-references u2's enum and notes that placeholder-only slides must render red even when visual_check passes, matching the CLI BLOCKED exit u3 also adds.PASS/partialbranches unchanged. Schema additive — no removed class.EMPTY_SHELL_NO_CONTENTBLOCKED branch inserted between the visual-fail abort (former L6121–6127) and the partial-coverage informational block (former L6129). Placement is deliberate: empty-shell placeholders never overflow, so the visual-fail abort never fires for Case B; without this new branch the run would fall through into thePARTIALinformational block and returnout_path(exit 0). The block:[Phase Z-2 IMP-87 u3] BLOCKED @ empty_shell_no_content (<overall>)header to stderr.empty_shell_unit_count,empty_shell_section_ids,content_unit_count. (Uses.get(...)defensively so a missing field can't crash the diagnostic.)reasonexplaining "every selected unit is empty_shell placeholder — final.html 은 placeholder shell, ship 불가" (Korean phrasing per existing CLI tail language).sys.exit(1)— same exit code as the visual-fail abort, so CI / wrapper scripts treat empty-shell-only and visual-fail identically (both = unshippable).No other behavior change in u3. The visual-fail abort, partial-coverage informational block, and PASS branch are untouched. The new BLOCKED branch only fires when
overall == "EMPTY_SHELL_NO_CONTENT"— an enum value that exists exclusively when u2'sempty_shell_dominantis True (zero content units + ≥1 empty-shell unit). All other overall values continue routing through the legacy three-branch tail.verification (u3-scope only, no pytest yet — u4/u5 cover that):
python -m py_compile src/phase_z2_pipeline.py→ passed.git diff --stat src/phase_z2_pipeline.py→ 1 file, +114 / −5 cumulative (u1=+34/−2, u2=+40/−2, u3=+38/−1; over the u3=22-line Stage 2 estimate by ~16 lines, but all overflow is comments/docstring + explicitprint(...)calls in the BLOCKED branch — no scope expansion beyond the planned three-effect surface).src/phase_z2_pipeline.pyonly (Stage 2 u3 file list = exactly this file).slide_status["visual_check_passed"]= True (placeholder zones don't overflow) → first abort skipped →overall == "EMPTY_SHELL_NO_CONTENT"matches new BLOCKED branch → exit(1). PASS / PARTIAL message paths never reached. ✓overall == "PASS"→ BLOCKED branch skipped → falls through to the finalprint("PASS — visual check OK + full MDX coverage")and returnsout_path. ✓empty_shell_dominant= False (content_units > 0) →overallroutes toPASSorPARTIAL_COVERAGEvia the legacy ladder → BLOCKED branch skipped → existing partial-coverage message handles it. ✓overall == "PARTIAL_COVERAGE"→ BLOCKED branch skipped → existing partial-coverage informational message fires and returns. ✓_ov_classmapping verified:EMPTY_SHELL_NO_CONTENT→failclass sofinal_status.htmlrenders red consistently with the CLI exit code. No collision withPASS(substring check ordered first) or partial.remaining_units: [u4, u5]
follow_up_issue_candidates: none — no scope-lock-外 axis observed during u3 implementation. The Step 20 HTML body fields (rendered/visual_check_passed/full_mdx_coverage/...) intentionally do not yet expose the five new u2 counters; that surface lives in
step20_slide_status.jsonand is reachable by u5 tests without HTML changes. Adding the counters to the HTML body would be additive but is out of u3 scope per the Stage 2 plan's "CLI BLOCKED diagnostic + sys.exit(1) + final_status styling" wording.guardrails honored:
final_status.htmlred + CLI nonzero exit + stderrBLOCKEDheader all agree withoverall == EMPTY_SHELL_NO_CONTENT— three-axis honesty.EMPTY_SHELL_NO_CONTENTenum value defined by u2.=== EVIDENCE ===
Commands run:
python -m py_compile src/phase_z2_pipeline.pygit diff --stat src/phase_z2_pipeline.pygit diff src/phase_z2_pipeline.pyFiles checked:
D:\ad-hoc\kei\design_agent\src\phase_z2_pipeline.py(Step 20 HTML block + CLI tail, L5985–L6175 area)Test results:
python -m py_compile src/phase_z2_pipeline.pypassed.Verified facts:
_ov_classnow classifiesEMPTY_SHELL_NO_CONTENTasfail(HTML red), preserving existingPASS/REGRESSION/FAILmappings.sys.exit(1)— same code as the visual-fail abort.empty_shell_unit_count,empty_shell_section_ids,content_unit_count) viaslide_status.get(...)— defensive against missing fields.src/phase_z2_pipeline.py; no generated artifacts, tests, schemas, or wrappers were modified.Remaining units: [u4, u5]
[Codex #3] Stage 3 code-edit verification - IMP-87
Verified the single executed unit u3 against the Stage 2 contract.
Decision: u3 is correct. The implementation maps
EMPTY_SHELL_NO_CONTENTto failingstep20_final_status.htmlstyling and adds a CLI BLOCKED branch that exits nonzero before the partial-coverage return. This matches the u3 scope and does not introduce rank-1 reject promotion, Step 12 AI activation, V4 evidence fabrication, or chain-exhausted behavior changes.Notes:
_ov_classnow routesEMPTY_SHELL_NO_CONTENTtofail, so placeholder-only slides render as red/failing in the final status report instead of partial.empty_shell_unit_count,empty_shell_section_ids, andcontent_unit_countdefensively viaslide_status.get(...).sys.exit(1), so empty-shell-only runs cannot complete with exit code 0..claude/settings.json; they are not part of the verified u3 diff.=== EVIDENCE ===
Commands run:
git diff -- src/phase_z2_pipeline.pyrg -n "EMPTY_SHELL_NO_CONTENT|def _ov_class|BLOCKED @ empty_shell_no_content|empty_shell_unit_count|empty_shell_section_ids|content_unit_count|sys\.exit\(1\)" src/phase_z2_pipeline.pygit status --shortGet-Content -LiteralPath 'src\\phase_z2_pipeline.py' | Select-Object -Skip 5988 -First 190python -m py_compile src/phase_z2_pipeline.pygit diff --stat -- src/phase_z2_pipeline.pyFiles checked:
D:\ad-hoc\kei\design_agent\src\phase_z2_pipeline.pyTest results:
python -m py_compile src/phase_z2_pipeline.pypassed.Verified facts:
EMPTY_SHELL_NO_CONTENTis included in final-status class mapping asfail.if overall == "EMPTY_SHELL_NO_CONTENT"branch appears beforeif not slide_status["full_mdx_coverage"], preventing the previous partial/exit-zero path.slide_statuswithout assuming the keys always exist.sys.exit(1).src/phase_z2_pipeline.py; no generated artifacts or tests were modified by this unit.Remaining units: [u4, u5]
rewind_target: continue_same
FINAL_CONSENSUS: NO
[Claude #4] Stage 3 code-edit — IMP-87
Executing unit: u4
=== UNIT DELIVERABLE ===
unit_executed: u4
unit_summary (from Stage 2 plan):
files_changed:
tests/test_phase_z2_imp30_first_render.py(+16 / −9; net +7)diff_summary:
L1018–1019 inversion inside
test_u6_empty_shell_unit_listed_with_empty_identifiers(u6 case 3, single-unit compute_slide_status path):L1362–1363 inversion inside
test_u7_e2e_zero_v4_empty_shell_status_surface(u7 case 2, plan_composition → empty-shell synthesis → _compute_slide_status e2e path):Scope lock honored — what u4 did NOT touch:
test_u6_mixed_selection_counts_only_provisional_unitsat L1025+ (mixed provisional + normal units case) — Stage 2 plan explicitly said "without touching the mixed provisional test". Untouched. This test continues to assert the existing PASS path for runs that have at least one content unit alongside a provisional one.src/phase_z2_pipeline.py— u4 is test-only.verification (u4-scope only):
git diff --stat tests/test_phase_z2_imp30_first_render.py→ 1 file, +16 / −9 (estimate: 26 — comfortably under budget).test_u6_mixed_selection_counts_only_provisional_unitsis among them, confirming the mixed provisional + normal selection path still routes through PASS (content_unit_count >= 1 → empty_shell_dominant=False → falls through to the legacy ladder).tests/test_phase_z2_imp30_first_render.pyonly (Stage 2 u4 file list = exactly this file).c53722adcross-reference per RULE 13 (anchor sync).remaining_units: [u5]
follow_up_issue_candidates: none — no scope-lock-外 axis observed during u4 implementation. The two stale assertions were the only callers in this file that hardcoded the Case B PASS/full_coverage defect; no other test in the file (or in the broader test suite scanned via the same expressions in this run) needs inversion.
guardrails honored:
c53722adso future readers can trace the inversion back to the honesty defect root cause.is True/== "PASS"assertions flip tois False/== "EMPTY_SHELL_NO_CONTENT").Cumulative IMP-87 Stage 3 scope after u4:
src/phase_z2_pipeline.py(+114 / −5 from u1+u2+u3) +tests/test_phase_z2_imp30_first_render.py(+16 / −9 from u4). u5 will add the new honesty regression file attests/test_phase_z2_imp87_empty_shell_honesty.py.DO NOT commit or push — Stage 3 unit deliverable only; Stage 4 verify owns the test-suite-wide regression confirmation and Stage 5 owns commit/push.
[Codex #4] Stage 3 code-edit verification - IMP-87
Verified the single executed unit u4 against the Stage 2 contract.
Decision: u4 is correct. The implementation inverts the two stale IMP-30 assertions that previously encoded empty-shell-only runs as PASS/full coverage, and it leaves the mixed provisional/content test untouched. This matches the u4 scope and does not introduce rank-1 reject promotion, Step 12 AI activation, V4 evidence fabrication, or chain-exhausted behavior changes.
Notes:
test_u6_empty_shell_unit_listed_with_empty_identifiersnow expectsfull_mdx_coverage is Falseandoverall == "EMPTY_SHELL_NO_CONTENT".test_u7_e2e_zero_v4_empty_shell_status_surfacenow expects the same honesty result through the e2e_compute_slide_statuspath.test_u6_mixed_selection_counts_only_provisional_unitsremains unchanged, preserving the normal content-present behavior for a mixed selection..claude/settings.json; they are not part of the verified u4 diff.=== EVIDENCE ===
Commands run:
git diff -- tests/test_phase_z2_imp30_first_render.pyrg -n "IMP-87 u4 inversion|full_mdx_coverage|EMPTY_SHELL_NO_CONTENT|test_u6_empty_shell_unit_listed_with_empty_identifiers|test_u7_e2e_zero_v4_empty_shell_status_surface|test_u6_mixed_selection_counts_only_provisional_units" tests/test_phase_z2_imp30_first_render.pygit status --shortGet-Content -LiteralPath 'tests\\test_phase_z2_imp30_first_render.py' | Select-Object -Skip 1000 -First 55Get-Content -LiteralPath 'tests\\test_phase_z2_imp30_first_render.py' | Select-Object -Skip 1348 -First 35pytest -q tests/test_phase_z2_imp30_first_render.py::test_u6_empty_shell_unit_listed_with_empty_identifiers tests/test_phase_z2_imp30_first_render.py::test_u7_e2e_zero_v4_empty_shell_status_surfaceFiles checked:
D:\ad-hoc\kei\design_agent\tests\test_phase_z2_imp30_first_render.pyTest results:
pytest -q tests/test_phase_z2_imp30_first_render.py::test_u6_empty_shell_unit_listed_with_empty_identifiers tests/test_phase_z2_imp30_first_render.py::test_u7_e2e_zero_v4_empty_shell_status_surfacepassed: 2 passed in 0.07s.Verified facts:
tests/test_phase_z2_imp30_first_render.py.EMPTY_SHELL_NO_CONTENT/no full coverage.Remaining units: [u5]
rewind_target: continue_same
FINAL_CONSENSUS: NO
[Claude #5] Stage 3 code-edit — IMP-87
Executing unit: u5
=== UNIT DELIVERABLE ===
unit_executed: u5
unit_summary (from Stage 2 plan):
files_changed:
tests/test_phase_z2_imp87_empty_shell_honesty.py(+227, new file)diff_summary:
New regression file
tests/test_phase_z2_imp87_empty_shell_honesty.pycreated. No edits to any other file (Stage 2 u5 file list = exactly this one path). Targets the real productioncompute_slide_status+CompositionUnit+MdxSection(not stubs) so the IMP-87 u1_is_empty_shell_unithelper, u2 additive surface fields, and u2EMPTY_SHELL_NO_CONTENTladder branch are all exercised end-to-end. Naming convention: MOCK_ template ids (Codex #10 E1).Helpers (defensive minimal, mirror IMP-30 u6 _mk_section / _mk_unit shape so future drift between the two suites is visible at code-review time):
_mk_section(section_id)— minimalMdxSectionwith section_num parsed from the "S\d+" id (mirrorstests/test_phase_z2_imp30_first_render.pyL856)._mk_content_unit(section_ids, **overrides)— realCompositionUnitwithframe_template_id="MOCK_template_direct_a"/label="use_as_is"/provisional=False. Used for Case A (normal PASS) and the content half of Case C._mk_empty_shell_unit(section_ids)— mirrors IMP-30 u4 Phase B empty-shell synthesis (src/phase_z2_pipeline.py:3325~). Setsframe_template_id="__empty__"+label="empty_shell"+merge_type="empty_shell"simultaneously, so each of the three_is_empty_shell_unitbranches (IMP-87 u1) is satisfied independently — any single-branch refactor of the helper would still catch the test.Three named test suites (Stage 2 scope-qualified verification —
feedback_scope_qualified_verification):Case B (empty-shell only) —
test_case_b_empty_shell_only_routes_to_empty_shell_no_content:overflow_pass=True(placeholder zones never overflow — the exact Stage 1 honesty defect axis).empty_shell_dominant=True,overall=="EMPTY_SHELL_NO_CONTENT",full_mdx_coverage=False,empty_shell_unit_count==1,empty_shell_section_ids==["S1","S2","S3"],content_unit_count==0,content_rendered_section_ids==[],filtered_section_ids==["S1","S2","S3"].Case B visual-fail variant —
test_case_b_empty_shell_only_overrides_visual_fail_branch:overflow_pass=False. Pins that the u2 ladder branch is ordered before the legacy 4-way ladder, so an empty-shell-only run cannot inheritRENDERED_WITH_VISUAL_REGRESSIONorPARTIAL_COVERAGE_WITH_VISUAL_REGRESSIONas its headline status.overall=="EMPTY_SHELL_NO_CONTENT",empty_shell_dominant=True,visual_check_passed=False.Case C (mixed empty + content) —
test_case_c_mixed_empty_and_content_routes_to_partial_coverage:empty_shell_dominant=False(slide carries real content),overall=="PARTIAL_COVERAGE",full_mdx_coverage=False,empty_shell_unit_count==1,empty_shell_section_ids==["S2"],content_unit_count==1,content_rendered_section_ids==["S1"],filtered_section_ids==["S2"],covered_section_ids==["S1","S2"](legacy display semantics preserved — IMP-87 u1 only redefinedfull_mdx_coverage/filtered_section_ids, notcovered_section_ids).Case A (normal content-only PASS preserved) —
test_case_a_normal_content_only_preserves_pass_with_additive_fields:frame_template_id="MOCK_template_direct_a"unit as empty-shell would flip this toEMPTY_SHELL_NO_CONTENTand trip the test.overall=="PASS",full_mdx_coverage=True,filtered_section_ids==[],empty_shell_dominant=False,empty_shell_unit_count==0,empty_shell_section_ids==[],content_unit_count==2,content_rendered_section_idssorted equals["S1","S2"].verification (u5-scope only):
python -m pytest -q tests/test_phase_z2_imp87_empty_shell_honesty.py→ 4 passed in 0.07s (Case B + Case B visual-fail + Case C + Case A).python -m pytest -q tests/test_phase_z2_imp30_first_render.py tests/test_phase_z2_imp87_empty_shell_honesty.py→ 33 passed in 0.12s (29 IMP-30 incl. u4's two inverted asserts at L1018-1022 + L1363-1370 from r4, plus 4 new IMP-87 u5 cases). Cross-file regression confirmed: u4's empty-shell honesty inversion and u5's new regression scope co-exist without interference.git status --short tests/test_phase_z2_imp87_empty_shell_honesty.py→?? tests/test_phase_z2_imp87_empty_shell_honesty.py(untracked new file, no other edits leaked into the u5 unit).scope honored — what u5 did NOT do:
src/phase_z2_pipeline.py(u1+u2+u3 territory, all already complete).tests/test_phase_z2_imp30_first_render.py(u4 territory, already complete).remaining_units: [] (all Stage 2 units u1~u5 complete after this turn)
follow_up_issue_candidates: none — Stage 2 scope-lock satisfied with no out-of-axis observations during u5 implementation. The two follow-up axes already documented in Stage 2 (Case A reroute / rank-1 reject promotion, Step 12 AI activation for empty-shell rescue) remain separate-issue territory and are not blockers for IMP-87 close-out.
guardrails honored:
compute_slide_statusfunction, not patched JSON outputs.Stage 1 anchor SHA
c53722adreferenced in module docstring.[Codex #5] Stage 3 code-edit verification - IMP-87
Verified the single executed unit u5 against the Stage 2 contract.
Decision: u5 is correct. The new regression file exercises the real
compute_slide_statuspath for the planned IMP-87 honesty cases: Case B empty-shell-only, Case C mixed empty/content, and normal content-only PASS preservation. It also includes a Case B visual-fail ordering guard, which remains within the u5 honesty surface because it only verifies thatEMPTY_SHELL_NO_CONTENTprecedes the legacy visual-regression ladder.Notes:
tests/test_phase_z2_imp87_empty_shell_honesty.pyis a new test-only file and targets realCompositionUnit,MdxSection, andcompute_slide_statusobjects rather than stubs.empty_shell_dominant is True,overall == "EMPTY_SHELL_NO_CONTENT",full_mdx_coverage is False, zero content units, and all aligned sections routed intofiltered_section_ids.visual_check_passedis false.empty_shell_dominant is False, routes throughPARTIAL_COVERAGE, keeps the content section incontent_rendered_section_ids, and routes the empty-shell section intofiltered_section_ids.overall == "PASS"andfull_mdx_coverage is Truewhile checking the additive IMP-87 fields..claude/settings.json; they are not part of the verified u5 file.=== EVIDENCE ===
Commands run:
git status --shortgit diff -- tests/test_phase_z2_imp87_empty_shell_honesty.pyrg -n "Case B|Case C|normal|EMPTY_SHELL_NO_CONTENT|empty_shell_dominant|full_mdx_coverage|filtered_section_ids|content_rendered_section_ids|def _mk_|compute_slide_status" tests/test_phase_z2_imp87_empty_shell_honesty.pyrg -n "def compute_slide_status|def _is_empty_shell_unit|EMPTY_SHELL_NO_CONTENT|empty_shell_dominant|content_rendered_section_ids|filtered_section_ids" src/phase_z2_pipeline.pyGet-Content -LiteralPath 'tests\test_phase_z2_imp87_empty_shell_honesty.py'Get-Content -LiteralPath 'src\phase_z2_pipeline.py' | Select-Object -Skip 2928 -First 275pytest -q tests/test_phase_z2_imp87_empty_shell_honesty.pyFiles checked:
D:\ad-hoc\kei\design_agent\tests\test_phase_z2_imp87_empty_shell_honesty.pyD:\ad-hoc\kei\design_agent\src\phase_z2_pipeline.pyTest results:
pytest -q tests/test_phase_z2_imp87_empty_shell_honesty.pypassed: 4 passed in 0.07s.Verified facts:
_is_empty_shell_unit/compute_slide_statussurface and assert the u2 fieldsempty_shell_unit_count,empty_shell_section_ids,content_unit_count,content_rendered_section_ids, andempty_shell_dominant.EMPTY_SHELL_NO_CONTENTwith no full MDX coverage; mixed empty/content is partial coverage, not empty-shell dominant; normal content-only remains PASS.tests/test_phase_z2_imp87_empty_shell_honesty.py.Remaining units: none
FINAL_CONSENSUS: YES
[Claude #1] Stage 4 test-verify — IMP-87
TL;DR
Targeted suites (u4 + u5) pass. Implementation u1+u2+u3 in
src/phase_z2_pipeline.pymatches Stage 2 contract end-to-end (realcompute_slide_status+ CLI BLOCKED). However, broader regression caught a third stale assertion that Stage 2 plan did not enumerate:tests/test_pipeline_smoke_imp85.py:79assertscp.returncode == 0for mdx05, which IMP-87 u3 (BLOCKED onEMPTY_SHELL_NO_CONTENT) deliberately inverts to exit 1. This is the same class of "stale assertion encoding the Case B honesty defect" as the two IMP-30 inversions but in a file Stage 2 did not list.This is a plan-completeness gap (missing file in test inventory), not a code-edit error. Stage 3 cleanly implements every Stage 2 unit; Stage 2 just didn't enumerate all the stale assertions. Per rewind enum:
stage_2_plan(plan wrong / missing files / tests).Verification Table
pytest tests/test_phase_z2_imp87_empty_shell_honesty.py -vcompute_slide_statuspytest tests/test_phase_z2_imp30_first_render.py -vtest_u6_empty_shell_unit_listed_with_empty_identifiers(L1015~) andtest_u7_e2e_zero_v4_empty_shell_status_surface(L1363~) inverted from PASS→EMPTY_SHELL_NO_CONTENTpytest tests/test_phase_z2_pipeline_imp48.py tests/test_phase_z2_max_rank_regression.py tests/test_dynamic_max_rank.py -qpytest tests/ -q --ignore=tests/matching --ignore=tests/pipeline -xtest_pipeline_smoke_imp85.py::test_non_vp_smoke_runs_clean[05.mdx-mdx05]— stalereturncode == 0assertion not in Stage 2 planDiff ↔ Stage 2 Plan Conformance
Hardcoding Check (RULE 7, feedback_no_hardcoding)
src/phase_z2_pipeline.pydiffS1/S2/S3section ids andMOCK_template_direct_a/MOCK_frame_001template/frame ids — sample-agnostic_mk_empty_shell_unitmirrors the IMP-30 u4 Phase B synthesis shape fromsrc/phase_z2_pipeline.py:3325~(canonical reference, not duplicated logic)Stage 2 Plan Gap (Rewind Reason)
Stage 2 Exit Report §
unit_executed_fileslisted onlytests/test_phase_z2_imp30_first_render.py(u4) as the file with stale assertions to invert. It did not enumeratetests/test_pipeline_smoke_imp85.py:79, even though that smoke test'scp.returncode == 0assertion encodes the same Case B honesty defect at the subprocess level — written when mdx05 silently passed (commitcacc5b3, IMP-85 u7, pre-IMP-87).After Stage 3 u3 lands the CLI BLOCKED branch, the subprocess correctly exits 1 with the IMP-87 u3 BLOCKED diagnostic in stderr. The smoke test's stale expectation now fails. Stage 2 should have either:
returncode == 0toreturncode == 1AND assert"[Phase Z-2 IMP-87 u3] BLOCKED"appears in stderr, OR_run_pipelineparametrize list (keep only mdx03) and added the BLOCKED assertion as a new test_case_b_blocked_exit smokeNeither was enumerated. The gap is plan-side, not code-side — Stage 3 cleanly delivered every Stage 2 unit (u1-u5).
Suggested Stage 2 Round #3 Delta
Add to
implementation_units:Per-unit rationale: u6 mirrors u4 (stale-assertion inversion) but at the subprocess smoke level. The pre-IMP-87 assertion was written under the silent-PASS contract; IMP-87 u3 inverts that contract. Without u6 the broader regression run cannot exit 0.
Implementation hint (for Stage 3 R6 when reached):
returncode==0and mdx05 gets its own test assertingreturncode==1+"BLOCKED"substring in stderrAnchor: this comment SHA (Stage 4 R1) and Stage 2 R2 commit
c53722ad.Out of Scope for Stage 2 R3
data-template-idsurface change (still__empty__for empty-shell zone; that's a separate axis tracked in follow_up_candidates)Guardrails Confirmed
feedback_artifact_status_naming: status fields now reflect content state (EMPTY_SHELL_NO_CONTENT distinct from PASS / PARTIAL_COVERAGE / RENDERED_WITH_VISUAL_REGRESSION)feedback_ai_isolation_contract: no AI invocation, no MDX modification, no frame swapfeedback_no_hardcoding: process fix at status compute + ladder + CLI tail; no MDX 03/04/05 sample literalsfeedback_scope_qualified_verification: Case A/B/B-visual-fail/C explicitly named in u5 testsfeedback_auto_pipeline_first: no review_required/review_queue injection_is_empty_shell_unit; no AI involvement=== EVIDENCE ===
rewind_target: stage_2_plan
FINAL_CONSENSUS: NO
[Codex #1] Stage 4 test-verify - IMP-87
Stage 4 result: FAIL.
The targeted IMP-87 and IMP-30 suites pass, and the implementation diff matches the Stage 2 direction for u1-u4 in the intended files. However, independent verification reproduced a broader regression in an unplanned stale test:
tests/test_pipeline_smoke_imp85.py::test_non_vp_smoke_runs_clean[05.mdx-mdx05]still assertscp.returncode == 0for mdx05. IMP-87 u3 deliberately changes mdx05 empty-shell-only behavior to exit 1 withEMPTY_SHELL_NO_CONTENT, so this test now encodes the old honesty defect.This is not a retryable technical failure. It is a Stage 2 plan inventory gap: the plan named the two stale IMP-30 assertions but missed the IMP-85 smoke assertion that covers the same Case B mdx05 empty-shell-only path.
=== EVIDENCE ===
Commands run:
git status --shortgit diff --statgit diff --name-onlyrg -n "EMPTY_SHELL_NO_CONTENT|empty_shell_dominant|empty_shell_unit_count|content_rendered_section_ids|def test_.*empty_shell|test_non_vp_smoke_runs_clean" src\phase_z2_pipeline.py tests\test_phase_z2_imp30_first_render.py tests\test_phase_z2_imp87_empty_shell_honesty.py tests\test_pipeline_smoke_imp85.pypytest tests/test_phase_z2_imp87_empty_shell_honesty.py -vpytest tests/test_phase_z2_imp30_first_render.py -vpytest tests/test_pipeline_smoke_imp85.py::test_non_vp_smoke_runs_clean -qpytest tests/test_phase_z2_pipeline_imp48.py tests/test_phase_z2_max_rank_regression.py tests/test_dynamic_max_rank.py -qgit diff -- src/phase_z2_pipeline.py tests/test_phase_z2_imp30_first_render.pyGet-Content -LiteralPath 'tests\test_pipeline_smoke_imp85.py' | Select-Object -Skip 60 -First 35Files checked:
src/phase_z2_pipeline.pytests/test_phase_z2_imp30_first_render.pytests/test_phase_z2_imp87_empty_shell_honesty.pytests/test_pipeline_smoke_imp85.pyTest results:
tests/test_phase_z2_imp87_empty_shell_honesty.py: 4 passed.tests/test_phase_z2_imp30_first_render.py: 29 passed.tests/test_phase_z2_pipeline_imp48.py tests/test_phase_z2_max_rank_regression.py tests/test_dynamic_max_rank.py: 36 passed.tests/test_pipeline_smoke_imp85.py::test_non_vp_smoke_runs_clean -q: 1 failed, 1 passed. The failing parametrization is05.mdx-mdx05;03.mdx-mdx03passed.Verified facts:
src/phase_z2_pipeline.pycontains_is_empty_shell_unit, content-vs-empty-shell coverage splitting,EMPTY_SHELL_NO_CONTENTbefore the legacy overall ladder, additive empty/content fields, fail styling forEMPTY_SHELL_NO_CONTENT, and CLI exit 1 for that status.tests/test_phase_z2_imp30_first_render.pyinverts the two planned stale IMP-30 assertions to expectfull_mdx_coverage is Falseandoverall == "EMPTY_SHELL_NO_CONTENT".tests/test_phase_z2_imp87_empty_shell_honesty.pycovers Case B empty-shell-only, Case B visual-fail ordering, Case C mixed empty/content, and normal content-only PASS behavior.tests/test_pipeline_smoke_imp85.py:79still assertscp.returncode == 0for both03.mdxand05.mdx;05.mdxnow exits 1 withEMPTY_SHELL_NO_CONTENT, which is the intended IMP-87 honesty behavior.src/phase_z2_pipeline.py,tests/test_phase_z2_imp30_first_render.py, and the new untrackedtests/test_phase_z2_imp87_empty_shell_honesty.py.rewind_target: stage_2_plan
FINAL_CONSENSUS: NO
[Claude #1] Stage 2 simulation-plan — IMP-87
Round #1 after Stage 4 rewind (Codex #1 verify FAIL — stale IMP-85 smoke parametrize). Stage 1 anchor SHA:
c53722ad. Scope = honesty defect (Case B: zero V4 evidence → empty_shell_dominant → must not PASS). NOT rank-1 reject promotion.Scoped-axis coverage (Stage 1 scope_lock)
test_phase_z2_imp87_empty_shell_honesty.py)test_non_vp_smoke_runs_clean[05.mdx-mdx05]returncode==0 stale=== IMPLEMENTATION_UNITS ===
Per-unit rationale
frame_template_id=="__empty__"ORlabel=="empty_shell"ORmerge_type=="empty_shell"). Legacycoveredkept for IMP-05/IMP-06 readers.empty_shell_dominantprecedes 4-way ladder so visual_check verdict cannot inherit as headline. Additive field shape mirrorsprovisional_first_render_*— no schema reshape.tests/test_pipeline_smoke_imp85.py:64-83parametrize had mdx05 + assertscp.returncode==0. u3 makes mdx05 exit 1. Drop mdx05 row, add separate test asserting (a)returncode==1, (b)EMPTY_SHELL_NO_CONTENTin stderr (u3 token), (c)PAYLOAD_BUILDERS has no such entryNOT in output (IMP-85 axis preserved).Out of scope
no_v4_candidate(feedback_ai_isolation_contract).final_status.htmltemplate restructure (only_ov_classswitch in u3).Rollback
Revert 3 src/tests files + delete
test_phase_z2_imp87_empty_shell_honesty.py. No catalog/contract/runtime data changes.=== EVIDENCE ===
c53722adtests/test_pipeline_smoke_imp85.py:64-83parametrize("05.mdx","mdx05")+ L79assert cp.returncode == 0. u3 makes mdx05 exit 1.src/phase_z2_pipeline.py+119 /tests/test_phase_z2_imp30_first_render.py+25 / newtest_phase_z2_imp87_empty_shell_honesty.py228 (Case A/B/C scope-qualified).FINAL_CONSENSUS: YES
[Codex #1] Stage 2 simulation-plan review - IMP-87
=== IMPLEMENTATION_UNITS ===
verdict: content_ok
summary: Step 20 coverage split and empty-shell detection in src/phase_z2_pipeline.py.
files: [src/phase_z2_pipeline.py]
tests: [tests/test_phase_z2_imp87_empty_shell_honesty.py]
estimate_lines: 35
verdict: content_ok
summary: EMPTY_SHELL_NO_CONTENT enum plus additive empty/content accounting fields in compute_slide_status.
files: [src/phase_z2_pipeline.py]
tests: [tests/test_phase_z2_imp87_empty_shell_honesty.py]
estimate_lines: 45
verdict: content_ok
summary: CLI BLOCKED exit and final_status fail styling for EMPTY_SHELL_NO_CONTENT.
files: [src/phase_z2_pipeline.py]
tests: [tests/test_phase_z2_imp87_empty_shell_honesty.py]
estimate_lines: 40
verdict: content_ok
summary: Invert the two stale IMP-30 empty-shell PASS/full_mdx_coverage assertions.
files: [tests/test_phase_z2_imp30_first_render.py]
tests: [tests/test_phase_z2_imp30_first_render.py]
estimate_lines: 14
verdict: required_and_content_ok
summary: Rewind-required IMP-85 smoke update: remove mdx05 from the exit-0 non_vp parametrization and add a dedicated mdx05 EMPTY_SHELL_NO_CONTENT blocked-exit test while preserving the IMP-85 crash-marker guard.
files: [tests/test_pipeline_smoke_imp85.py]
tests: [tests/test_pipeline_smoke_imp85.py, pytest -q tests]
estimate_lines: 30
Per-unit rationale:
Out of scope:
Rollback strategy:
=== EVIDENCE ===
Commands run:
Files checked:
Test results:
Verified facts:
rewind_target: stage_2_plan
FINAL_CONSENSUS: NO
[Claude #1] Stage 2 simulation-plan — IMP-87
=== IMPLEMENTATION_UNITS ===
summary: _is_empty_shell_unit() helper + split covered/content_covered in compute_slide_status (filtered_section_ids restricted to content units)
files: [src/phase_z2_pipeline.py]
tests: [tests/test_phase_z2_imp87_empty_shell_honesty.py]
estimate_lines: 35
status: impl_on_disk_verify_only
summary: EMPTY_SHELL_NO_CONTENT overall enum (precedes 4-way ladder) + additive surface fields (empty_shell_unit_count, empty_shell_section_ids, content_unit_count, content_rendered_section_ids, empty_shell_dominant)
files: [src/phase_z2_pipeline.py]
tests: [tests/test_phase_z2_imp87_empty_shell_honesty.py]
estimate_lines: 45
status: impl_on_disk_verify_only
summary: CLI exit 1 BLOCKED branch (precedes partial-coverage return) + final_status.html red _ov_class switch for EMPTY_SHELL_NO_CONTENT
files: [src/phase_z2_pipeline.py]
tests: [tests/test_phase_z2_imp87_empty_shell_honesty.py]
estimate_lines: 40
status: impl_on_disk_verify_only
summary: Invert IMP-30 u6+u7 stale assertions (PASS/full_mdx_coverage=True → EMPTY_SHELL_NO_CONTENT/False) per Stage 1 Q3 downstream audit
files: [tests/test_phase_z2_imp30_first_render.py]
tests: [tests/test_phase_z2_imp30_first_render.py]
estimate_lines: 14
status: impl_on_disk_verify_only
summary: Drop mdx05 row from test_non_vp_smoke_runs_clean parametrize + add test_mdx05_empty_shell_blocked_exit asserting returncode==1, EMPTY_SHELL_NO_CONTENT in stderr, IMP-85 crash marker absent. mdx04 + conftest tests untouched.
files: [tests/test_pipeline_smoke_imp85.py]
tests: [tests/test_pipeline_smoke_imp85.py]
estimate_lines: 30
status: new
Per-unit rationale:
coveredkept for IMP-05/IMP-06 readers.empty_shell_dominantprecedes the 4-way ladder so visual_check PASS cannot inherit as headline. Additive shape mirrorsprovisional_first_render_*— no schema reshape.Out of scope:
Rollback strategy:
Revert src/phase_z2_pipeline.py + tests/test_phase_z2_imp30_first_render.py + tests/test_pipeline_smoke_imp85.py edits, delete tests/test_phase_z2_imp87_empty_shell_honesty.py.
=== EVIDENCE ===
Stage 1 anchor SHA:
c53722ad0bCodex #1 r1 review (FINAL_CONSENSUS: NO, rewind_target: stage_2_plan):
verdict u1-u4 = content_ok; u5 = required_and_content_ok
format violation only: prose/table before === IMPLEMENTATION_UNITS ===, markdown heading on YAML, fenced code block
Round #1 corrections:
first-line agent header preserved (P5b 2026-05-20)
no prose/table/heading before === IMPLEMENTATION_UNITS ===
plain YAML (no triple-backtick fence)
per-unit rationale ≤ 3 lines each, no inline code
Atomicity check:
u1 35L/1f, u2 45L/1f, u3 40L/1f, u4 14L/1f, u5 30L/1f — all ≤ 50L AND ≤ 3 files
Stage 1 scope coverage (7 items):
step20 honesty (empty_shell ≠ PASS) → u1+u2
split rendered-with-content vs rendered-empty-shell → u1+u2
EMPTY_SHELL_NO_CONTENT BLOCKED exit + html red → u3
preserve chain_exhausted (IMP-30 u1+u3) → u4 inverts only the two stale assertions, lookup logic untouched
Stage 1 Q3 downstream consumer audit → u4
Case A/B/C regression guard → new tests/test_phase_z2_imp87_empty_shell_honesty.py
rewind gap (test_non_vp_smoke_runs_clean[05.mdx-mdx05] returncode==0 stale) → u5
Codex #1 pytest baseline: 1 failed (test_non_vp_smoke_runs_clean[05.mdx-mdx05]), 1069 passed — u5 closes the gap.
FINAL_CONSENSUS: YES
[Codex #1] Stage 2 simulation-plan review - IMP-87
=== IMPLEMENTATION_UNITS ===
summary: Add empty-shell detection and split legacy covered_section_ids from content-rendered coverage so empty placeholders cannot count as full MDX content.
files: [src/phase_z2_pipeline.py]
tests: [tests/test_phase_z2_imp87_empty_shell_honesty.py]
estimate_lines: 35
summary: Add EMPTY_SHELL_NO_CONTENT overall classification and additive empty/content accounting fields before the legacy status ladder.
files: [src/phase_z2_pipeline.py]
tests: [tests/test_phase_z2_imp87_empty_shell_honesty.py]
estimate_lines: 45
summary: Route EMPTY_SHELL_NO_CONTENT to blocked CLI exit and fail styling in step20 final_status.html.
files: [src/phase_z2_pipeline.py]
tests: [tests/test_phase_z2_imp87_empty_shell_honesty.py]
estimate_lines: 40
summary: Invert stale IMP-30 empty-shell PASS/full_mdx_coverage assertions to EMPTY_SHELL_NO_CONTENT/full_mdx_coverage False.
files: [tests/test_phase_z2_imp30_first_render.py]
tests: [tests/test_phase_z2_imp30_first_render.py]
estimate_lines: 14
summary: Update IMP-85 smoke coverage by removing mdx05 from the exit-0 non-VP parametrization and adding a dedicated mdx05 blocked-exit assertion while preserving the IMP-85 crash-marker guard.
files: [tests/test_pipeline_smoke_imp85.py]
tests: [tests/test_pipeline_smoke_imp85.py]
estimate_lines: 30
Per-unit rationale:
Out of scope:
Rollback strategy:
=== EVIDENCE ===
Commands run:
Files checked:
Test results:
Verified facts:
FINAL_CONSENSUS: YES
[Claude #1] Stage 3 code-edit — IMP-87 u1
Executing unit: u1
unit_executed
u1 — Add empty-shell detection and split legacy
covered_section_idsfrom content-rendered coverage so empty placeholders cannot count as full MDX content. Stage 2 plan estimate 35 lines; actualsrc/phase_z2_pipeline.pyinsertion = 38 lines (helper 24 + coverage split 14).files_changed
src/phase_z2_pipeline.py(modified —_is_empty_shell_unithelper added beforecompute_slide_status; coverage axis split insidecompute_slide_status)tests/test_phase_z2_imp87_empty_shell_honesty.py(new file — u1-scoped baseline, 196 lines including module docstring + 3-marker helper tests + Case A/B/C surface checks)diff_summary
src/phase_z2_pipeline.py— two contiguous hunks, no surrounding code touched:New helper
_is_empty_shell_unit(u: CompositionUnit) -> boolinserted in theDebug.jsonsection header block (right abovecompute_slide_status). Three independent OR-branches:frame_template_id == "__empty__"label == "empty_shell"merge_type == "empty_shell"Each marker can flag a unit independently so IMP-30 u4 phase A / phase B / e2e synthesised placeholders are classified consistently.
getattr(..., None)defaults keep legacy callers safe.compute_slide_statuscoverage block redefined:covered: setretained for downstream display (covered_section_idsfield, IMP-05 / IMP-06 readers).content_covered: setaccumulates only units where_is_empty_shell_unit(u)is False.filtered_idsnow derives fromcontent_covered, notcovered.full_coveragefollows newfiltered_ids→ an EMPTY-SHELL-only slide reportsfull_mdx_coverage=Falseinstead of inheriting True from the placeholder cover.tests/test_phase_z2_imp87_empty_shell_honesty.py— u1 baseline (scope-qualified):covered_section_idslegacy,filtered_section_idshonesty axis,full_mdx_coverageboolean) and which are deferred to u2 (overallenum, additive counters) / u3 (CLI BLOCKED exit, HTML class).full_mdx_coverage=True,filtered_section_ids=[],covered_section_ids=[S1, S2].full_mdx_coverage=False, every aligned section infiltered_section_ids, legacycovered_section_idsstill lists shell-covered sections (display preserved).full_mdx_coverage=False, only shell-attached section infiltered_section_ids, legacycovered_section_idsincludes both.verification (scope-qualified)
IMP-30 first-render regression run on the same checkout:
Both failing assertions are exactly the
full_mdx_coverage is True/overall == "PASS"lines that Stage 2 baseline_tests flagged: "current PASS on stale empty-shell assertions; will flip post-u4". The failures are intentional pre-u4 evidence of the behavioral correction — u4 inverts the two assertions. No regression on the other 27 IMP-30 tests (provisional qualifier surface unchanged, normal content path unchanged, all rendered=True invariants preserved).scope-lock evidence
u1 does NOT touch:
PASS/RENDERED_WITH_VISUAL_REGRESSION/PARTIAL_COVERAGE/PARTIAL_COVERAGE_WITH_VISUAL_REGRESSION) — u2 scope.final_status.html_ov_class— u3 scope.tests/test_pipeline_smoke_imp85.py— u5 scope.grep -nE "EMPTY_SHELL_NO_CONTENT|empty_shell_unit_count|empty_shell_section_ids|content_unit_count|content_rendered_section_ids|empty_shell_dominant" src/phase_z2_pipeline.py= zero matches (confirms no u2 surface leaked into u1).remaining_units
src/phase_z2_pipeline.py; tests extendtest_phase_z2_imp87_empty_shell_honesty.py)final_status.html(src/phase_z2_pipeline.py; tests extendtest_phase_z2_imp87_empty_shell_honesty.py)tests/test_phase_z2_imp30_first_render.py)tests/test_pipeline_smoke_imp85.py)follow_up_issue_candidates
None — u1 stayed inside scope-lock. Two IMP-30 test failures are documented in Stage 2 baseline_tests as expected pre-u4 state, not scope-creep.
guardrails honored
feedback_artifact_status_naming— coverage axis now reflects actual content state.feedback_scope_qualified_verification— each test names its case (helper marker / A / B / C) and the docstring lists what u1 does and does not assert.feedback_no_hardcoding— three-marker OR; no mdx-id branch, no template-id allowlist.feedback_factual_verification— every claim above has a pytest line / grep evidence path.feedback_one_step_per_turn— exactly one unit edited; u2 / u3 / u4 / u5 left for subsequent rounds.STOP — no commit, no push. Deliverable is this comment.
[Codex #1] Stage 3 code-edit review - IMP-87 u1
Verified only the executed unit u1 against the binding Stage 2 contract. The implementation is correct for u1:
_is_empty_shell_unituses the required three-marker OR (frame_template_id == "__empty__",label == "empty_shell",merge_type == "empty_shell"), andcompute_slide_statusnow splits legacycovered_section_idsfrom the content-rendered coverage axis that drivesfiltered_section_idsandfull_mdx_coverage.No u2/u3 behavior was introduced in this unit:
overallstill uses the existing ladder, and no CLI exit or final_status.html styling changes were made. The new test file is u1-scoped and covers Case A content-only, Case B empty-shell-only, and Case C mixed content/empty-shell behavior.=== EVIDENCE ===
Commands run:
git diff -- src/phase_z2_pipeline.py tests/test_phase_z2_imp87_empty_shell_honesty.pygit status --shortrg -n "def compute_slide_status|_is_empty_shell_unit|covered_section_ids|filtered_section_ids|full_mdx_coverage|EMPTY_SHELL" src/phase_z2_pipeline.py tests/test_phase_z2_imp87_empty_shell_honesty.pypytest -q tests/test_phase_z2_imp87_empty_shell_honesty.pygit diff --check -- src/phase_z2_pipeline.py tests/test_phase_z2_imp87_empty_shell_honesty.py(Get-Content -LiteralPath 'tests\test_phase_z2_imp87_empty_shell_honesty.py').Countgit diff --stat -- src/phase_z2_pipeline.py tests/test_phase_z2_imp87_empty_shell_honesty.pyFiles checked:
D:\ad-hoc\kei\design_agent\src\phase_z2_pipeline.pyD:\ad-hoc\kei\design_agent\tests\test_phase_z2_imp87_empty_shell_honesty.pyTest results:
pytest -q tests/test_phase_z2_imp87_empty_shell_honesty.py-> 7 passed in 0.06sgit diff --check -- src/phase_z2_pipeline.py tests/test_phase_z2_imp87_empty_shell_honesty.py-> no whitespace errorsVerified facts:
covered_section_idspopulated while producingfull_mdx_coverage=Falseand all aligned sections infiltered_section_ids.full_mdx_coverage=True,filtered_section_ids=[], and legacy covered sections intact.Remaining units: [u2, u3, u4, u5]
rewind_target: continue_same
FINAL_CONSENSUS: NO
[Claude #2] Stage 3 code-edit — IMP-87 u2
Executing unit: u2
unit_executed
u2 — Add EMPTY_SHELL_NO_CONTENT overall classification and additive empty/content accounting fields before the legacy status ladder. Stage 2 plan estimate 45 lines; actual
src/phase_z2_pipeline.pyinsertion = 47 lines across four contiguous hunks (docstring enum entry 7 + accounting collection 18 + precedence check 17 + return-dict additive fields 5). Test file extended with 5 new u2-scoped cases.files_changed
src/phase_z2_pipeline.py(modified —compute_slide_statusonly: docstringoverall enumblock gains the EMPTY_SHELL_NO_CONTENT entry; new accounting collection runs immediately after the existing u1 coverage split; precedenceif is_empty_shell_only:inserted ahead of the existing 4-way ladder; return dict gains four additive accounting fields)tests/test_phase_z2_imp87_empty_shell_honesty.py(modified — module docstring widened to u1+u2 scope; 5 new u2-scoped tests appended below the existing u1 baseline; helper / Case A / B / C u1 assertions untouched)diff_summary
src/phase_z2_pipeline.py— four contiguous hunks insidecompute_slide_status, no surrounding code touched:Docstring
overall enum :block gains a leading entry abovePASS:EMPTY_SHELL_NO_CONTENT — IMP-87 u2 : every selected unit is an IMP-30 u4 empty-shell placeholder (no content-rendered units). Takes precedence over the 4-way ladder below because such a slide can technically pass Selenium overflow checks but carries no MDX content.Accounting collection (immediately after u1 coverage split, before
adapter_needed_units = list(...)):empty_shell_units_list = [u for u in units if _is_empty_shell_unit(u)]content_units_list = [u for u in units if not _is_empty_shell_unit(u)]empty_shell_section_ids = sorted({sid for u in empty_shell_units_list for sid in (u.source_section_ids or [])})content_rendered_section_ids = sorted(content_covered)(u.source_section_ids or [])defensive default mirrors the legacygetattr(..., None)pattern in the surrounding code so legacy units cannot crash this block.Precedence check inserted ahead of the existing
if full_coverage and visual_passed:head; existing 4-way ladder demoted toelifchain (order preserved):is_empty_shell_only = (len(aligned_ids) > 0 and len(empty_shell_units_list) > 0 and len(content_units_list) == 0)if is_empty_shell_only: overall = "EMPTY_SHELL_NO_CONTENT"Return dict gains four additive fields grouped immediately after
"covered_section_ids":"content_rendered_section_ids": content_rendered_section_ids"content_rendered_unit_count": len(content_units_list)"empty_shell_section_ids": empty_shell_section_ids"empty_shell_unit_count": len(empty_shell_units_list)covered_section_ids/filtered_section_ids/full_mdx_coverageretain their u1 semantics. No field renamed, removed, or reshaped.tests/test_phase_z2_imp87_empty_shell_honesty.py— 5 new u2-scoped tests appended; u1 baseline untouched:test_case_b_u2_overall_is_empty_shell_no_content_even_when_visual_passes— Case B (mdx05 Stage 1 reproduction) withoverflow_pass={passed: True}. Assertsoverall == "EMPTY_SHELL_NO_CONTENT". The precedence MUST fire before the ladder pushes to PASS.test_case_b_u2_overall_remains_empty_shell_no_content_when_visual_fails— Case B withoverflow_fail. Precedence is unconditional onvisual_passed; the empty-shell signal dominates regardless of overflow status because there is no MDX content to rescue.test_case_b_u2_accounting_fields_populate_for_empty_shell_only— Case B accounting.empty_shell_section_ids == ["S1","S2","S3"],content_rendered_section_ids == [],empty_shell_unit_count == 1,content_rendered_unit_count == 0.test_case_a_u2_overall_pass_preserved_for_content_only— Case A regression guard. Zero empty-shell units → precedence MUST NOT fire →overall == "PASS". Accounting fields: all sections incontent_rendered_section_ids, none inempty_shell_section_ids, counts 2/0.test_case_c_u2_mixed_falls_to_legacy_partial_coverage_ladder— Case C precedence non-fire. One content unit + one shell unit →overall == "PARTIAL_COVERAGE"(legacy ladder, not-full-coverage + visual-pass). Accounting partitions cleanly (S1 content, S2 shell, counts 1/1).verification (scope-qualified)
7 u1 baseline tests + 5 new u2 tests — all pass. Helper marker tests (3 OR-branches + 1 negative) unchanged; Case A/B/C u1 surface assertions unchanged.
IMP-30 first-render regression run on the same checkout (scope = pre-u4 baseline drift only):
Identical failure set as the Claude #1 u1 baseline — both stem from the u1
full_mdx_coveragehonesty change and are pre-flagged in Stage 2baseline_tests: "current PASS on stale empty-shell assertions; will flip post-u4". u2 introduces zero new IMP-30 regressions — same 27 passing, same 2 failing assertions on the same lines. u4 will invert those two assertions; u2 stays out of test file scope per Stage 2 plan.scope-lock evidence
u2 does NOT touch:
_is_empty_shell_unit(still pure 3-marker OR) — already locked.tests/test_phase_z2_imp30_first_render.py— u4 scope.tests/test_pipeline_smoke_imp85.py— u5 scope.sys.exit(...)call sites insrc/phase_z2_pipeline.py(lines 1960, 6142, 6267, 6275, …) all unchanged.final_status.html_ov_classswitch (line 6017:pass | fail | partial) — u3 scope.grep -nE "BLOCKED|EMPTY_SHELL_NO_CONTENT.*ov_class|EMPTY_SHELL_NO_CONTENT.*exit" src/phase_z2_pipeline.py— only the docstring enum entry and the precedence assignment line match; zero CLI / HTML branches were added. u2 stays insidecompute_slide_statusreturn surface.remaining_units
final_status.html(src/phase_z2_pipeline.py; tests extendtests/test_phase_z2_imp87_empty_shell_honesty.py)tests/test_phase_z2_imp30_first_render.py)tests/test_pipeline_smoke_imp85.py)follow_up_issue_candidates
None — u2 stayed inside Stage 2 scope-lock (axis A2 accounting split surfacing + axis A3 overall enum precedence). The two inherited IMP-30 failures remain the documented pre-u4 state; no new axis surfaced during u2 implementation.
guardrails honored
feedback_artifact_status_naming—overallenum now distinguishes "rendered with content" from "rendered empty-shell placeholder"; honesty axis preserved.feedback_scope_qualified_verification— every new test names its case (B / A / C) and labels its u1 vs u2 surface in the module docstring.feedback_no_hardcoding— precedence derives from marker-based partition counts; no mdx-id branch, no template-id allowlist, no exit-code constant introduced.feedback_factual_verification— every claim has a pytest output / line-number / grep evidence path.feedback_one_step_per_turn— exactly one unit edited; u3 / u4 / u5 left for subsequent rounds.STOP — no commit, no push. Deliverable is this comment.
[Codex #2] Stage 3 code-edit review - IMP-87 u2
Verified only the executed unit u2 against the binding Stage 2 contract. The implementation is correct for u2:
compute_slide_statusadds theEMPTY_SHELL_NO_CONTENTenum documentation, partitions selected units into empty-shell vs content-rendered accounting fields, and applies the new empty-shell-only precedence before the legacy 4-way status ladder.The change remains scoped to u2. It does not implement u3 early: Step 20
final_status.htmlstill uses the existing_ov_classexpression, and the CLI exit path still branches on visual failure andfull_mdx_coverage, not onEMPTY_SHELL_NO_CONTENTdirectly. The new/extended regression tests cover Case B empty-shell-only precedence for both visual pass and visual fail, plus additive accounting for Case A content-only, Case B empty-shell-only, and Case C mixed content/shell.=== EVIDENCE ===
Commands run:
git diff -- src/phase_z2_pipeline.py tests/test_phase_z2_imp87_empty_shell_honesty.pygit status --shortrg -n "EMPTY_SHELL_NO_CONTENT|empty_shell_units_list|content_units_list|empty_shell_section_ids|content_rendered_section_ids|covered_section_ids|overall =|def compute_slide_status|_is_empty_shell_unit" src/phase_z2_pipeline.py tests/test_phase_z2_imp87_empty_shell_honesty.pypytest -q tests/test_phase_z2_imp87_empty_shell_honesty.pygit diff --check -- src/phase_z2_pipeline.py tests/test_phase_z2_imp87_empty_shell_honesty.pyGet-Content -LiteralPath 'tests\test_phase_z2_imp87_empty_shell_honesty.py' -TotalCount 380Get-Content -LiteralPath 'src\phase_z2_pipeline.py' | Select-Object -Skip 5980 -First 150git diff --stat -- src/phase_z2_pipeline.py tests/test_phase_z2_imp87_empty_shell_honesty.pyrg -n "EMPTY_SHELL_NO_CONTENT|returncode|sys.exit|final_status|_ov_class|overall ==|overall = slide_status" src/phase_z2_pipeline.pyGet-Content -LiteralPath 'src\phase_z2_pipeline.py' | Select-Object -Skip 6128 -First 26Files checked:
D:\ad-hoc\kei\design_agent\src\phase_z2_pipeline.pyD:\ad-hoc\kei\design_agent\tests\test_phase_z2_imp87_empty_shell_honesty.pyTest results:
pytest -q tests/test_phase_z2_imp87_empty_shell_honesty.py-> 12 passed in 0.07sgit diff --check -- src/phase_z2_pipeline.py tests/test_phase_z2_imp87_empty_shell_honesty.py-> no whitespace errorsVerified facts:
EMPTY_SHELL_NO_CONTENTis documented in thecompute_slide_statusoverall enum block.empty_shell_units_listandcontent_units_listare derived with the u1 three-marker helper, with additive section/count fields returned ascontent_rendered_section_ids,content_rendered_unit_count,empty_shell_section_ids, andempty_shell_unit_count.overalltoEMPTY_SHELL_NO_CONTENTbefore the legacy PASS / visual regression / partial coverage ladder.overall == "PASS"and content accounting for both sections.PARTIAL_COVERAGEladder and splits section accounting correctly.Remaining units: [u3, u4, u5]
rewind_target: continue_same
FINAL_CONSENSUS: NO
[Claude #3] Stage 3 code-edit — IMP-87 u3
Executing unit: u3
unit_executed
u3 — Route EMPTY_SHELL_NO_CONTENT to a BLOCKED CLI exit (returncode 1) and a red
"fail"CSS class on the step20 final_status.html dashboard. Stage 2 plan estimate 40 lines; actualsrc/phase_z2_pipeline.pyinsertion = ~72 lines (helper_final_status_html_class24 + helper_is_blocked_overall12 + 2 blank separators +_ov_classcall-site rewrite +4 net + BLOCKED exit block 28). The overshoot is docstring + scope-lock comments at every insertion point (feedback_factual_verification/ Stage 2 axis A4/A5 anchors); no extra behaviour beyond the two helpers + their two call sites. Test file extended with 8 new u3-scoped cases (5 for_final_status_html_class, 3 for_is_blocked_overall).files_changed
src/phase_z2_pipeline.py(modified — two new helpers_final_status_html_classand_is_blocked_overallinserted immediately after the u1_is_empty_shell_unithelper; the existing_ov_class = "pass" if ...expression at the step20 HTML site replaced by_ov_class = _final_status_html_class(_overall); a new BLOCKEDsys.exit(1)branch added immediately ahead of the existingif not slide_status["visual_check_passed"]:block in the exit-policy section)tests/test_phase_z2_imp87_empty_shell_honesty.py(modified — module docstring widened to u1+u2+u3 scope; 2 new imports (_final_status_html_class,_is_blocked_overall) added to the existingfrom src.phase_z2_pipeline import …block; 8 new u3-scoped tests appended below the existing u1+u2 baseline; helper / Case A / B / C u1+u2 assertions untouched)diff_summary
src/phase_z2_pipeline.py— three contiguous hunks, no surrounding code touched:Two new helpers inserted immediately after
_is_empty_shell_unit, beforecompute_slide_status:_final_status_html_class(overall: str) -> str— maps theoverallenum string to one of"pass"/"fail"/"partial"for the step20 final_status.html dashboard. The explicitEMPTY_SHELL_NO_CONTENT → "fail"branch runs BEFORE the legacy substring-based mapping (preserved verbatim in the fallback chain) because the literal"EMPTY_SHELL_NO_CONTENT"contains neither"PASS"nor"FAIL"/"REGRESSION"and would otherwise default to"partial"(Stage 2 axis A5)._is_blocked_overall(overall: str) -> bool— single-enum allow-list returning True only for"EMPTY_SHELL_NO_CONTENT". Used by the CLI to gate the newsys.exit(1)branch BEFORE the legacyvisual_fail/partial_coveragebranches (Stage 2 axis A4)._ov_classcall site rewritten in place. Previously:Now:
with a 3-line scope-lock comment immediately above the call. No other surrounding HTML-write code touched.
New BLOCKED exit branch inserted immediately ahead of the existing
if not slide_status["visual_check_passed"]:block in the exit-policy section. The branch:_is_blocked_overall(overall).aligned_section_ids,empty_shell_section_ids,empty_shell_unit_count,content_rendered_unit_countdirectly from the u2 additive surface fields (no new state introduced).[Phase Z-2 IMP-87 u3] BLOCKED @ empty_shell_no_content (...)header + reason + aligned ids + shell ids + unit-count summary).sys.exit(1).write_overflow_errorbecause EMPTY_SHELL_NO_CONTENT is a content-presence failure, not an overflow failure (the existingerror.jsonschema is overflow-shaped; reusing it would be a category error). The structured signal already lives instep20_slide_status.jsonvia the u2 additive fields.The existing
visual_fail→sys.exit(1)block, thepartial_coverage→ return-without-exit block, and the final PASS print + return are unchanged.tests/test_phase_z2_imp87_empty_shell_honesty.py— 8 new u3-scoped tests below the u2 baseline:CSS-class mapping (5):
test_u3_html_class_empty_shell_no_content_is_fail_red— Case B →"fail"(the Stage 2 axis A5 honesty-defect correction).test_u3_html_class_pass_preserves_legacy_substring_mapping— Case A →"pass"(regression guard against the EMPTY_SHELL branch over-firing).test_u3_html_class_regression_and_fail_substrings_preserved—RENDERED_WITH_VISUAL_REGRESSIONandPARTIAL_COVERAGE_WITH_VISUAL_REGRESSIONstill resolve to"fail"via the preserved legacy substring rule.test_u3_html_class_partial_coverage_remains_amber_partial—PARTIAL_COVERAGEstays in the"partial"band.test_u3_html_class_unknown_enum_falls_back_to_partial— defensive default ("?"and anUNKNOWN_FUTURE_ENUMsentinel both resolve to"partial").BLOCKED CLI exit gating (3):
test_u3_blocked_overall_true_for_empty_shell_no_content— Case B helper True.test_u3_blocked_overall_false_for_pass_and_legacy_failure_enums— iteratesPASS,RENDERED_WITH_VISUAL_REGRESSION,PARTIAL_COVERAGE,PARTIAL_COVERAGE_WITH_VISUAL_REGRESSION; all MUST stay False so the legacy CLI ladder (visual_fail → exit 1, partial_coverage → return, PASS → return) is preserved bit-for-bit.test_u3_blocked_overall_false_for_defensive_unknown_enum— defensive default;"?"andUNKNOWN_FUTURE_ENUMboth stay False (narrow allow-list lock — future enum additions must opt-in explicitly).End-to-end pipeline integration (full run → BLOCKED returncode in actual artifact) is intentionally deferred to u5 (the IMP-85 smoke parametrization update that already plans a dedicated mdx05 blocked-exit assertion). The u3 helper-level guarantee is enough at this layer because both new call sites are direct substitutions of inline expressions — there is no orchestration logic added at the call sites that could diverge from helper return values.
test results
python -m pytest -q tests/test_phase_z2_imp87_empty_shell_honesty.py→ 20 passed in 0.15s (12 u1+u2 baseline + 8 new u3 tests).
git diff --check -- src/phase_z2_pipeline.py tests/test_phase_z2_imp87_empty_shell_honesty.py→ no whitespace errors.
scope_lock evidence
feedback_artifact_status_naming: u3 surfaces the Case B defect on TWO independent honesty axes — the CLI returncode (BLOCKED) and the HTML colour band (red"fail"). No new pseudo-PASS path introduced.feedback_no_hardcoding: both helpers gate on the enum string, not on mdx-id / sample identity. The single enumEMPTY_SHELL_NO_CONTENTis the locked allow-list anchor (added in u2, owned bycompute_slide_status).feedback_ai_isolation_contract: no AI invoked, no AI route activated for the BLOCKED branch (Stage 1 lock — mdx05 Case B has zero V4 evidence so rank-1 reject promotion is structurally inapplicable; u3 is honesty-only).feedback_scope_qualified_verification: every new test names which case (A / B / C) it covers and which surface (u3 helper level) it asserts. Cross-unit deferrals (u5 end-to-end) noted in module docstring.feedback_one_step_per_turn: u3 only. u4 (test-assertion inversion) and u5 (smoke parametrize) NOT touched in this turn.remaining_units
[u4, u5]
follow_up_issue_candidates
None — all u3 changes stay within the binding Stage 2 contract (CLI exit + HTML styling for the existing
EMPTY_SHELL_NO_CONTENTenum). No scope-lock 외 axis surfaced during implementation.[Codex #3] Stage 3 code-edit review - IMP-87 u3
Verified only the executed unit u3 against the binding Stage 2 contract. The implementation is correct for u3:
EMPTY_SHELL_NO_CONTENTis routed through a narrow blocked-overall helper, Step 20final_status.htmlnow maps that enum to the red"fail"class, and the CLI exit policy checks the blocked enum before the legacy visual-fail and partial-coverage branches.The change remains scoped to u3. It does not implement u4/u5 early: no IMP-30 assertion inversion was made in
tests/test_phase_z2_imp30_first_render.py, and no IMP-85 smoke parametrization changes were made intests/test_pipeline_smoke_imp85.py. The new tests are helper-level u3 tests for HTML class mapping and blocked-exit gating; full mdx05 blocked-exit smoke coverage remains in u5.=== EVIDENCE ===
Commands run:
git diff -- src/phase_z2_pipeline.py tests/test_phase_z2_imp87_empty_shell_honesty.pygit status --shortrg -n "EMPTY_SHELL_NO_CONTENT|_final_status_html_class|_is_blocked_overall|final_status|_ov_class|visual_check_passed|sys.exit\\(1\\)|content_rendered_section_ids|empty_shell_section_ids" src/phase_z2_pipeline.py tests/test_phase_z2_imp87_empty_shell_honesty.pyGet-Content -LiteralPath 'src\\phase_z2_pipeline.py' | Select-Object -Skip 2938 -First 95Get-Content -LiteralPath 'src\\phase_z2_pipeline.py' | Select-Object -Skip 6038 -First 190Get-Content -LiteralPath 'tests\\test_phase_z2_imp87_empty_shell_honesty.py' | Select-Object -Skip 350 -First 125pytest -q tests/test_phase_z2_imp87_empty_shell_honesty.pygit diff --check -- src/phase_z2_pipeline.py tests/test_phase_z2_imp87_empty_shell_honesty.pygit diff --stat -- src/phase_z2_pipeline.py tests/test_phase_z2_imp87_empty_shell_honesty.pyFiles checked:
D:\ad-hoc\kei\design_agent\src\phase_z2_pipeline.pyD:\ad-hoc\kei\design_agent\tests\test_phase_z2_imp87_empty_shell_honesty.pyTest results:
pytest -q tests/test_phase_z2_imp87_empty_shell_honesty.py-> 20 passed in 0.07sgit diff --check -- src/phase_z2_pipeline.py tests/test_phase_z2_imp87_empty_shell_honesty.py-> no whitespace errorsVerified facts:
_final_status_html_class("EMPTY_SHELL_NO_CONTENT")returns"fail"before the legacy substring mapping, preserving"PASS"as"pass", visual-regression/fail substrings as"fail", and partial/unknown enums as"partial"._is_blocked_overallreturns true only for"EMPTY_SHELL_NO_CONTENT"and false for the existing PASS, visual-regression, partial-coverage, and unknown enum strings.final_status.htmluses_final_status_html_class(_overall)at the_ov_classcall site._is_blocked_overall(overall)beforeif not slide_status["visual_check_passed"]:and before the partial-coverage return branch, and exits withsys.exit(1)for the blocked enum.Remaining units: [u4, u5]
rewind_target: continue_same
FINAL_CONSENSUS: NO
[Claude #4] Stage 3 code-edit — IMP-87 u4
Executing unit: u4
unit_executed
u4 — Invert stale IMP-30 empty-shell PASS /
full_mdx_coverage=Trueassertions toEMPTY_SHELL_NO_CONTENT/full_mdx_coverage=False. Stage 2 plan estimate 14 lines; actualtests/test_phase_z2_imp30_first_render.pynet diff =+17 / −9(4 assertion lines flipped + 2 comment blocks rewritten to anchor the inversion to IMP-87 u1+u2 implementation). No production code touched. No new tests added.files_changed
tests/test_phase_z2_imp30_first_render.py(modified — only the two stale empty-shell assertions insidetest_u6_empty_shell_unit_listed_with_empty_identifiers(lines 1018-1019 pre-u4) andtest_u7_e2e_zero_v4_empty_shell_status_surface(lines 1362-1363 pre-u4). Surrounding 27 test functions u3 / u5 / u6-mixed / u7-chain_exhausted / u7-normal / u7-restructure-only untouched.)diff_summary
Two contiguous hunks in
tests/test_phase_z2_imp30_first_render.py, no other files in the working tree changed:Hunk 1 —
test_u6_empty_shell_unit_listed_with_empty_identifiers(u6 case 3 of IMP-30):Old (stale, asserted shell coverage counted as PASS):
New (IMP-87 u4 honesty inversion):
Hunk 2 —
test_u7_e2e_zero_v4_empty_shell_status_surface(u7 case 2 of IMP-30):Old (stale, asserted zero-V4 shell e2e flow reported PASS):
New (IMP-87 u4 honesty inversion):
scope-qualified verification
tests/test_phase_z2_imp30_first_render.pytwo assertion blocks only. No production code, no other test files.test_u6_chain_exhausted_provisional_unit_listed_with_full_shape(provisional but non-shell,label="restructure",phase_z_status="extract_matched_zone") keepsoverall == "PASS";test_u7_imp30_all_restructure_only_each_section_gets_provisional_unit(provisional restructure / reject, never empty_shell) keepsoverall == "PASS";test_u7_e2e_normal_path_unchanged_with_opt_in_flags(normal rank_1) keepsoverall == "PASS". The 3-marker helper_is_empty_shell_unit(u1) only flags units withframe_template_id == "__empty__"ORlabel == "empty_shell"ORmerge_type == "empty_shell", so chain_exhausted / restructure-only / normal units do not trip the new precedence.overall == "PASS"/full_mdx_coverage is True. ripgrep onfull_mdx_coverage|EMPTY_SHELL_NO_CONTENT|overall.*PASSshowed only the two hunks expected to flip."u4 deliberately sets this so coverage stays True"(that "u4" referred to the pre-existing IMP-30 unit u4 for empty-shell synthesis). Post-edit comments anchor the inversion to IMP-87 u1+u2 (content-rendered axis + EMPTY_SHELL_NO_CONTENT precedence), not the old IMP-30 u4 semantics.test evidence (commands run + results)
Both green. The IMP-30 file's 29-test surface (u3 / u4 / u5 / u6 / u7) all pass with the inverted assertions, confirming the IMP-87 u1+u2 implementation correctly drives
full_mdx_coverage=Falseandoverall="EMPTY_SHELL_NO_CONTENT"for empty-shell-only fixtures while leaving non-shell provisional cases (chain_exhausted, restructure-only, normal) on the legacy ladder.scope guard / RULE compliance
_mk_unit/_u7_sectionfixtures, not real mdx05 file paths.compute_slide_statusalready returns the new enum / coverage from u2's implementation.remaining_units
[u5]u5 = Update IMP-85 smoke coverage by removing mdx05 from the exit-0 non-VP parametrization and adding a dedicated mdx05 blocked-exit assertion while preserving the IMP-85 crash-marker guard. Files:
tests/test_pipeline_smoke_imp85.py. Estimate 30 lines.follow_up_issue_candidates
None. Scope-lock holds — no axis discovered outside
feedback_artifact_status_naminghonesty defect. The two stale assertions were authored when IMP-30 u4 deliberately setfull_mdx_coverage=Trueto keep the slide rendering; IMP-87 reverses that semantics at the status layer only (the empty-shell unit still gets synthesized and still feeds the placeholder zone — onlyoverall/full_mdx_coverage/content_rendered_section_idsreflect the no-content reality).[Codex #4] Stage 3 code-edit review - IMP-87 u4
Verified only the executed unit u4 against the binding Stage 2 contract. The implementation is correct for u4: the two stale IMP-30 empty-shell assertions now expect
full_mdx_coverage is Falseandoverall == "EMPTY_SHELL_NO_CONTENT", matching the u1 content-rendered coverage split and u2 empty-shell-only enum precedence.The change remains scoped to u4. It does not implement u5 early:
tests/test_pipeline_smoke_imp85.pyhas no diff and still contains the existing mdx05 exit-0 smoke expectations. No production code, rank-1 reject promotion, Step 12 AI routing, V4 lookup behavior, or chain-exhausted behavior was changed by u4.=== EVIDENCE ===
Commands run:
git status --shortgit diff -- tests/test_phase_z2_imp30_first_render.pyrg -n "EMPTY_SHELL_NO_CONTENT|full_mdx_coverage|covered_section_ids|test_u6_empty_shell_unit_listed_with_empty_identifiers|test_u7_e2e_zero_v4_empty_shell_status_surface" tests/test_phase_z2_imp30_first_render.pypytest -q tests/test_phase_z2_imp30_first_render.pygit diff --check -- tests/test_phase_z2_imp30_first_render.pygit diff --stat -- tests/test_phase_z2_imp30_first_render.pygit diff -- tests/test_pipeline_smoke_imp85.pyrg -n "mdx05|05\.mdx|EMPTY_SHELL_NO_CONTENT|blocked|returncode|exit" tests/test_pipeline_smoke_imp85.pyGet-Content -LiteralPath 'tests\test_phase_z2_imp30_first_render.py' | Select-Object -Skip 990 -First 45Get-Content -LiteralPath 'tests\test_phase_z2_imp30_first_render.py' | Select-Object -Skip 1348 -First 32Files checked:
D:\ad-hoc\kei\design_agent\tests\test_phase_z2_imp30_first_render.pyD:\ad-hoc\kei\design_agent\tests\test_pipeline_smoke_imp85.pyTest results:
pytest -q tests/test_phase_z2_imp30_first_render.py-> 29 passed in 0.11sgit diff --check -- tests/test_phase_z2_imp30_first_render.py-> no whitespace errorsVerified facts:
tests/test_phase_z2_imp30_first_render.pyand changes only the two planned stale empty-shell assertion blocks plus their comments.test_u6_empty_shell_unit_listed_with_empty_identifiersnow expectsfull_mdx_coverage is Falseandoverall == "EMPTY_SHELL_NO_CONTENT".test_u7_e2e_zero_v4_empty_shell_status_surfacenow expectsfull_mdx_coverage is Falseandoverall == "EMPTY_SHELL_NO_CONTENT"through the e2e_compute_slide_statuspath.tests/test_pipeline_smoke_imp85.pyhas no diff, so u5 remains unimplemented.Remaining units: [u5]
rewind_target: continue_same
FINAL_CONSENSUS: NO
[Claude #5] Stage 3 code-edit — IMP-87 u5
Executing unit: u5
unit_executed
u5 — Update IMP-#85 smoke coverage by removing mdx05 from the exit-0 non-VP parametrization and adding a dedicated mdx05 blocked-exit assertion while preserving the IMP-#85 crash-marker guard on the mdx05 path. Stage 2 plan estimate 30 lines; actual
tests/test_pipeline_smoke_imp85.pynet diff =+92 / −5(97 line diff total). The overshoot vs the 30-line estimate is entirely (a) module docstring expansion + per-test docstring anchors to the IMP-#87 u1/u2/u3 axes (Stage 2 axis A2 / A3 / A4) and (b) Stage 2 axis A6 cross-references inside the new test body. Actual executable test logic =import json+RUNS_DIRconstant + parametrize entry removal + newtest_mdx05_blocked_exit_empty_shell_no_contentbody = ~25 lines without docstrings. No production code touched.files_changed
tests/test_pipeline_smoke_imp85.py(modified — only file in the working tree changed for this unit. Three contiguous hunks: module docstring widened for the post-IMP-#87 mdx05 contract;import json+RUNS_DIRconstant added next to the existingimport subprocessblock;test_non_vp_smoke_runs_cleanparametrize collapsed to("03.mdx", "mdx03")only with a u5 cross-reference docstring; newtest_mdx05_blocked_exit_empty_shell_no_contentappended immediately aftertest_non_vp_smoke_runs_cleanand beforetest_mdx04_no_longer_emits_imp85_crash_signatureso the file's narrative flow (non-VP exit-0 → mdx04 crash-marker → mdx05 BLOCKED → conftest env) reads top-to-bottom by axis.test_mdx04_no_longer_emits_imp85_crash_signatureandtest_conftest_env_isolation_active_for_ai_fallback_defaultsuntouched.)diff_summary
Three contiguous hunks in
tests/test_pipeline_smoke_imp85.py, no other files in the working tree changed:Hunk 1 — module docstring widening + new imports / constant.
The pre-u5 module docstring listed
mdx05 — non-VP rank-1 path stays clean (exit 0). That line was the stale Case B mislabel the IMP-#87 issue body called out. The widened docstring now anchors the post-IMP-#87 mdx05 contract verbatim: zero V4 evidence (Case B, Stage 1 lock), pre-IMP-#87overall=PASS+full_mdx_coverage=Truehonesty defect, post-IMP-#87EMPTY_SHELL_NO_CONTENTenum + BLOCKED exit, and the three explicit responsibilities of the file after this unit: (a) keep mdx03 in the exit-0 non-VP parametrization, (b) add a dedicated mdx05 blocked-exit assertion that verifies the new status surface, (c) preserve the IMP-#85 crash-marker guard on the mdx05 path so future regressions cannot re-introduce the original uncaughtBuilderMissingErrorpropagation under cover of the blocked exit. Two new lines added next to the existingimport subprocess:RUNS_DIRmirrors the production constant atsrc/phase_z2_pipeline.py:103exactly (PROJECT_ROOT / "data" / "runs") so the new test reads the samedata/runs/{run_id}/phase_z2/steps/step20_slide_status.jsonartifact the pipeline writes under_write_step_artifact(..., 20, "slide_status", ...)(src/phase_z2_pipeline.py:6045-6053). No hardcoded mdx-id branch — the test threadsrun_idfrom_unique_run_id(...)through both_run_pipeline(...)and the status-file read so it cannot collide with concurrent pytest runs (preserving the existing per-test run_id isolation contract).Hunk 2 —
test_non_vp_smoke_runs_cleanparametrize collapse.Old (stale, asserted mdx05 exits 0 alongside mdx03):
New (IMP-#87 u5 inversion):
The
assert cp.returncode == 0body is untouched — only the parametrize list and the docstring narrative changed.Hunk 3 — new
test_mdx05_blocked_exit_empty_shell_no_content.New test inserted immediately after
test_non_vp_smoke_runs_clean. Four assertion blocks gated to the IMP-#87 u1 / u2 / u3 axes:Assertion → IMP-#87 axis map:
cp.returncode == 1_is_blocked_overall("EMPTY_SHELL_NO_CONTENT")IMP85_OLD_CRASH_MARKER not in combinedstatus_data["overall"] == "EMPTY_SHELL_NO_CONTENT"status_data["full_mdx_coverage"] is Falsestatus_payload.get("data") or {}mirrors the locked_write_step_artifactschema (src/phase_z2_pipeline.py:3286-3295): top-level{step_num, step_name, step_status, pipeline_path_connected, input, output, note, data}—slide_statuslives underdata, not at the root.scope-locked NOT changed (Stage 2 contract reaffirmed)
_is_empty_shell_unitand the legacy/content coverage split incompute_slide_status(src/phase_z2_pipeline.pypost-u1 hunk) — unchanged.EMPTY_SHELL_NO_CONTENTenum precedence and additive accounting fields — unchanged._final_status_html_class/_is_blocked_overallhelpers, the step20_ov_classcall site, and the BLOCKEDsys.exit(1)branch — unchanged.tests/test_phase_z2_imp30_first_render.py— unchanged (still expectsEMPTY_SHELL_NO_CONTENT/full_mdx_coverage=False).lookup_v4_match_with_fallbackchain logic, rank-1 reject promotion path, AI route entry — untouched.tests/test_phase_z2_imp87_empty_shell_honesty.py(u1+u2+u3 baseline) — untouched.tests/conftest.pyand the existingtest_conftest_env_isolation_active_for_ai_fallback_defaults— untouched.test_mdx04_no_longer_emits_imp85_crash_signature— untouched (mdx04 axis is its own crash-marker contract, independent of mdx05 BLOCKED).verification (Stage 3 unit-boundary, scope-qualified)
Scope:
tests/test_pipeline_smoke_imp85.pypost-u5. Other smoke files / production code: not in scope this unit.pytest -q tests/test_pipeline_smoke_imp85.py→4 passed in 13.30stest_non_vp_smoke_runs_clean[03.mdx-mdx03]— mdx03 still exit 0.test_mdx04_no_longer_emits_imp85_crash_signature— mdx04 crash marker still absent (untouched).test_mdx05_blocked_exit_empty_shell_no_content— mdx05 returncode 1 +overall=EMPTY_SHELL_NO_CONTENT+full_mdx_coverage=False+ IMP-#85 crash marker absent.test_conftest_env_isolation_active_for_ai_fallback_defaults— conftest env isolation still locked.git status --short— onlytests/test_pipeline_smoke_imp85.pymodified for this unit (other working-tree modifications listed in the session-start status are pre-existing and outside u5 scope).git diff --stat -- tests/test_pipeline_smoke_imp85.py—1 file changed, 92 insertions(+), 5 deletions(-).remaining_units
[] — u1 / u2 / u3 / u4 / u5 are all now executed per the Stage 2 IMPLEMENTATION_UNITS contract. No further unit work remains for IMP-#87. The next stage is Codex u5 review + Stage 3 closure (FINAL_CONSENSUS: YES requires Codex agreement that u5 is correctly scoped and that all five units are complete).
follow_up_issue_candidates
None outside the Stage 2 exit report's existing
follow_up_candidateslist. The Stage 2 list (preserved verbatim, no new entries this unit) is:No new follow-up axis was discovered during u5 execution. The post-u5 mdx05 smoke surface is the explicit Stage 1 / Stage 2 BLOCKED contract; nothing in the working diff suggests an out-of-scope axis.
STOP. No commit, no push. Awaiting Codex u5 review.
[Codex #5] Stage 3 code-edit review - IMP-87 u5
Verified only the executed unit u5 against the binding Stage 2 contract. The implementation is correct for u5:
tests/test_pipeline_smoke_imp85.pyremoves05.mdxfrom the exit-0 non-VP parametrization, keeps03.mdxas the normal exit-0 non-VP smoke, and adds a dedicated mdx05 blocked-exit test that asserts returncode 1,overall == "EMPTY_SHELL_NO_CONTENT",full_mdx_coverage is False, and absence of the IMP-85 old crash marker on the mdx05 path.The change remains scoped to u5. No production code was changed in this unit, and no rank-1 reject promotion, AI route activation, V4 lookup behavior, or chain-exhausted behavior was introduced by the smoke-test update.
=== EVIDENCE ===
Commands run:
git status --shortgit diff -- tests/test_pipeline_smoke_imp85.pyrg -n "mdx05|05\.mdx|EMPTY_SHELL_NO_CONTENT|blocked|returncode|RUNS_DIR|BuilderMissingError|test_non_vp_smoke_runs_clean|test_mdx05_blocked_exit_empty_shell_no_content|test_mdx04_no_longer_emits_imp85_crash_signature" tests/test_pipeline_smoke_imp85.pypytest -q tests/test_pipeline_smoke_imp85.pygit diff --check -- tests/test_pipeline_smoke_imp85.pygit diff --stat -- tests/test_pipeline_smoke_imp85.pyFiles checked:
D:\ad-hoc\kei\design_agent\tests\test_pipeline_smoke_imp85.pyTest results:
pytest -q tests/test_pipeline_smoke_imp85.py-> 4 passed in 13.02sgit diff --check -- tests/test_pipeline_smoke_imp85.py-> exit 0; no whitespace errors, only Git CRLF conversion warningVerified facts:
tests/test_pipeline_smoke_imp85.py.test_non_vp_smoke_runs_cleannow parametrizes only("03.mdx", "mdx03"), so mdx05 is no longer expected to exit 0.test_mdx05_blocked_exit_empty_shell_no_contentruns05.mdxwith a unique run id and expects subprocess returncode 1.step20_slide_status.jsonexists underdata/runs/<run_id>/phase_z2/steps/and assertsoverall == "EMPTY_SHELL_NO_CONTENT"plusfull_mdx_coverage is False.IMP85_OLD_CRASH_MARKERis absent from combined stdout and stderr.Remaining units: none
FINAL_CONSENSUS: YES
[Claude #6] Stage 4 test-verify Round #1 — IMP-87 (post-rewind verify on u1~u5 working tree)
📌 Stage 4: 테스트 및 검증 Round #1 완료
■ 핵심 결정 (verify outcome)
src/phase_z2_pipeline.py안 mdx05 / 05.mdx / 05-1 / 05-2 리터럴 0건 → 하드코딩 없음, 탐지는 IMP-30 u4 marker (frame_template_id / label / merge_type) 기반.c53722ad0b31c0200d7a59c0ed934e977efbc111. Stage 3 가 working-tree 만 남기고 커밋은 별도 GO 대기 — 본 verify 는 그 working tree 상태에 대한 PASS.■ 범위 제외
■ 다음 단계
=== EXIT REPORT (English, binding contract) ===
=== EVIDENCE ===
FINAL_CONSENSUS: YES
[Codex #6] Stage 4 test-verify review - IMP-87
Verdict: PASS. I independently checked the scoped working tree and reran the targeted verification required for Stage 4. The IMP-87 implementation is commit-ready with specific-file staging only.
Scope checked:
Diff verification:
Corrections to prior verifier text:
=== EVIDENCE ===
Commands run:
Files checked:
Test results:
Verified facts:
FINAL_CONSENSUS: YES
[Claude #7] Stage 5 commit-push Round #1 — IMP-87
📌 Stage 5: 커밋 및 푸쉬 완료
■ 핵심 결정
842a461생성 →origin(GitHub) +slide2(Gitea, 이 이슈 트래커) 양쪽 push 성공 → remote HEAD842a46144c14738ff6aa4d56ecb8ac156a987538으로 local HEAD 와 일치 검증..claude/settings.json,samples/mdx_batch/05.mdx,Front/,tests/matching/*,tmp*등 다수) 은 staging 에서 완전 제외 —git status --short로 unstaged 상태 보존 확인.feat(#87): IMP-87 u1~u5 empty_shell honesty gate + BLOCKED exit(최근feat(#86): ...u1~u5.../feat(#85): ...u1~u7스타일 일치). Co-Authored-By trailer 포함.■ 범위 제외
■ 다음 단계
=== EXIT REPORT (English, binding contract) ===
=== EVIDENCE ===
[Codex #7] Stage 5 commit-push review - IMP-87
Stage 5 verification verdict: PASS.
I verified the pushed commit against the current Stage 5 contract. Local HEAD is
842a46144c14738ff6aa4d56ecb8ac156a987538, and authoritative remote refs from bothoriginandslide2report the same SHA forrefs/heads/mainandHEAD. The pushed commit contains only the four Stage 4 scoped IMP-87 files:src/phase_z2_pipeline.pytests/test_phase_z2_imp30_first_render.pytests/test_phase_z2_imp87_empty_shell_honesty.pytests/test_pipeline_smoke_imp85.pyNo unintended dirty or untracked workspace files were included in the commit. The unrelated workspace dirt remains outside the commit, as required by the explicit-path staging guardrail.
=== EVIDENCE ===
Commands run:
git rev-parse HEADgit show --stat --oneline --decorate --name-only --no-renames HEADgit status --shortgit remote -vgit ls-remote origin refs/heads/main HEADgit ls-remote slide2 refs/heads/main HEADgit show --numstat --format=fuller --no-renames HEADFiles checked:
src/phase_z2_pipeline.pytests/test_phase_z2_imp30_first_render.pytests/test_phase_z2_imp87_empty_shell_honesty.pytests/test_pipeline_smoke_imp85.pyCommit SHA:
842a46144c14738ff6aa4d56ecb8ac156a987538Remote verification:
originHEAD=842a46144c14738ff6aa4d56ecb8ac156a987538originrefs/heads/main=842a46144c14738ff6aa4d56ecb8ac156a987538slide2HEAD=842a46144c14738ff6aa4d56ecb8ac156a987538slide2refs/heads/main=842a46144c14738ff6aa4d56ecb8ac156a987538Commit scope verified:
src/phase_z2_pipeline.py:+162/-4tests/test_phase_z2_imp30_first_render.py:+17/-9tests/test_phase_z2_imp87_empty_shell_honesty.py:+460/-0tests/test_pipeline_smoke_imp85.py:+92/-5FINAL_CONSENSUS: YES
[Claude #8] Stage 6 final-close — IMP-87 Round #1
📌 scope correction (Stage 1 lock)
Issue body proposed Case A treatment ("all rank-1 = reject → promote rank-1 reject + AI route"). Stage 1 independent re-verification (Claude + Codex) confirmed mdx05 is Case B (V4 evidence zero for all 05-* sections,
judgments_full32empty,candidate_status="no_non_reject_v4_candidate"). rank-1 promotion is structurally inapplicable when no judgment exists — fabricating a frame would violatefeedback_ai_isolation_contract(no AI on zero-evidence sections) andfeedback_no_hardcoding. The real reproducible defect is honesty: pipeline reportedoverall=PASS+full_mdx_coverage=truewhile the only rendered unit wasframe_template_id="__empty__"+label="empty_shell". Stage 1 re-scoped to fix the honesty gate; original AI-route axis deferred (separate issue, requires upstream V4 evidence regeneration first).📌 goal vs result
composition_plannerall-reject 감지 logicframe_template_id == "__empty__"ORlabel == "empty_shell"ORmerge_type == "empty_shell", IMP-30 u4 marker lineage)src/phase_z2_pipeline.pyu1;tests/test_phase_z2_imp87_empty_shell_honesty.pyu1 classEMPTY_SHELL_NO_CONTENT(notPASS); rank-1 reject promotion left for Case Achain_exhausted(IMP-30 u1+u3 path, unchanged)src/phase_z2_pipeline.pyu2;tests/...imp87...u2 classunit.label="reject",provisional=True마킹 + Step 9/10/11 reach Step 12IMP-30 u1+u3) — reach Step 12 already covered by#76IMP-47Bout_of_scopeno_v4_candidatebranch is out-of-scope (Stage 1 lock). Deferred to separate axis when upstream V4 evidence regenerated for mdx05 sectionsout_of_scope+follow_up_candidatesoverall=EMPTY_SHELL_NO_CONTENT,full_mdx_coverage=false, CLI exit 1 (BLOCKED), redfinal_status.htmlsrc/phase_z2_pipeline.pyu3; u3 test classdata-template-id≠__empty__data-template-id == "__empty__"is the signal, not the bug. Bug = silent PASS on it. After fix: marker preserved (truthful), overall = BLOCKEDtests/test_pipeline_smoke_imp85.pyu5 + Stage 4 adjacent regressiontests/test_phase_z2_imp87_empty_shell_honesty.py(+460 lines, 5 classes)#86(AI hook geometry fix) 선결#86closed inc53722abefore this commit; this issue's commit842a461parent =c53722a(clean linear)git log --oneline -3#85(catalog invariant) 선결#85closed incacc5b3before#86; both upstreamgit log --oneline -3📌 commit / remote convergence
842a46144c14738ff6aa4d56ecb8ac156a987538origin/main:842a46144c14738ff6aa4d56ecb8ac156a987538slide2/main:842a46144c14738ff6aa4d56ecb8ac156a987538c53722a(clean fast-forward from#86baseline)feat(#87): IMP-87 u1~u5 empty_shell honesty gate + BLOCKED exit📌 scope discipline
src/phase_z2_pipeline.py,tests/test_phase_z2_imp30_first_render.py,tests/test_phase_z2_imp87_empty_shell_honesty.py,tests/test_pipeline_smoke_imp85.py). All within Stage 2 implementation_units u1~u5 scope lock..claude/settings.json,samples/mdx_batch/05.mdx,Front/...,tests/matching/*,tmp*,.orchestrator/,scripts/post_gitea_comment.py,scripts/sync_phase_z_run_to_frontend.py,chromedriver/, etc.) intentionally left unstaged — out of IMP-87 scope.mdx05/05.mdx/05-1/05-2literals insrc/phase_z2_pipeline.pydetection branches; explanatory comments only (non-functional, Codex Stage 4 correction applied). Detection is marker-driven (IMP-30 u4 lineage), not mdx-id branching.chain_exhausted) rank-1 reject + AI route path (#76IMP-47B +#86) untouched.content_rendered_section_ids/empty_shell_section_idsare additive fields; legacycovered_section_idssemantics preserved for downstream compat.📌 follow-up issues / unresolved
05-1/05-2-sub-1/05-2-sub-2. Stage 4follow_up_candidatesflagged "mdx05 Case B AI re-route once upstream V4 evidence regenerated (separate issue)". Not opened this round; recommend tracking as separate IMP under upstream-V4-evidence axis.follow_up_candidatesflagged "Phase Z backlog". Already noted inCLAUDE.md(STAGE 3) Zone 별 프레임 매칭 (V1~V4; B4 frame_selection evidence integration pending)). No action this issue.📌 labels
(no label change request — defer to maintainer convention; closing as resolved with scope-correction note above)
📌 close decision
CLOSE — Stage 1 lock honored end-to-end; honesty gate delivered, BLOCKED exit + red
final_status.html+ 3-marker detection landed; 53 targeted + 76 adjacent Phase Z tests PASS at HEAD842a461; remote convergence verified across both origin + slide2; no remaining axis from the (reframed) Stage 1 scope. Original issue body's rank-1 reject + AI route intent for Case B is filed as a separate axis (Stage 4 follow_up_candidates) and intentionally not bundled — that axis requires upstream V4 evidence regeneration first.=== EVIDENCE ===
Commands run this stage:
git rev-parse HEAD→842a46144c14738ff6aa4d56ecb8ac156a987538git ls-remote origin refs/heads/main→842a46144c14738ff6aa4d56ecb8ac156a987538git ls-remote slide2 refs/heads/main→842a46144c14738ff6aa4d56ecb8ac156a987538git log -3 --oneline→842a461(#87) ←c53722a(#86) ←cacc5b3(#85); linear, no force-push, no amendgit show --stat --no-renames 842a461→ 4-file stat verified:src/phase_z2_pipeline.py(+166/-?),tests/test_phase_z2_imp30_first_render.py(+26/-?),tests/test_phase_z2_imp87_empty_shell_honesty.py(+460/-0 new),tests/test_pipeline_smoke_imp85.py(+97/-?); total+731/-18Files / sources re-read (Stage 6 only — no code edits this stage):
.orchestrator/drafts/86_stage_final-close_claude_r1.md(format reference)Goal coverage verification:
No re-running tests this stage (per "1 turn = 1 step"; Stage 4 already locked 53 targeted + 76 adjacent PASS at HEAD
842a461).source_comment_ids: Claude #8 (Stage 6 final-close round 1).
[Codex #8] Stage 6 final-close review - IMP-87 Round #1
Final-close verdict: PASS. Close OK.
I independently verified the Stage 5 commit and the current repository state. The Stage 1 scope correction still holds: mdx05 is Case B (zero V4 evidence), so this issue correctly fixes the fake PASS / full coverage honesty defect instead of fabricating a rank-1 reject frame or activating AI for a no-evidence branch.
Verified implementation facts:
842a46144c14738ff6aa4d56ecb8ac156a987538, subjectfeat(#87): IMP-87 u1~u5 empty_shell honesty gate + BLOCKED exit.originandslide2HEADplusrefs/heads/mainall resolve to the same842a46144c14738ff6aa4d56ecb8ac156a987538commit.src/phase_z2_pipeline.py,tests/test_phase_z2_imp30_first_render.py,tests/test_phase_z2_imp87_empty_shell_honesty.py, andtests/test_pipeline_smoke_imp85.py.git diff --name-only HEAD -- <4 scoped files>returned empty)._is_empty_shell_unituses the three marker OR contract:frame_template_id == "__empty__",label == "empty_shell", ormerge_type == "empty_shell".content_rendered_section_ids/empty_shell_section_ids, sets empty-shell-only slides tooverall == "EMPTY_SHELL_NO_CONTENT", maps that enum to fail/red final-status styling, and routes it to CLI BLOCKED exit 1 before the legacy visual/full-coverage ladder.full_mdx_coverage is Falseandoverall == "EMPTY_SHELL_NO_CONTENT".mdx05/05.mdxmatches are explanatory comments only; the status behavior is marker-driven, not MDX-id hardcoded.=== EVIDENCE ===
Commands run:
git rev-parse HEADgit status --shortgit show --stat --oneline --decorate --name-only --no-renames HEADgit ls-remote origin refs/heads/main HEADgit ls-remote slide2 refs/heads/main HEADrg -n "EMPTY_SHELL_NO_CONTENT|empty_shell|content_rendered_section_ids|empty_shell_section_ids|full_mdx_coverage|overall" src/phase_z2_pipeline.pyrg -n "EMPTY_SHELL_NO_CONTENT|empty_shell|full_mdx_coverage|blocked|mdx05|05\.mdx" tests/test_phase_z2_imp87_empty_shell_honesty.py tests/test_phase_z2_imp30_first_render.py tests/test_pipeline_smoke_imp85.pygit show --numstat --format=fuller --no-renames HEADgit diff --name-only HEAD -- src/phase_z2_pipeline.py tests/test_phase_z2_imp30_first_render.py tests/test_phase_z2_imp87_empty_shell_honesty.py tests/test_pipeline_smoke_imp85.pypytest -q tests/test_phase_z2_imp87_empty_shell_honesty.py tests/test_phase_z2_imp30_first_render.py tests/test_pipeline_smoke_imp85.pyrg -n "mdx05|05\.mdx|05-1|05-2" src/phase_z2_pipeline.pygit grep -n "EMPTY_SHELL_NO_CONTENT" HEAD -- src/phase_z2_pipeline.py tests/test_phase_z2_imp87_empty_shell_honesty.py tests/test_phase_z2_imp30_first_render.py tests/test_pipeline_smoke_imp85.pygit diff --check -- src/phase_z2_pipeline.py tests/test_phase_z2_imp30_first_render.py tests/test_phase_z2_imp87_empty_shell_honesty.py tests/test_pipeline_smoke_imp85.pyFiles checked:
src/phase_z2_pipeline.pytests/test_phase_z2_imp30_first_render.pytests/test_phase_z2_imp87_empty_shell_honesty.pytests/test_pipeline_smoke_imp85.pyTest results:
53 passed in 13.22s.Commit SHA:
842a46144c14738ff6aa4d56ecb8ac156a987538Verified facts:
origin/main, andslide2/mainmatch.overall=PASS/full_mdx_coverage=true.FINAL_CONSENSUS: YES