feat(#77): IMP-48 composition planner re-split on all-reject (u1~u9)

Add resplit_all_reject_merges() helper in phase_z2_composition.py that detects parent_merged / parent_merged_inferred units with label=reject and rebuilds them as per-section single units using each section's own rank-1 V4 evidence (no frame swap, MDX raw_content preserved).

Pipeline hook fires once after the Step 6 settling chain (u12/u4/empty-shell) and section_assignment_plan resolution, before the Step 6 artifact write.

Guards:
- beneficial-split rule (>=1 non-reject)
- coverage equality
- layout cap (>4 abort)
- max_retry=1
- section_assignment_override short-circuit

Audit:
- comp_debug["imp48_resplit"] additive payload (applied, split_units, skipped_units, post_split_unit_count, post_split_layout_preset)
- selection_path="resplit_from_merge" telemetry on rebuilt singles
- layout_preset re-derived via select_layout_preset(new_units)

Tests: 39/39 PASS (composition u1~u6: 14 cases; pipeline u7~u9: 25 cases). Scoped regression 720/6 with 6 failures isolated as pre-existing on baseline 79f9ea5 (independent of IMP-48). mdx03 golden lock preserved.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 05:00:07 +09:00
parent 79f9ea5c92
commit ee97f4fc78
4 changed files with 2554 additions and 0 deletions
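
The guards named in the commit message amount to a per-merge decision followed by one cumulative cap check. The sketch below is a simplified illustration rather than the committed helper: it models each merged unit as a plain dict, and `plan_splits`, `MAX_LAYOUT_UNITS`, and the dict keys are hypothetical names chosen for this example.

```python
# Illustrative sketch only: plain dicts stand in for CompositionUnit, and
# plan_splits / MAX_LAYOUT_UNITS are hypothetical names, not part of the commit.
MAX_LAYOUT_UNITS = 4  # the commit states the v0 layout presets cover 1~4 units


def plan_splits(total_units, merges):
    """Decide split/skip per merged-reject unit, then apply the layout cap.

    total_units: number of units currently in the composition.
    merges: list of dicts with
        "section_ids": child section ids of the merged unit
        "labels": {section_id: rank-1 label} for children that rebuilt
    """
    plans = []
    for merge in merges:
        missing = sorted(set(merge["section_ids"]) - set(merge["labels"]))
        if missing:
            # Coverage equality: every child section must rebuild.
            plans.append({"decision": "skip", "reason": "incomplete_rebuild"})
        elif all(merge["labels"][sid] == "reject" for sid in merge["section_ids"]):
            # Beneficial split: at least one child must be non-reject.
            plans.append({"decision": "skip", "reason": "no_beneficial_split"})
        else:
            plans.append({"decision": "split", "singles": len(merge["section_ids"])})

    # Each split replaces one merged unit with N singles: net change N - 1.
    projected = total_units + sum(
        p["singles"] - 1 for p in plans if p["decision"] == "split"
    )
    if projected > MAX_LAYOUT_UNITS:
        # All-or-nothing: no partial split when the cap would be exceeded.
        plans = [
            {"decision": "skip", "reason": "layout_cap_exceeded"}
            if p["decision"] == "split" else p
            for p in plans
        ]
    return plans


plans = plan_splits(
    total_units=2,
    merges=[{"section_ids": ["s1", "s2"],
             "labels": {"s1": "reject", "s2": "light_edit"}}],
)
```

With two units and one two-section merge, the projected count is 3, so the split goes through. A three-section merge in a three-unit composition would project to 5 and be skipped with `layout_cap_exceeded`.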
--- a/src/phase_z2_composition.py
+++ b/src/phase_z2_composition.py
@@ -925,3 +925,341 @@ def plan_composition(sections, v4_lookup_fn, v4_label_to_status: dict,
    }

    return units, preset, debug
+
+
+# ─── IMP-48 — Re-split All-Reject Merges (#77, Stage 2 / u1~u3) ─────
+
+def resplit_all_reject_merges(
+    units: list[CompositionUnit],
+    sections,
+    v4_lookup_fn,
+    v4_label_to_status: dict,
+    allowed_statuses: set[str],
+    *,
+    capacity_fit_fn=None,
+    v4_candidates_lookup_fn=None,
+    section_assignment_override: bool = False,
+) -> tuple[list[CompositionUnit], dict]:
+    """Re-split merged composition units whose rank-1 V4 label is ``reject``.
+
+    IMP-48 (#77) — Step 6 post-pass that decomposes a merged unit
+    (``parent_merged`` / ``parent_merged_inferred``) carrying ``label=reject``
+    into per-section singles, so child sections with non-reject rank-1 V4
+    evidence can flow through the normal use_as_is / light_edit / restructure
+    paths instead of being handed to IMP-47B (#76) as a single blob.
+
+    Stage 2 / u3 slice (current revision) :
+        u1 contract (detection scan + override skip + idempotent single-
+        exclusion) + u2 per-section Branch-1 rebuild (each rebuilt single
+        carries ``merge_type="single"`` + the section's OWN rank-1 V4
+        evidence via ``v4_lookup_fn`` + the section's original
+        ``raw_content`` from ``sections``) are both preserved. u3 adds the
+        gating + swap path :
+
+          1. **Coverage equality** — every child section in
+             ``source_section_ids`` MUST rebuild successfully. Any
+             ``section_not_found`` / ``no_v4_match`` rebuild result short-
+             circuits that merged unit to ``reason="incomplete_rebuild"``.
+          2. **Beneficial split** — at least one rebuilt single MUST have
+             ``label != "reject"`` (Stage 2 Q2 Codex YES — "≥1 section
+             gains non-reject frame"). Otherwise that merged unit short-
+             circuits to ``reason="no_beneficial_split"`` and IMP-47B (#76)
+             handles the merge directly.
+          3. **Layout cap (≤ 4 units)** — projected post-split unit count
+             (across ALL detected merges that would split) MUST be ≤ 4.
+             Otherwise EVERY would-be split is aborted with
+             ``reason="layout_cap_exceeded"`` (Stage 2 Q2 default — keep
+             merged, no partial split; v0 ``select_layout_preset`` supports
+             1~4 units max).
+          4. **Telemetry** — every single produced by an APPLIED split has
+             ``selection_path="resplit_from_merge"`` (Stage 1 Q3 YES,
+             additive field reuse — no schema add).
+          5. **Audit payload** — ``audit["applied"]`` reflects whether ANY
+             merge actually split. ``audit["split_units"]`` /
+             ``audit["skipped_units"]`` capture per-merge decisions.
+             ``audit["post_split_unit_count"]`` reflects the returned list
+             length. ``audit["post_split_layout_preset"]`` is filled via
+             ``select_layout_preset(out_units)`` when ``applied=True``,
+             None otherwise (u5 also re-derives in pipeline scope).
+
+        ``out_units`` is the post-resplit unit list (merged removed +
+        singles inserted, in original ordering). When no merge splits,
+        ``out_units`` holds the same units in the same order as the input,
+        ``applied=False``, and ``skipped_reason`` is ``"no_detection"`` or
+        ``"no_split_applied"`` (override skip is described below).
+
+    Detection signal (★ no-hardcoding, AI=0) :
+        ``merge_type ∈ {"parent_merged", "parent_merged_inferred"}``
+        AND ``label == "reject"``
+        AND ``len(source_section_ids) >= 2``
+
+        Signal uses only ``merge_type`` + ``label`` + section count — never
+        section_id, template_id, MDX filename, or sample identifier.
+
+    Override skip (Stage 2 Q1 — kwarg per Codex YES) :
+        ``section_assignment_override=True`` makes the helper a no-op. User-
+        driven ``zoneSections`` (#6 IMP-06) is the ground truth and must not
+        be second-guessed by an automatic re-split.
+
+    Idempotency (max_retry=1, Stage 2 lock) :
+        u2's rebuilt units carry ``merge_type="single"``, which is excluded
+        from the detection filter by construction. A second pass through
+        this helper finds nothing — no inner loop, no recursion.
+
+    Frame-swap guardrail (★ feedback_ai_isolation_contract) :
+        u2 rebuilds each child section's single from its OWN rank-1 V4
+        evidence via ``v4_lookup_fn``. The merged unit's parent /
+        representative ``template_id`` is discarded along with the merge
+        itself — no swap of one section's frame onto another section.
+
+    Args:
+        units: composition units from ``plan_composition()``.
+        sections: original section list (forwarded to u2 for per-section
+            ``raw_content`` lookup — merged units carry the joined string,
+            not the individual child source).
+        v4_lookup_fn: ``(section_id) -> V4Match | None`` (rank-1). Forwarded
+            to u2 — identical evidence source as ``plan_composition``.
+        v4_label_to_status: V4 label → Phase Z status mapping (forwarded).
+        allowed_statuses: auto-renderable status set (forwarded).
+        capacity_fit_fn: optional capacity fit injector (forwarded to u2).
+        v4_candidates_lookup_fn: optional Step 6-A candidates fn (forwarded).
+        section_assignment_override: True iff user supplied
+            ``zoneSections`` / ``section_assignment_plan`` (IMP-06 chain).
+
+    Returns:
+        ``(out_units, audit)`` :
+            ``out_units`` = post-resplit units (input order preserved).
+            ``audit`` = ``imp48_resplit`` payload following Stage 1 schema::
+
+                {
+                    "applied": bool,             # True iff >=1 merge was split
+                    "split_units": [...],        # u3 fills with per-section singles
+                    "skipped_units": [...],      # u3 fills with kept-merged + reason
+                    "post_split_unit_count": int,
+                    "post_split_layout_preset": Optional[str],
+                    "skipped_reason": str,       # present only when applied=False
+                    "detected_units": [...],     # u1: u2's rebuild targets
+                }
+    """
+    # ``allowed_statuses`` is forwarded for signature symmetry with
+    # ``plan_composition`` but unused inside the helper — Stage 2 / Codex YES
+    # fixed the beneficial-split threshold to ``single.label != "reject"``
+    # (Stage 1 contract "non-reject rank-1"). Future axes may widen the
+    # threshold using ``allowed_statuses``; until then the parameter is
+    # explicitly deleted to silence lint without losing the public contract.
+    del allowed_statuses
+
+    audit: dict = {
+        "applied": False,
+        "split_units": [],
+        "skipped_units": [],
+        "post_split_unit_count": len(units),
+        "post_split_layout_preset": None,
+        "detected_units": [],
+        "rebuild_attempts": [],
+    }
+
+    if section_assignment_override:
+        audit["skipped_reason"] = "section_assignment_override"
+        return units, audit
+
+    detected = [
+        u for u in units
+        if u.merge_type in {"parent_merged", "parent_merged_inferred"}
+        and u.label == "reject"
+        and len(u.source_section_ids) >= 2
+    ]
+    audit["detected_units"] = [
+        {
+            "source_section_ids": list(u.source_section_ids),
+            "merge_type": u.merge_type,
+            "template_id": u.frame_template_id,
+            "label": u.label,
+        }
+        for u in detected
+    ]
+    if not detected:
+        audit["skipped_reason"] = "no_detection"
+        return units, audit
+
+    # u2 — per-section Branch-1 rebuild for each detected merged-reject unit.
+    # Mirrors ``collect_candidates`` Branch 1 (single per section). Each rebuilt
+    # single carries the section's OWN rank-1 V4 evidence — the merged unit's
+    # parent/representative template_id is discarded along with the merge.
+    # ★ feedback_ai_isolation_contract : no frame swap (each section's own V4).
+    # ★ MDX_raw_content_invariant     : raw_content taken from sections list.
+    # ★ idempotency                   : merge_type="single" excludes singles
+    #                                   from re-detection on any later pass.
+    section_by_id = {s.section_id: s for s in sections}
+
+    def _v4_cands(section_id: str) -> list:
+        return v4_candidates_lookup_fn(section_id) if v4_candidates_lookup_fn else []
+
+    rebuild_attempts: list[dict] = []
+    for merged_unit in detected:
+        section_singles: list[dict] = []
+        for sid in merged_unit.source_section_ids:
+            section = section_by_id.get(sid)
+            if section is None:
+                section_singles.append({
+                    "section_id": sid,
+                    "build_result": "section_not_found",
+                    "unit": None,
+                })
+                continue
+            match = v4_lookup_fn(sid)
+            if match is None:
+                section_singles.append({
+                    "section_id": sid,
+                    "build_result": "no_v4_match",
+                    "unit": None,
+                })
+                continue
+            single = CompositionUnit(
+                source_section_ids=[sid],
+                merge_type="single",
+                frame_template_id=match.template_id,
+                frame_id=match.frame_id,
+                frame_number=match.frame_number,
+                confidence=match.confidence,
+                label=match.label,
+                phase_z_status=v4_label_to_status.get(match.label, "unknown"),
+                v4_rank=getattr(match, "v4_rank", None),
+                selection_path=getattr(match, "selection_path", "rank_1"),
+                fallback_reason=getattr(match, "fallback_reason", None),
+                raw_content=section.raw_content,
+                title=section.title,
+                v4_candidates=_v4_cands(sid),
+                provisional=getattr(match, "provisional", False),
+            )
+            _apply_capacity_fit(single, capacity_fit_fn)
+            score_candidate(single)
+            section_singles.append({
+                "section_id": sid,
+                "build_result": "ok",
+                "unit": single,
+            })
+        rebuild_attempts.append({
+            "merged_source_section_ids": list(merged_unit.source_section_ids),
+            "merged_merge_type": merged_unit.merge_type,
+            "merged_template_id": merged_unit.frame_template_id,
+            "section_singles": section_singles,
+        })
+
+    audit["rebuild_attempts"] = rebuild_attempts
+
+    # u3 — gating + swap path.
+    # Per-merge decision: split | skip(reason). Then a cumulative layout-cap
+    # check aborts ALL would-be splits if projected post-split count > 4
+    # (Stage 2 Q2 default — keep merged, no partial split; v0
+    # ``select_layout_preset`` supports 1~4 units max).
+    plans: list[dict] = []
+    for merged_unit, attempt in zip(detected, rebuild_attempts):
+        required_sids = set(merged_unit.source_section_ids)
+        built_sids = {
+            entry["section_id"]
+            for entry in attempt["section_singles"]
+            if entry["build_result"] == "ok"
+        }
+        if built_sids != required_sids:
+            # Some sections failed to rebuild — coverage equality violated.
+            # IMP-47B (#76) will handle the merged unit directly.
+            plans.append({
+                "merged": merged_unit,
+                "decision": "skip",
+                "reason": "incomplete_rebuild",
+                "missing": sorted(required_sids - built_sids),
+            })
+            continue
+        built_units = [
+            entry["unit"]
+            for entry in attempt["section_singles"]
+            if entry["build_result"] == "ok"
+        ]
+        non_reject_count = sum(1 for u in built_units if u.label != "reject")
+        if non_reject_count == 0:
+            # No child section gains a non-reject frame — split is not
+            # beneficial. IMP-47B (#76) handles the merge directly.
+            plans.append({
+                "merged": merged_unit,
+                "decision": "skip",
+                "reason": "no_beneficial_split",
+            })
+            continue
+        plans.append({
+            "merged": merged_unit,
+            "decision": "split",
+            "singles": built_units,
+            "non_reject_count": non_reject_count,
+        })
+
+    # Cumulative layout-cap projection across all would-be splits.
+    projected_count = len(units)
+    for plan in plans:
+        if plan["decision"] == "split":
+            projected_count += len(plan["singles"]) - 1
+    if projected_count > 4:
+        for plan in plans:
+            if plan["decision"] == "split":
+                plan["decision"] = "skip"
+                plan["reason"] = "layout_cap_exceeded"
+                plan["projected_count"] = projected_count
+
+    # Build out_units by walking the input list once. Identity match by
+    # ``id(unit)`` keeps the swap deterministic and preserves order.
+    plan_by_unit_id = {id(plan["merged"]): plan for plan in plans}
+    out_units: list[CompositionUnit] = []
+    applied = False
+    for unit in units:
+        plan = plan_by_unit_id.get(id(unit))
+        if plan is None:
+            out_units.append(unit)
+            continue
+        if plan["decision"] == "split":
+            applied = True
+            for single in plan["singles"]:
+                # ★ Stage 1 Q3 YES — additive telemetry tag, no schema add.
+                # Overrides the v4 match's selection_path for split-produced
+                # singles only; non-resplit code paths are unaffected.
+                single.selection_path = "resplit_from_merge"
+            out_units.extend(plan["singles"])
+            audit["split_units"].append({
+                "merged_source_section_ids": list(plan["merged"].source_section_ids),
+                "merged_template_id": plan["merged"].frame_template_id,
+                "non_reject_count": plan["non_reject_count"],
+                "split_singles": [
+                    {
+                        "section_id": s.source_section_ids[0],
+                        "template_id": s.frame_template_id,
+                        "label": s.label,
+                        "phase_z_status": s.phase_z_status,
+                    }
+                    for s in plan["singles"]
+                ],
+            })
+        else:  # skip
+            out_units.append(unit)
+            skip_entry: dict = {
+                "merged_source_section_ids": list(plan["merged"].source_section_ids),
+                "merged_template_id": plan["merged"].frame_template_id,
+                "reason": plan["reason"],
+            }
+            if plan["reason"] == "incomplete_rebuild":
+                skip_entry["missing_section_ids"] = list(plan["missing"])
+            if plan["reason"] == "layout_cap_exceeded":
+                skip_entry["projected_post_split_count"] = plan["projected_count"]
+            audit["skipped_units"].append(skip_entry)
+
+    audit["applied"] = applied
+    audit["post_split_unit_count"] = len(out_units)
+    if applied:
+        # ``select_layout_preset`` is deterministic on unit count (v0).
+        # u5 (pipeline) re-derives layout preset over the same out_units list;
+        # both values stay consistent by construction.
+        audit["post_split_layout_preset"] = select_layout_preset(out_units)
+        audit.pop("skipped_reason", None)
+    else:
+        audit["post_split_layout_preset"] = None
+        audit["skipped_reason"] = "no_split_applied"
+
+    return out_units, audit
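
The detection signal and the idempotency claim in the docstring above can be checked in isolation. The following is a minimal sketch under stated assumptions: `Unit` is a hypothetical stand-in for `CompositionUnit` carrying only the three fields the filter reads, and `detect_merged_rejects` / `split_into_singles` are illustrative names that do not exist in the commit.

```python
from dataclasses import dataclass

MERGED_TYPES = {"parent_merged", "parent_merged_inferred"}


@dataclass
class Unit:
    """Hypothetical stand-in for CompositionUnit (only the fields used here)."""
    source_section_ids: list
    merge_type: str
    label: str


def detect_merged_rejects(units):
    """Apply the detection signal: merged type, reject label, >= 2 sections."""
    return [
        u for u in units
        if u.merge_type in MERGED_TYPES
        and u.label == "reject"
        and len(u.source_section_ids) >= 2
    ]


def split_into_singles(unit, label_by_section):
    """Rebuild one merged unit as per-section singles.

    label_by_section is an assumed input standing in for the rank-1 V4 lookup.
    """
    return [
        Unit([sid], "single", label_by_section[sid])
        for sid in unit.source_section_ids
    ]


units = [
    Unit(["s1", "s2"], "parent_merged", "reject"),
    Unit(["s3"], "single", "reject"),
]
first_pass = detect_merged_rejects(units)
rebuilt = split_into_singles(first_pass[0], {"s1": "use_as_is", "s2": "reject"})

# A second scan over the post-split list finds nothing, because every rebuilt
# unit carries merge_type="single" and so falls outside the filter.
second_pass = detect_merged_rejects(rebuilt + units[1:])
```

The single `s3` unit is never detected even though its label is `reject`, and the rebuilt `s2` single stays out of the second pass for the same reason.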
--- a/src/phase_z2_pipeline.py
+++ b/src/phase_z2_pipeline.py
@@ -43,6 +43,7 @@ from phase_z2_composition import (
    CompositionUnit,
    derive_parent_id,
    plan_composition,
+    resplit_all_reject_merges,
    select_display_strategy_candidates,
    select_layout_candidates,
    select_region_layout_candidates,
@@ -3966,6 +3967,52 @@ def run_phase_z2_mvp1(
                file=sys.stderr,
            )

+    # IMP-48 (#77) — re-split merged-reject units into per-section singles.
+    # One-shot, deterministic (AI=0) post-pass. Fires AFTER all Step 6 settling
+    # chains (initial plan_composition / u12 mixed admission / u4 provisional
+    # retry / empty-shell) and AFTER section_assignment_plan is known, but
+    # BEFORE the Step 6 artifact write below — so the artifact reflects the
+    # post-resplit unit list. SKIPS when --override-section-assignments is
+    # active (IMP-06 / #6 is the ground truth). Helper guardrails (coverage
+    # equality / beneficial split / layout cap ≤ 4) keep mdx03 byte-identical
+    # (no-op on use_as_is / light_edit slides). u5 re-derives layout_preset
+    # below using the audit payload.
+    units, _imp48_audit = resplit_all_reject_merges(
+        units,
+        sections,
+        lookup_fn,
+        V4_LABEL_TO_PHASE_Z_STATUS,
+        MVP1_ALLOWED_STATUSES,
+        capacity_fit_fn=compute_capacity_fit,
+        v4_candidates_lookup_fn=candidates_lookup_fn,
+        section_assignment_override=section_assignment_plan is not None,
+    )
+    comp_debug["imp48_resplit"] = _imp48_audit
+    # u5 — re-derive layout_preset from helper audit (post-split count via
+    # select_layout_preset(out_units)). Helper guarantees post_split_unit_count
+    # ≤ 4 (layout cap abort), so the derived preset is always renderable by
+    # LAYOUT_PRESETS. Respect --override-layout when present (user's explicit
+    # choice wins over auto re-derive; mirrors the override gate above at L3697).
+    if _imp48_audit.get("applied"):
+        _imp48_post_preset = _imp48_audit.get("post_split_layout_preset")
+        if _imp48_post_preset and not layout_override_applied:
+            if _imp48_post_preset != layout_preset:
+                print(
+                    f"  [IMP-48] layout_preset re-derived: {layout_preset} → "
+                    f"{_imp48_post_preset} (post-split unit count="
+                    f"{_imp48_audit.get('post_split_unit_count')})",
+                    file=sys.stderr,
+                )
+                layout_preset = _imp48_post_preset
+        print(
+            f"  [IMP-48] re-split applied — "
+            f"split={len(_imp48_audit.get('split_units', []))} "
+            f"skipped={len(_imp48_audit.get('skipped_units', []))} "
+            f"post_count={_imp48_audit.get('post_split_unit_count')} "
+            f"post_preset={_imp48_audit.get('post_split_layout_preset')!r}",
+            file=sys.stderr,
+        )
+
    print(f"  preset  : {layout_preset} ({len(units)} units, composition v0 count-based)")
    for u in units:
        print(f"    unit  : {u.source_section_ids} merge={u.merge_type} → "
@@ -4011,6 +4058,15 @@ def run_phase_z2_mvp1(
                }
                for u in units
            ],
+            # IMP-48 (#77) — re-split audit. Additive field. AI=0 deterministic
+            # one-shot post-pass on Step 6 settling result. applied=True means
+            # ≥1 parent_merged / parent_merged_inferred reject unit was split
+            # into per-section singles; selected_units already reflects the
+            # post-split list. Skipped reasons (incomplete_rebuild /
+            # no_beneficial_split / layout_cap_exceeded) keep the merged unit
+            # for IMP-47B (#76) AI handoff. section_assignment_override skip
+            # honors IMP-06 (#6) zoneSections ground truth.
+            "imp48_resplit": _imp48_audit,
        },
        step_status="done",
        pipeline_path_connected=True,
@@ -4020,6 +4076,11 @@ def run_phase_z2_mvp1(
            "composition v0 count-based — sections → candidates → score → greedy select. "
            "Step 6-A (user lock 2026-05-08): selected_units[i].v4_candidates added "
            "(list of up to 6 non-reject candidates; candidates[0] is consistent with the single frame_* fields). "
+            "IMP-48 (#77, 2026-05-22): automatic merged-reject split post-pass: "
+            "parent_merged / parent_merged_inferred + label=reject + ≥2 sections "
+            "→ per-section singles (each with its own rank-1 V4 evidence + raw_content preserved). "
+            "guardrails: coverage equality / beneficial split (≥1 non-reject) / "
+            "layout cap (≤4 units). imp48_resplit audit additive. "
            "logic unchanged; runtime result identical. Step 9 application_plan input."
        ),
    )
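
A downstream reader of the Step 6 artifact could condense the `imp48_resplit` payload into one line. This is a hedged sketch, not part of the commit: `summarize_resplit_audit` is a hypothetical helper, the sample payload is made up (including the template id `t-01`), and only keys that appear in the audit dict above are read.

```python
from collections import Counter


def summarize_resplit_audit(audit):
    """Condense an imp48_resplit-style payload into a short status line."""
    if audit.get("applied"):
        return (
            f"split={len(audit.get('split_units', []))} "
            f"post_count={audit.get('post_split_unit_count')} "
            f"preset={audit.get('post_split_layout_preset')}"
        )
    # Count per-unit skip reasons (incomplete_rebuild, no_beneficial_split, ...).
    reasons = Counter(entry["reason"] for entry in audit.get("skipped_units", []))
    detail = ", ".join(
        f"{reason}={count}" for reason, count in sorted(reasons.items())
    )
    return f"not applied ({audit.get('skipped_reason')}): {detail or 'no per-unit skips'}"


# Made-up payload shaped like the audit dict built by the helper.
sample = {
    "applied": False,
    "split_units": [],
    "skipped_units": [
        {
            "merged_source_section_ids": ["s1", "s2"],
            "merged_template_id": "t-01",
            "reason": "no_beneficial_split",
        },
    ],
    "post_split_unit_count": 1,
    "post_split_layout_preset": None,
    "skipped_reason": "no_split_applied",
}
summary = summarize_resplit_audit(sample)
```

Reading with `.get()` keeps the summary tolerant of `skipped_reason` being absent, which the helper arranges when a split is applied.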