feat(#77): IMP-48 composition planner re-split on all-reject (u1~u9)

Add resplit_all_reject_merges() helper in phase_z2_composition.py that
detects parent_merged / parent_merged_inferred units with label=reject
and rebuilds them as per-section single units using each section's own
rank-1 V4 evidence (no frame swap, MDX raw_content preserved).

Pipeline hook fires once after Step 6 settling chain (u12/u4/empty-shell)
and section_assignment_plan resolution, before Step 6 artifact write.

Guards: beneficial-split rule (>=1 non-reject), coverage equality, layout
cap (>4 abort), max_retry=1, section_assignment_override short-circuit.

Audit: comp_debug["imp48_resplit"] additive payload (applied, split_units,
skipped_units, post_split_unit_count, post_split_layout_preset);
selection_path="resplit_from_merge" telemetry on rebuilt singles;
layout_preset re-derived via select_layout_preset(new_units).

Tests: 39/39 PASS (composition u1~u6: 14 cases; pipeline u7~u9: 25 cases).
Scoped regression 720/6 with 6 failures isolated as pre-existing on
baseline 79f9ea5 (independent of IMP-48). mdx03 golden lock preserved.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
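
The detection signal and the beneficial-split guard named in the message can be sketched in isolation. This is a minimal illustration, not the project's code: `Unit` is a hypothetical stand-in for `CompositionUnit` that keeps only the fields the guards read, and the `"use_as_is"` label is a placeholder for any non-reject V4 label.

```python
from dataclasses import dataclass, field

MERGED_TYPES = {"parent_merged", "parent_merged_inferred"}


@dataclass
class Unit:
    """Hypothetical stand-in for CompositionUnit (guard-relevant fields only)."""
    merge_type: str
    label: str
    source_section_ids: list = field(default_factory=list)


def is_resplit_candidate(unit: Unit) -> bool:
    """Detection signal: merged type AND reject label AND >= 2 child sections."""
    return (
        unit.merge_type in MERGED_TYPES
        and unit.label == "reject"
        and len(unit.source_section_ids) >= 2
    )


def is_beneficial(singles: list) -> bool:
    """Beneficial-split rule: at least one rebuilt single is non-reject."""
    return any(s.label != "reject" for s in singles)


merged = Unit("parent_merged", "reject", ["s1", "s2"])
rebuilt = [
    Unit("single", "reject", ["s1"]),
    Unit("single", "use_as_is", ["s2"]),  # placeholder non-reject label
]
```

Because rebuilt units carry `merge_type="single"`, none of them satisfies `is_resplit_candidate`, which is why a second pass finds nothing to split (the `max_retry=1` guard holds by construction).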
@@ -925,3 +925,341 @@ def plan_composition(sections, v4_lookup_fn, v4_label_to_status: dict,
    }

    return units, preset, debug


# ─── IMP-48 — Re-split All-Reject Merges (#77, Stage 2 / u1~u3) ─────

def resplit_all_reject_merges(
    units: list[CompositionUnit],
    sections,
    v4_lookup_fn,
    v4_label_to_status: dict,
    allowed_statuses: set[str],
    *,
    capacity_fit_fn=None,
    v4_candidates_lookup_fn=None,
    section_assignment_override: bool = False,
) -> tuple[list[CompositionUnit], dict]:
    """Re-split merged composition units whose rank-1 V4 label is ``reject``.

    IMP-48 (#77) — Step 6 post-pass that decomposes a merged unit
    (``parent_merged`` / ``parent_merged_inferred``) carrying ``label=reject``
    into per-section singles, so child sections with non-reject rank-1 V4
    evidence can flow through the normal use_as_is / light_edit / restructure
    paths instead of being handed to IMP-47B (#76) as a single blob.

    Stage 2 / u3 slice (current revision) :
      u1 contract (detection scan + override skip + idempotent single-
      exclusion) + u2 per-section Branch-1 rebuild (each rebuilt single
      carries ``merge_type="single"`` + the section's OWN rank-1 V4
      evidence via ``v4_lookup_fn`` + the section's original
      ``raw_content`` from ``sections``) are both preserved. u3 adds the
      gating + swap path :

        1. **Coverage equality** — every child section in
           ``source_section_ids`` MUST rebuild successfully. Any
           ``section_not_found`` / ``no_v4_match`` rebuild result short-
           circuits that merged unit to ``reason="incomplete_rebuild"``.
        2. **Beneficial split** — at least one rebuilt single MUST have
           ``label != "reject"`` (Stage 2 Q2 Codex YES — "≥1 section
           gains non-reject frame"). Otherwise that merged unit short-
           circuits to ``reason="no_beneficial_split"`` and IMP-47B (#76)
           handles the merge directly.
        3. **Layout cap (≤ 4 units)** — projected post-split unit count
           (across ALL detected merges that would split) MUST be ≤ 4.
           Otherwise EVERY would-be split is aborted with
           ``reason="layout_cap_exceeded"`` (Stage 2 Q2 default — keep
           merged, no partial split; v0 ``select_layout_preset`` supports
           1~4 units max).
        4. **Telemetry** — every single produced by an APPLIED split has
           ``selection_path="resplit_from_merge"`` (Stage 1 Q3 YES,
           additive field reuse — no schema add).
        5. **Audit payload** — ``audit["applied"]`` reflects whether ANY
           merge actually split. ``audit["split_units"]`` /
           ``audit["skipped_units"]`` capture per-merge decisions.
           ``audit["post_split_unit_count"]`` reflects the returned list
           length. ``audit["post_split_layout_preset"]`` is filled via
           ``select_layout_preset(out_units)`` when ``applied=True``,
           None otherwise (u5 also re-derives in pipeline scope).

      ``out_units`` is the post-resplit unit list (merged removed +
      singles inserted, in original ordering). When no merge splits,
      ``out_units`` is byte-identical to input ``units`` and
      ``applied=False`` — the audit's ``skipped_reason`` becomes
      ``"no_split_applied"``.

    Detection signal (★ no-hardcoding, AI=0) :
      ``merge_type ∈ {"parent_merged", "parent_merged_inferred"}``
      AND ``label == "reject"``
      AND ``len(source_section_ids) >= 2``

      Signal uses only ``merge_type`` + ``label`` + section count — never
      section_id, template_id, MDX filename, or sample identifier.

    Override skip (Stage 2 Q1 — kwarg per Codex YES) :
      ``section_assignment_override=True`` makes the helper a no-op. User-
      driven ``zoneSections`` (#6 IMP-06) is the ground truth and must not
      be second-guessed by an automatic re-split.

    Idempotency (max_retry=1, Stage 2 lock) :
      u2's rebuilt units carry ``merge_type="single"``, which is excluded
      from the detection filter by construction. A second pass through
      this helper finds nothing — no inner loop, no recursion.

    Frame-swap guardrail (★ feedback_ai_isolation_contract) :
      u2 rebuilds each child section's single from its OWN rank-1 V4
      evidence via ``v4_lookup_fn``. The merged unit's parent /
      representative ``template_id`` is discarded along with the merge
      itself — no swap of one section's frame onto another section.

    Args:
        units: composition units from ``plan_composition()``.
        sections: original section list (forwarded to u2 for per-section
            ``raw_content`` lookup — merged units carry the joined string,
            not the individual child source).
        v4_lookup_fn: ``(section_id) -> V4Match | None`` (rank-1). Forwarded
            to u2 — identical evidence source as ``plan_composition``.
        v4_label_to_status: V4 label → Phase Z status mapping (forwarded).
        allowed_statuses: auto-renderable status set (forwarded).
        capacity_fit_fn: optional capacity fit injector (forwarded to u2).
        v4_candidates_lookup_fn: optional Step 6-A candidates fn (forwarded).
        section_assignment_override: True iff user supplied
            ``zoneSections`` / ``section_assignment_plan`` (IMP-06 chain).

    Returns:
        ``(out_units, audit)`` :
          ``out_units`` = post-resplit units (u1: identical to input).
          ``audit`` = ``imp48_resplit`` payload following Stage 1 schema::

            {
                "applied": bool,  # u1: always False
                "split_units": [...],  # u3 fills with per-section singles
                "skipped_units": [...],  # u3 fills with kept-merged + reason
                "post_split_unit_count": int,
                "post_split_layout_preset": Optional[str],
                "skipped_reason": str,  # u1: contract-stage reason
                "detected_units": [...],  # u1: u2's rebuild targets
            }
    """
    # ``allowed_statuses`` is forwarded for signature symmetry with
    # ``plan_composition`` but unused inside the helper — Stage 2 / Codex YES
    # fixed the beneficial-split threshold to ``single.label != "reject"``
    # (Stage 1 contract "non-reject rank-1"). Future axes may widen the
    # threshold using ``allowed_statuses``; until then the parameter is
    # explicitly deleted to silence lint without losing the public contract.
    del allowed_statuses

    audit: dict = {
        "applied": False,
        "split_units": [],
        "skipped_units": [],
        "post_split_unit_count": len(units),
        "post_split_layout_preset": None,
        "detected_units": [],
        "rebuild_attempts": [],
    }

    if section_assignment_override:
        audit["skipped_reason"] = "section_assignment_override"
        return units, audit

    detected = [
        u for u in units
        if u.merge_type in {"parent_merged", "parent_merged_inferred"}
        and u.label == "reject"
        and len(u.source_section_ids) >= 2
    ]
    audit["detected_units"] = [
        {
            "source_section_ids": list(u.source_section_ids),
            "merge_type": u.merge_type,
            "template_id": u.frame_template_id,
            "label": u.label,
        }
        for u in detected
    ]
    if not detected:
        audit["skipped_reason"] = "no_detection"
        return units, audit

    # u2 — per-section Branch-1 rebuild for each detected merged-reject unit.
    # Mirrors ``collect_candidates`` Branch 1 (single per section). Each rebuilt
    # single carries the section's OWN rank-1 V4 evidence — the merged unit's
    # parent/representative template_id is discarded along with the merge.
    # ★ feedback_ai_isolation_contract : no frame swap (each section's own V4).
    # ★ MDX_raw_content_invariant : raw_content taken from sections list.
    # ★ idempotency : merge_type="single" excludes singles
    # from re-detection on any later pass.
    section_by_id = {s.section_id: s for s in sections}

    def _v4_cands(section_id: str) -> list:
        return v4_candidates_lookup_fn(section_id) if v4_candidates_lookup_fn else []

    rebuild_attempts: list[dict] = []
    for merged_unit in detected:
        section_singles: list[dict] = []
        for sid in merged_unit.source_section_ids:
            section = section_by_id.get(sid)
            if section is None:
                section_singles.append({
                    "section_id": sid,
                    "build_result": "section_not_found",
                    "unit": None,
                })
                continue
            match = v4_lookup_fn(sid)
            if match is None:
                section_singles.append({
                    "section_id": sid,
                    "build_result": "no_v4_match",
                    "unit": None,
                })
                continue
            single = CompositionUnit(
                source_section_ids=[sid],
                merge_type="single",
                frame_template_id=match.template_id,
                frame_id=match.frame_id,
                frame_number=match.frame_number,
                confidence=match.confidence,
                label=match.label,
                phase_z_status=v4_label_to_status.get(match.label, "unknown"),
                v4_rank=getattr(match, "v4_rank", None),
                selection_path=getattr(match, "selection_path", "rank_1"),
                fallback_reason=getattr(match, "fallback_reason", None),
                raw_content=section.raw_content,
                title=section.title,
                v4_candidates=_v4_cands(sid),
                provisional=getattr(match, "provisional", False),
            )
            _apply_capacity_fit(single, capacity_fit_fn)
            score_candidate(single)
            section_singles.append({
                "section_id": sid,
                "build_result": "ok",
                "unit": single,
            })
        rebuild_attempts.append({
            "merged_source_section_ids": list(merged_unit.source_section_ids),
            "merged_merge_type": merged_unit.merge_type,
            "merged_template_id": merged_unit.frame_template_id,
            "section_singles": section_singles,
        })

    audit["rebuild_attempts"] = rebuild_attempts

    # u3 — gating + swap path.
    # Per-merge decision: split | skip(reason). Then a cumulative layout-cap
    # check aborts ALL would-be splits if projected post-split count > 4
    # (Stage 2 Q2 default — keep merged, no partial split; v0
    # ``select_layout_preset`` supports 1~4 units max).
    plans: list[dict] = []
    for merged_unit, attempt in zip(detected, rebuild_attempts):
        required_sids = set(merged_unit.source_section_ids)
        built_sids = {
            entry["section_id"]
            for entry in attempt["section_singles"]
            if entry["build_result"] == "ok"
        }
        if built_sids != required_sids:
            # Some sections failed to rebuild — coverage equality violated.
            # IMP-47B (#76) will handle the merged unit directly.
            plans.append({
                "merged": merged_unit,
                "decision": "skip",
                "reason": "incomplete_rebuild",
                "missing": sorted(required_sids - built_sids),
            })
            continue
        built_units = [
            entry["unit"]
            for entry in attempt["section_singles"]
            if entry["build_result"] == "ok"
        ]
        non_reject_count = sum(1 for u in built_units if u.label != "reject")
        if non_reject_count == 0:
            # No child section gains a non-reject frame — split is not
            # beneficial. IMP-47B (#76) handles the merge directly.
            plans.append({
                "merged": merged_unit,
                "decision": "skip",
                "reason": "no_beneficial_split",
            })
            continue
        plans.append({
            "merged": merged_unit,
            "decision": "split",
            "singles": built_units,
            "non_reject_count": non_reject_count,
        })

    # Cumulative layout-cap projection across all would-be splits.
    projected_count = len(units)
    for plan in plans:
        if plan["decision"] == "split":
            projected_count += len(plan["singles"]) - 1
    if projected_count > 4:
        for plan in plans:
            if plan["decision"] == "split":
                plan["decision"] = "skip"
                plan["reason"] = "layout_cap_exceeded"
                plan["projected_count"] = projected_count

    # Build out_units by walking the input list once. Identity match by
    # ``id(unit)`` keeps the swap deterministic and preserves order.
    plan_by_unit_id = {id(plan["merged"]): plan for plan in plans}
    out_units: list[CompositionUnit] = []
    applied = False
    for unit in units:
        plan = plan_by_unit_id.get(id(unit))
        if plan is None:
            out_units.append(unit)
            continue
        if plan["decision"] == "split":
            applied = True
            for single in plan["singles"]:
                # ★ Stage 1 Q3 YES — additive telemetry tag, no schema add.
                # Overrides the v4 match's selection_path for split-produced
                # singles only; non-resplit code paths are unaffected.
                single.selection_path = "resplit_from_merge"
            out_units.extend(plan["singles"])
            audit["split_units"].append({
                "merged_source_section_ids": list(plan["merged"].source_section_ids),
                "merged_template_id": plan["merged"].frame_template_id,
                "non_reject_count": plan["non_reject_count"],
                "split_singles": [
                    {
                        "section_id": s.source_section_ids[0],
                        "template_id": s.frame_template_id,
                        "label": s.label,
                        "phase_z_status": s.phase_z_status,
                    }
                    for s in plan["singles"]
                ],
            })
        else:  # skip
            out_units.append(unit)
            skip_entry: dict = {
                "merged_source_section_ids": list(plan["merged"].source_section_ids),
                "merged_template_id": plan["merged"].frame_template_id,
                "reason": plan["reason"],
            }
            if plan["reason"] == "incomplete_rebuild":
                skip_entry["missing_section_ids"] = list(plan["missing"])
            if plan["reason"] == "layout_cap_exceeded":
                skip_entry["projected_post_split_count"] = plan["projected_count"]
            audit["skipped_units"].append(skip_entry)

    audit["applied"] = applied
    audit["post_split_unit_count"] = len(out_units)
    if applied:
        # ``select_layout_preset`` is deterministic on unit count (v0).
        # u5 (pipeline) re-derives layout preset over the same out_units list;
        # both values stay consistent by construction.
        audit["post_split_layout_preset"] = select_layout_preset(out_units)
        audit.pop("skipped_reason", None)
    else:
        audit["post_split_layout_preset"] = None
        audit["skipped_reason"] = "no_split_applied"

    return out_units, audit
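
The cumulative layout-cap check in the function above reduces to simple arithmetic: each applied split replaces one merged unit with n singles, a net change of n - 1. The sketch below isolates that projection under stated assumptions; `LAYOUT_CAP`, `project_unit_count`, and `splits_allowed` are hypothetical names introduced for illustration and do not appear in the commit.

```python
LAYOUT_CAP = 4  # per the docstring: v0 select_layout_preset supports 1~4 units


def project_unit_count(current_count: int, split_sizes: list[int]) -> int:
    """Projected unit count if every would-be split is applied.

    Each split swaps 1 merged unit for n singles, so it adds (n - 1).
    """
    return current_count + sum(n - 1 for n in split_sizes)


def splits_allowed(current_count: int, split_sizes: list[int]) -> bool:
    """All-or-nothing gate: over the cap means no split is applied at all."""
    return project_unit_count(current_count, split_sizes) <= LAYOUT_CAP


# 2 units, one of them a 3-section merge: 2 + (3 - 1) = 4, within the cap.
fits = splits_allowed(2, [3])

# 3 units, two 2-section merges: 3 + 1 + 1 = 5, so both splits are aborted.
overflows = not splits_allowed(3, [2, 2])
```

The gate is evaluated once over all would-be splits rather than per merge, which matches the "no partial split" behaviour: either every beneficial split fits under the cap or every one is recorded as `layout_cap_exceeded`.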