"""IMP-48 (#77) u7+u8+u9 — Pipeline regression for IMP-48 hook surface.

Scope (this file — Stage 2 plan u7 + u8 + u9): u7 (no-op), u8
(split-help) and u9 (split-then-reject) pipeline regression for the
Step 6 hook in ``src/phase_z2_pipeline.py`` (u4 — call site at
L3970-L3989, u5 — re-derive + artifact extension at L3990-L4014,
L4061-L4070, L4079-L4084).

u7 — no-op contract: when the post-Step-6 unit list contains no
IMP-48-target merged-reject units, the hook must be a no-op:

  1. ``comp_debug["imp48_resplit"]["applied"] is False``;
  2. the ``units`` list referenced by the Step 6 artifact write at
     L4023-L4086 is byte-identical (as a list of ``CompositionUnit``
     dataclass instances + as the serialized ``selected_units``
     payload) to the pre-hook list;
  3. the audit ``skipped_reason`` is one of the deterministic Stage 1
     enum values (``section_assignment_override`` / ``no_detection`` /
     ``no_beneficial_split`` / ``incomplete_rebuild`` /
     ``layout_cap_exceeded``) — never ``applied=True`` without a swap;
  4. ``layout_preset`` is not re-derived (u5's
     ``_imp48_audit.get("applied")`` gate at L3996 short-circuits when
     applied=False).

u8 — split-help contract: when a ``parent_merged`` /
``parent_merged_inferred`` unit with ``label="reject"`` is present AND
≥1 child section has its OWN rank-1 V4 evidence with
``label != "reject"`` AND the post-split projected count is ≤ 4, the
hook must SPLIT it so each child section reaches the normal
per-section route (use_as_is / light_edit / restructure →
matched_zone / adapt_matched_zone / extract_matched_zone) instead of
being handed to IMP-47B (#76) as a single blob. The hook must:

  a. set ``audit["applied"] = True``;
  b. preserve coverage equality
     (``{sid for unit in out for sid in unit.source_section_ids} ==
     {sid for unit in pre for sid in unit.source_section_ids}``) —
     Stage 1 ★ dropped_zero_invariant;
  c. emit singles using each section's OWN rank-1 V4 template_id +
     frame_id + label — Stage 1 ★ feedback_ai_isolation_contract (no
     frame swap from merged parent template_id);
  d. emit singles with ``raw_content == sections[sid].raw_content``
     (not the merged blob) — Stage 1 ★ MDX_raw_content_invariant;
  e. tag each split-produced single
     ``selection_path="resplit_from_merge"`` — Stage 1 Q3 YES additive
     field reuse;
  f. surface ``audit["post_split_layout_preset"]`` from
     ``select_layout_preset(out_units)`` so u5's re-derive block at
     ``phase_z2_pipeline.py:3996-4006`` reflects the new unit count;
  g. produce a Step 6 ``selected_units`` payload (mirror of the
     dict-comprehension at L4031-L4060) whose entries are
     byte-identical to the singles' OWN fields — IMP-47B (#76) on
     Step 9 sees per-section evidence, not the merged blob.

u7 cases (Stage 2 plan no-op axis):

  1. **Source anchor** — the u4/u5 wiring markers + audit storage in
     ``phase_z2_pipeline.py`` are present (cheap structural guard
     against silent removal in a future refactor).
  2. **Import wiring** — ``resplit_all_reject_merges`` is importable
     from ``phase_z2_composition`` (i.e. the import block at
     ``phase_z2_pipeline.py:41-50`` is wired and the alphabetical
     position is intact).
  3. **No-op on all-direct slide** — every section is ``use_as_is`` /
     ``light_edit`` → ``audit["applied"] is False``,
     ``audit["detected_units"] == []``, units identity preserved
     (``out_units is units`` — same Python list object, byte-identical
     to the pre-hook list).
  4. **No-op on mixed single-reject (mdx03 lock shape)** — singles
     with mixed labels, including a single-section reject, do NOT
     enter detection (``merge_type == "single"`` excluded). Mirrors the
     mdx03 golden lock invariant (``project_mdx03_frame_lock``).
  5. **No-op on parent_merged non-reject** — a merged unit with
     ``label != "reject"`` does NOT enter detection. Confirms the
     beneficial-split threshold is anchored on ``label == "reject"``
     (Stage 1 RULE_0 scope-lock — no template_id / frame_id
     hardcoding).
  6. **Step 6 artifact serialization parity** — the ``selected_units``
     dict-comprehension at ``phase_z2_pipeline.py:4031-4060`` produces
     the same payload pre- and post-hook for no-op inputs
     (byte-identical JSON).
  7. **section_assignment_override skip** — when the pipeline forwards
     ``section_assignment_override=True`` (IMP-06 / #6 ground truth at
     ``phase_z2_pipeline.py:3988``), the helper short-circuits with
     ``audit["skipped_reason"] == "section_assignment_override"`` and
     units identity is preserved.
  8. **Empty units list** — ``units == []`` short-circuits cleanly with
     ``audit["skipped_reason"] == "no_detection"``, units identity
     preserved and ``audit["post_split_unit_count"] == 0``.

u8 cases (Stage 2 plan split-help axis):

  9. **Split applied (2-section merged-reject + non-reject children)** —
     merged_reject with 2 sections, each with own rank-1
     ``use_as_is`` / ``light_edit`` V4 evidence →
     ``audit["applied"] is True``, out_units = per-section singles (in
     source_section_ids order), merged removed.
  10. **No frame swap — singles carry OWN evidence** — each
      split-produced single's ``frame_template_id`` / ``frame_id`` /
      ``frame_number`` / ``label`` come from that section's OWN
      ``v4_lookup_fn`` (rank-1), NOT the merged parent's
      ``frame_template_id``. ★ feedback_ai_isolation_contract.
  11. **Raw content preservation — per-section, not merged blob** —
      each split-produced single's ``raw_content`` equals
      ``sections[sid].raw_content``, not the merged unit's joined
      ``raw_content`` string. ★ MDX_raw_content_invariant.
  12. **selection_path telemetry tag** — every split-produced single
      has ``selection_path == "resplit_from_merge"`` (Stage 1 Q3 YES).
      Non-split units in the same out_units list keep their original
      ``selection_path`` ("rank_1" etc.) — additive, non-clobbering.
  13. **Normal per-section route restoration** — each split-produced
      single's ``phase_z_status`` maps via ``v4_label_to_status`` from
      its OWN label (matched_zone / adapt_matched_zone /
      extract_matched_zone), NOT ``fallback_candidate``. This is the
      core IMP-48 win: child sections reach the auto-renderable path
      instead of IMP-47B (#76) AI repair.
  14. **Coverage equality** — the set of section_ids in out_units
      equals the set in the pre-hook units (★ Stage 1
      dropped_zero_invariant). Pre = 1 merged unit with 3 sections,
      post = 3 singles, ∀ sid preserved.
  15. **layout_preset re-derivation contract** —
      ``audit["post_split_layout_preset"]`` is non-None when
      ``applied=True`` and matches ``select_layout_preset(out_units)``.
      This is what u5's pipeline re-derive block at
      ``phase_z2_pipeline.py:3996-4006`` reads to update
      ``layout_preset`` (when ``not layout_override_applied``).
  16. **Step 6 artifact serialization for split-help** — the
      ``selected_units`` dict-comprehension at
      ``phase_z2_pipeline.py:4031-4060`` over the post-split out_units
      contains per-section entries with each section's OWN evidence;
      ``imp48_resplit.applied`` is True; the merged parent's evidence is
      absent from the payload. Locks the Step 9 / IMP-47B (#76)
      hand-off shape (per-unit, not per-merge-blob).
  17. **Mixed pre-hook list — order preserved** — when pre = [single,
      merged_reject(2 sections), single], post = [single, single,
      single, single] in source order (split inserted in place of
      merged, surrounding singles untouched).

u9 cases (Stage 2 plan split-then-reject axis — coverage preserved +
remaining reject singles eligible for IMP-47B (#76) handoff):

  18. **Split applied with mixed reject + non-reject children** —
      pre = [merged_reject(MOCK_S1, MOCK_S2)] where MOCK_S1 has its
      OWN rank-1 V4 evidence with ``label="use_as_is"`` and MOCK_S2
      has its OWN rank-1 V4 evidence with ``label="reject"`` (the
      section's OWN truth is also reject — e.g., the child genuinely
      has no decent frame). ``audit["applied"] is True`` (≥1
      non-reject is the beneficial-split threshold), 2 singles in
      source order,
      ``audit["split_units"][0]["non_reject_count"] == 1``.
  19. **Reject single routes to IMP-47B handoff via
      fallback_candidate** — the split-produced single for MOCK_S2
      (own rank-1 reject) carries ``label="reject"`` AND
      ``phase_z_status="fallback_candidate"``. This is the contract
      the IMP-47B (#76) router reads at
      ``src/phase_z2_pipeline.py:582`` (_RECONSTRUCTION_BY_HINT) to
      decide ``ai_adaptation_required``. The non-reject sibling
      (MOCK_S1) routes via its OWN ``matched_zone`` /
      ``adapt_matched_zone``, NOT fallback. The IMP-48 win here is
      per-section handoff: IMP-47B sees individual reject sections
      instead of one merged blob.
  20. **All-children-reject merge — no_beneficial_split skip path** —
      pre = [merged_reject(MOCK_S1, MOCK_S2)] where BOTH sections have
      OWN rank-1 V4 with ``label="reject"``.
      ``audit["applied"] is False``,
      ``audit["skipped_reason"] == "no_split_applied"``,
      ``audit["skipped_units"][0]["reason"] == "no_beneficial_split"``.
      Merged unit preserved → IMP-47B sees the merged blob (existing
      behavior, IMP-48 is a no-op here). Coverage preserved by
      definition (merged kept whole).
  21. **Coverage preserved across mixed children (3-section split)** —
      pre = [merged_reject(MOCK_S1, MOCK_S2, MOCK_S3)] with 2
      non-reject + 1 reject. ``audit["applied"] is True``, post = 3
      singles, ``{sid for u in post for sid in u.source_section_ids}
      == {MOCK_S1, MOCK_S2, MOCK_S3}`` (★ Stage 1
      dropped_zero_invariant); the reject single is NOT dropped — it
      carries its OWN section's raw_content + own V4 reject evidence
      and routes to IMP-47B.
  22. **No frame swap on reject single** — the reject split-produced
      single's ``frame_template_id`` / ``frame_id`` / ``frame_number``
      come from its OWN ``v4_lookup_fn(sid)`` (a reject-labelled V4),
      NOT the merged parent's reject template_id and NOT the
      non-reject sibling's template_id.
      ★ feedback_ai_isolation_contract.
  23. **selection_path tagging covers reject singles too** — every
      split-produced single, including the one with own-reject label,
      has ``selection_path == "resplit_from_merge"``. The Stage 1 Q3
      YES additive-tag rule is uniform across mixed-children splits.
  24. **Raw content preservation across reject + non-reject singles**
      — both the reject single and the non-reject single carry their
      OWN section's ``raw_content`` (from ``sections[sid]``), NOT the
      merged parent's joined blob. ★ MDX_raw_content_invariant. The
      reject single's raw_content is what IMP-47B (#76) feeds to AI
      restructure — per-section input, not merged blob input.
  25. **Step 6 artifact payload for split-then-reject** — the
      ``selected_units`` dict-comp at
      ``phase_z2_pipeline.py:4031-4060`` over the post-split out_units
      yields per-section entries; the reject single's payload entry
      has ``phase_z_status="fallback_candidate"`` and the non-reject
      single's entry has ``matched_zone`` / ``adapt_matched_zone``.
      Locks the Step 9 / IMP-47B (#76) hand-off shape: downstream
      consumers see one fallback_candidate single (not a merged blob
      of mixed sections).

★ AI=0 throughout — PZ-1 deterministic code path only.
★ No-hardcoding (RULE_7) — stubs use MOCK_ prefixed identifiers; no
  real catalog template_id / frame_id / MDX sample identifier leaks.
★ mdx03_lock — case 4 represents the mdx03 shape (all-single, no
  merged reject) and locks the byte-identical no-op contract.
★ u8 split-help cases lock the mdx04 04-1 expectation: a 2-section
  merged-reject becomes 2 per-section singles, and each child reaches
  the normal route via its own rank-1 V4.
★ u9 split-then-reject cases lock the mdx05 expectation: a 2- to
  3-section merged-reject with mixed reject + non-reject children is
  split so the reject child(ren) reach IMP-47B (#76) AS INDIVIDUAL
  SECTIONS rather than as one merged blob. Existing all-reject merges
  remain no-op (IMP-47B handles the merged blob — existing behavior
  preserved).
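
The detection rule and beneficial-split threshold described above can
be summarised as a pair of predicates. The sketch below is
illustrative only: ``SketchUnit``, ``is_detected``,
``projected_unit_count``, ``is_beneficial_split`` and
``covered_section_ids`` are hypothetical names, not the internals of
``resplit_all_reject_merges``, and the projected-count formula (one
merged unit replaced by one single per section) is an assumption.

```python
from dataclasses import dataclass


# Hypothetical stand-in for CompositionUnit: only the three fields the
# detection rule reads. This is NOT the real dataclass.
@dataclass
class SketchUnit:
    source_section_ids: list[str]
    merge_type: str
    label: str


MERGED_TYPES = {"parent_merged", "parent_merged_inferred"}
MAX_POST_SPLIT_UNITS = 4  # "post-split projected count is <= 4" (u8 contract)


def is_detected(unit: SketchUnit) -> bool:
    # Singles never enter detection (mdx03 lock); only merged rejects do.
    return (
        unit.merge_type in MERGED_TYPES
        and len(unit.source_section_ids) >= 2
        and unit.label == "reject"
    )


def projected_unit_count(units: list[SketchUnit], target: SketchUnit) -> int:
    # Assumption: a split replaces one merged unit with one single per section.
    return len(units) - 1 + len(target.source_section_ids)


def is_beneficial_split(own_rank1_labels: list[str], projected: int) -> bool:
    # At least one child must carry its OWN non-reject rank-1 label.
    has_non_reject = any(label != "reject" for label in own_rank1_labels)
    return has_non_reject and projected <= MAX_POST_SPLIT_UNITS


def covered_section_ids(units: list[SketchUnit]) -> set[str]:
    # dropped_zero_invariant: this set must match before and after the hook.
    return {sid for u in units for sid in u.source_section_ids}


if __name__ == "__main__":
    # Case 18 shape: one use_as_is child and one reject child.
    merged = SketchUnit(["MOCK_S1", "MOCK_S2"], "parent_merged", "reject")
    units = [merged]
    labels = ["use_as_is", "reject"]
    projected = projected_unit_count(units, merged)
    if is_detected(merged) and is_beneficial_split(labels, projected):
        split = [
            SketchUnit([sid], "single", label)
            for sid, label in zip(merged.source_section_ids, labels)
        ]
        assert covered_section_ids(split) == covered_section_ids(units)
        print([u.source_section_ids for u in split])  # [['MOCK_S1'], ['MOCK_S2']]
```

Under these assumptions, the all-reject merge of case 20 is detected
but fails ``is_beneficial_split``, which corresponds to the
``no_beneficial_split`` skip, and the mdx03 single-reject of case 4
never passes ``is_detected``.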
"""

from __future__ import annotations

import json
from dataclasses import dataclass
from pathlib import Path
from typing import Optional

from src.phase_z2_composition import (
    CompositionUnit,
    resplit_all_reject_merges,
)


# ─── Synthetic stubs (MOCK_ prefix mandatory — IMP-30 u3 convention) ───


@dataclass
class _StubV4Match:
    template_id: str
    frame_id: str
    frame_number: int
    confidence: float
    label: str
    v4_rank: Optional[int] = None
    selection_path: str = "rank_1"
    fallback_reason: Optional[str] = None
    provisional: bool = False


@dataclass
class _StubSection:
    section_id: str
    title: str = ""
    raw_content: str = ""


# Mirrors V4_LABEL_TO_PHASE_Z_STATUS / MVP1_ALLOWED_STATUSES at
# phase_z2_pipeline.py:97-103 — kept inline so the test is self-contained
# (parallel to IMP-47B u12 stub set, see test_imp47b_mixed_reject_fill.py).
_LABEL_TO_STATUS = {
    "use_as_is": "matched_zone",
    "light_edit": "adapt_matched_zone",
    "restructure": "extract_matched_zone",
    "reject": "fallback_candidate",
}
_ALLOWED_STATUSES = {"matched_zone", "adapt_matched_zone"}


def _make_lookup(matches: dict[str, _StubV4Match]):
    """Build the lookup_fn the pipeline forwards at the u4 call site (L3983)."""

    def _fn(section_id: str) -> Optional[_StubV4Match]:
        return matches.get(section_id)

    return _fn


def _candidates_lookup_empty(section_id: str) -> list:
    """Stand-in for candidates_lookup_fn (L3987). An empty list is
    sufficient for the no-op cases, where detection never fires; the
    split cases below also reuse it."""
    return []


def _serialize_units_like_step6_artifact(units: list[CompositionUnit]) -> list[dict]:
    """Replicate the ``selected_units`` dict-comprehension at
    ``phase_z2_pipeline.py:4031-4060`` so byte-identical parity can be
    asserted on the post-hook artifact payload (case 6 — serialization
    parity invariant).

    Mirrors the exact field set + ordering written by
    ``_write_step_artifact(... 6, "composition_plan", ...)``.
    """
    return [
        {
            "source_section_ids": u.source_section_ids,
            "merge_type": u.merge_type,
            "frame_id": u.frame_id,
            "frame_number": u.frame_number,
            "frame_template_id": u.frame_template_id,
            "label": u.label,
            "v4_rank": u.v4_rank,
            "selection_path": u.selection_path,
            "fallback_reason": u.fallback_reason,
            "score": u.score,
            "phase_z_status": u.phase_z_status,
            "rationale": u.rationale,
            "notes": list(u.notes),
            "v4_candidates": [
                {
                    "template_id": c.template_id,
                    "frame_id": c.frame_id,
                    "frame_number": c.frame_number,
                    "confidence": c.confidence,
                    "label": c.label,
                }
                for c in u.v4_candidates
            ],
        }
        for u in units
    ]


def _make_single_unit(
    section_id: str,
    *,
    label: str = "use_as_is",
    template_id: Optional[str] = None,
) -> CompositionUnit:
    """Construct a ``merge_type="single"`` CompositionUnit shaped like
    ``collect_candidates`` output (mirrors u6's ``_make_single_unit``)."""
    return CompositionUnit(
        source_section_ids=[section_id],
        merge_type="single",
        frame_template_id=template_id or f"MOCK_TMPL_{section_id}",
        frame_id=f"MOCK_FRM_{section_id}",
        frame_number=hash(section_id) % 32,
        confidence=0.85,
        label=label,
        phase_z_status=_LABEL_TO_STATUS.get(label, "unknown"),
        raw_content=f"section {section_id} content",
        title=section_id,
    )


def _make_merged_unit(
    *,
    merge_type: str,
    source_section_ids: list[str],
    label: str,
    template_id: str = "MOCK_TMPL_PARENT",
) -> CompositionUnit:
    """Construct a merged CompositionUnit (parent_merged / inferred)."""
    return CompositionUnit(
        source_section_ids=list(source_section_ids),
        merge_type=merge_type,
        frame_template_id=template_id,
        frame_id="MOCK_FRM_PARENT",
        frame_number=99,
        confidence=0.5,
        label=label,
        phase_z_status=_LABEL_TO_STATUS.get(label, "unknown"),
        raw_content="MERGED RAW CONTENT (joined from children)",
        title="MOCK_PARENT",
    )


# ─── Case 1 : Source anchor — u4 + u5 wiring markers present ────────


def test_u4_u5_pipeline_source_contains_imp48_hook_markers():
    """Anchor test.

    Ensures the u4 call site + u5 re-derive + audit storage + artifact
    extension blocks in ``src/phase_z2_pipeline.py`` are present (not
    silently removed by a future refactor). Asserts on:

    * the IMP-48 marker comment at the u4 hook (L3970-L3979);
    * the helper call ``resplit_all_reject_merges(`` with the
      ``section_assignment_override=`` kwarg (L3980-L3989);
    * the audit storage ``comp_debug["imp48_resplit"] = _imp48_audit``
      (L3990);
    * the u5 layout_preset re-derive block
      (``_imp48_audit.get("applied")`` + ``post_split_layout_preset`` +
      ``not layout_override_applied``) at L3996-L4006;
    * the Step 6 artifact additive field
      ``"imp48_resplit": _imp48_audit`` at L4069 and the note extension
      at L4079-L4084.

    Cheap structural guard — does not run the heavy pipeline.
    """
    src_path = Path(__file__).resolve().parent.parent / "src" / "phase_z2_pipeline.py"
    text = src_path.read_text(encoding="utf-8")

    # u4 marker comment + call site
    assert "IMP-48 (#77) — re-split merged-reject units into per-section singles." in text, (
        "u4 marker comment missing from pipeline — IMP-48 hook may have been removed"
    )
    assert "resplit_all_reject_merges(" in text, (
        "u4 helper call missing from pipeline"
    )
    assert "section_assignment_override=section_assignment_plan is not None" in text, (
        "u4 override-skip kwarg wiring missing — IMP-06 (#6) ground truth contract broken"
    )

    # Audit storage at u4
    assert 'comp_debug["imp48_resplit"] = _imp48_audit' in text, (
        "u4 audit storage missing — comp_debug telemetry key absent"
    )

    # u5 re-derive block
    assert "_imp48_audit.get(\"applied\")" in text, (
        "u5 applied-gate missing — layout_preset would re-derive on no-op paths"
    )
    assert "post_split_layout_preset" in text, (
        "u5 post_split_layout_preset reference missing"
    )
    assert "not layout_override_applied" in text, (
        "u5 layout-override respect missing — would clobber --override-layout"
    )

    # Step 6 artifact extension
    assert '"imp48_resplit": _imp48_audit' in text, (
        "u5 Step 6 artifact additive field missing"
    )
    assert "IMP-48 (#77, 2026-05-22)" in text, (
        "u5 Step 6 artifact note IMP-48 entry missing"
    )


# ─── Case 2 : Import wiring (alphabetical block at L41-L50) ────────


def test_resplit_helper_imported_in_pipeline():
    """The pipeline's import block at ``phase_z2_pipeline.py:41-50``
    imports ``resplit_all_reject_merges`` alongside ``plan_composition``
    and ``select_display_strategy_candidates``.

    This protects against a silent rename / removal that would crash the
    u4 call site with a ``NameError`` only at runtime.
    """
    src_path = Path(__file__).resolve().parent.parent / "src" / "phase_z2_pipeline.py"
    text = src_path.read_text(encoding="utf-8")

    # Find the from-import block and assert membership.
    assert "from phase_z2_composition import (" in text, (
        "phase_z2_composition import block missing"
    )
    # Alphabetical neighbors (Stage 3 u4 lock — see [Claude #7] r4).
    assert "    plan_composition,\n    resplit_all_reject_merges,\n" in text, (
        "resplit_all_reject_merges must follow plan_composition alphabetically "
        "in the import block (Stage 3 u4 wiring lock)"
    )


# ─── Case 3 : No-op on all-direct slide (every section auto-renderable) ──


def test_no_op_on_all_direct_singles_units_identity_preserved():
    """All-direct slide (every section is use_as_is / light_edit) →
    ``audit["applied"] is False``, ``audit["detected_units"] == []``,
    units identity preserved (same Python list object — byte-identical
    to the pre-hook list)."""
    units_pre = [
        _make_single_unit("MOCK_S1", label="use_as_is"),
        _make_single_unit("MOCK_S2", label="light_edit"),
    ]
    sections = [
        _StubSection("MOCK_S1", raw_content="s1"),
        _StubSection("MOCK_S2", raw_content="s2"),
    ]
    lookup = _make_lookup({
        "MOCK_S1": _StubV4Match("MOCK_TMPL_S1", "MOCK_FRM_S1", 1, 0.92, "use_as_is", v4_rank=1),
        "MOCK_S2": _StubV4Match("MOCK_TMPL_S2", "MOCK_FRM_S2", 2, 0.81, "light_edit", v4_rank=1),
    })

    out_units, audit = resplit_all_reject_merges(
        units_pre,
        sections,
        lookup,
        _LABEL_TO_STATUS,
        _ALLOWED_STATUSES,
        v4_candidates_lookup_fn=_candidates_lookup_empty,
    )

    # No-op contract — applied=False, no detection, units identity preserved.
    assert audit["applied"] is False
    assert audit["detected_units"] == []
    assert audit["skipped_reason"] == "no_detection"
    assert audit["split_units"] == []
    assert audit["skipped_units"] == []
    # Same Python list object — helper returned the input list as-is.
    assert out_units is units_pre, (
        "no-op helper must preserve units list identity (no copy)"
    )
    # u5 gate guard — post_split_layout_preset is None when applied=False
    # (so the pipeline's u5 re-derive block at L3996-L4006 short-circuits).
    assert audit["post_split_layout_preset"] is None
    assert audit["post_split_unit_count"] == len(units_pre)


# ─── Case 4 : mdx03 lock shape — singles with single-section reject ──


def test_no_op_on_mdx03_lock_shape_single_reject_not_detected():
    """mdx03 golden lock invariant: even when a single
    (``merge_type == "single"``) carries ``label="reject"``, it does NOT
    enter detection. Detection requires
    ``merge_type ∈ {parent_merged, parent_merged_inferred}`` AND
    ``len(source_section_ids) >= 2`` AND ``label == "reject"``.

    This mirrors the mdx03 byte-identical no-op contract from
    ``project_mdx03_frame_lock`` — IMP-48 must not perturb mdx03 output
    even if a single section's V4 evidence happens to be reject.
    """
    units_pre = [
        _make_single_unit("MOCK_S1", label="use_as_is"),
        _make_single_unit("MOCK_S2", label="reject"),  # single, NOT merged
    ]
    sections = [
        _StubSection("MOCK_S1", raw_content="s1"),
        _StubSection("MOCK_S2", raw_content="s2"),
    ]
    lookup = _make_lookup({
        "MOCK_S1": _StubV4Match("MOCK_TMPL_S1", "MOCK_FRM_S1", 1, 0.92, "use_as_is", v4_rank=1),
        "MOCK_S2": _StubV4Match("MOCK_TMPL_S2", "MOCK_FRM_S2", 2, 0.10, "reject", v4_rank=1),
    })

    out_units, audit = resplit_all_reject_merges(
        units_pre,
        sections,
        lookup,
        _LABEL_TO_STATUS,
        _ALLOWED_STATUSES,
        v4_candidates_lookup_fn=_candidates_lookup_empty,
    )

    assert audit["applied"] is False
    assert audit["detected_units"] == []
    assert audit["skipped_reason"] == "no_detection"
    assert out_units is units_pre


# ─── Case 5 : No-op on parent_merged non-reject ────────────────────


def test_no_op_on_parent_merged_non_reject_unit():
    """Beneficial-split threshold is anchored on ``label == "reject"``
    (Stage 1 RULE_0 scope-lock).

    A ``parent_merged`` unit with ``label="light_edit"`` (or any
    non-reject label) does NOT enter detection — no template_id /
    frame_id / section_id pattern-matching."""
    merged = _make_merged_unit(
        merge_type="parent_merged",
        source_section_ids=["MOCK_S1", "MOCK_S2"],
        label="light_edit",
    )
    units_pre = [merged]
    sections = [
        _StubSection("MOCK_S1", raw_content="s1"),
        _StubSection("MOCK_S2", raw_content="s2"),
    ]
    lookup = _make_lookup({
        "MOCK_S1": _StubV4Match("MOCK_TMPL_S1", "MOCK_FRM_S1", 1, 0.85, "light_edit", v4_rank=1),
        "MOCK_S2": _StubV4Match("MOCK_TMPL_S2", "MOCK_FRM_S2", 2, 0.85, "light_edit", v4_rank=1),
    })

    out_units, audit = resplit_all_reject_merges(
        units_pre,
        sections,
        lookup,
        _LABEL_TO_STATUS,
        _ALLOWED_STATUSES,
        v4_candidates_lookup_fn=_candidates_lookup_empty,
    )

    assert audit["applied"] is False
    assert audit["detected_units"] == []
    assert out_units is units_pre


# ─── Case 6 : Step 6 artifact serialization parity ─────────────────


def test_step6_artifact_serialized_payload_byte_identical_for_no_op():
    """The Step 6 artifact's ``selected_units`` payload (the
    dict-comprehension at ``phase_z2_pipeline.py:4031-4060``) must be
    byte-identical pre- and post-hook on no-op inputs.

    Guards against a helper that mutates returned units in-place (which
    would change the artifact JSON even when ``applied=False``).
    """
    units_pre = [
        _make_single_unit("MOCK_S1", label="use_as_is"),
        _make_merged_unit(
            merge_type="parent_merged",
            source_section_ids=["MOCK_S2", "MOCK_S3"],
            label="light_edit",  # non-reject merged → no-op
        ),
    ]
    sections = [
        _StubSection("MOCK_S1", raw_content="s1"),
        _StubSection("MOCK_S2", raw_content="s2"),
        _StubSection("MOCK_S3", raw_content="s3"),
    ]
    lookup = _make_lookup({
        "MOCK_S1": _StubV4Match("MOCK_TMPL_S1", "MOCK_FRM_S1", 1, 0.92, "use_as_is", v4_rank=1),
        "MOCK_S2": _StubV4Match("MOCK_TMPL_S2", "MOCK_FRM_S2", 2, 0.85, "light_edit", v4_rank=1),
        "MOCK_S3": _StubV4Match("MOCK_TMPL_S3", "MOCK_FRM_S3", 3, 0.85, "light_edit", v4_rank=1),
    })

    payload_pre = _serialize_units_like_step6_artifact(units_pre)
    pre_json = json.dumps(payload_pre, sort_keys=True, ensure_ascii=False)

    out_units, audit = resplit_all_reject_merges(
        units_pre,
        sections,
        lookup,
        _LABEL_TO_STATUS,
        _ALLOWED_STATUSES,
        v4_candidates_lookup_fn=_candidates_lookup_empty,
    )

    payload_post = _serialize_units_like_step6_artifact(out_units)
    post_json = json.dumps(payload_post, sort_keys=True, ensure_ascii=False)

    assert audit["applied"] is False
    assert post_json == pre_json, (
        "no-op hook must produce byte-identical Step 6 artifact payload "
        "(helper must not mutate units in-place)"
    )


# ─── Case 7 : section_assignment_override skip (IMP-06 ground truth) ──


def test_no_op_when_section_assignment_override_active():
    """When the pipeline forwards ``section_assignment_override=True``
    (i.e. ``section_assignment_plan is not None``, the IMP-06 / #6 user
    override at ``phase_z2_pipeline.py:3988``), the helper short-circuits
    before detection. Even if the units contain a merged-reject (which
    would normally trigger), the override takes precedence and the units
    are returned identity-preserved.

    This locks the contract that IMP-06 zoneSections is the ground truth
    — IMP-48 never overrides a user-supplied section assignment.
    """
    merged_reject = _make_merged_unit(
        merge_type="parent_merged",
        source_section_ids=["MOCK_S1", "MOCK_S2"],
        label="reject",  # would normally trigger detection
    )
    units_pre = [merged_reject]
    sections = [
        _StubSection("MOCK_S1", raw_content="s1"),
        _StubSection("MOCK_S2", raw_content="s2"),
    ]
    lookup = _make_lookup({
        "MOCK_S1": _StubV4Match("MOCK_TMPL_S1", "MOCK_FRM_S1", 1, 0.92, "use_as_is", v4_rank=1),
        "MOCK_S2": _StubV4Match("MOCK_TMPL_S2", "MOCK_FRM_S2", 2, 0.85, "light_edit", v4_rank=1),
    })

    out_units, audit = resplit_all_reject_merges(
        units_pre,
        sections,
        lookup,
        _LABEL_TO_STATUS,
        _ALLOWED_STATUSES,
        v4_candidates_lookup_fn=_candidates_lookup_empty,
        section_assignment_override=True,
    )

    assert audit["applied"] is False
    assert audit["skipped_reason"] == "section_assignment_override"
    # Detection skipped entirely — detected_units never populated.
    assert audit["detected_units"] == []
    # Units identity preserved.
    assert out_units is units_pre


# ─── Case 8 : Empty units list — degenerate no-op ──────────────────


def test_no_op_on_empty_units_list():
    """When ``units == []`` (initial plan_composition produced nothing and
    IMP-30 u4 / empty-shell path populated the placeholder via a different
    mechanism, OR Stage 3's empty-shell placeholder hasn't been built yet),
    the helper must short-circuit cleanly without raising on the
    iteration."""
    units_pre: list[CompositionUnit] = []
    sections = [_StubSection("MOCK_S1", raw_content="s1")]
    lookup = _make_lookup({
        "MOCK_S1": _StubV4Match("MOCK_TMPL_S1", "MOCK_FRM_S1", 1, 0.10, "reject", v4_rank=1),
    })

    out_units, audit = resplit_all_reject_merges(
        units_pre,
        sections,
        lookup,
        _LABEL_TO_STATUS,
        _ALLOWED_STATUSES,
        v4_candidates_lookup_fn=_candidates_lookup_empty,
    )

    assert audit["applied"] is False
    assert audit["detected_units"] == []
    assert audit["skipped_reason"] == "no_detection"
    assert out_units is units_pre
    assert audit["post_split_unit_count"] == 0


# ═══════════════════════════════════════════════════════════════════════
# u8 — Pipeline regression for split-help case
# ═══════════════════════════════════════════════════════════════════════
#
# Each test below validates a contract that the pipeline hook (u4 call
# site + u5 layout_preset re-derive + Step 6 artifact extension) relies
# on when a real merged-reject unit is present in the post-Step-6 unit
# list. We exercise ``resplit_all_reject_merges`` with the SAME signature
# the pipeline forwards at ``phase_z2_pipeline.py:3980-3989`` (same
# lookup_fn, label-to-status map, allowed_statuses, capacity_fit-shaped
# default, candidates lookup, override flag).
#
# All identifiers MOCK_ prefixed (★ RULE_7_no_hardcoding). No real
# catalog template_id / frame_id / MDX sample identifier leaks.
# ═══════════════════════════════════════════════════════════════════════


# ─── Case 9 : Split applied — 2-section merged-reject + non-reject children ─


def test_split_applied_two_section_merge_with_non_reject_children():
    """Pre = [merged_reject(MOCK_S1, MOCK_S2)] where each section has its
    OWN rank-1 V4 evidence with a non-reject label.

    Post = 2 singles, in source_section_ids order,
    ``audit["applied"] is True``."""
    merged = _make_merged_unit(
        merge_type="parent_merged",
        source_section_ids=["MOCK_S1", "MOCK_S2"],
        label="reject",
    )
    units_pre: list[CompositionUnit] = [merged]
    sections = [
        _StubSection("MOCK_S1", title="t1", raw_content="raw content of s1"),
        _StubSection("MOCK_S2", title="t2", raw_content="raw content of s2"),
    ]
    lookup = _make_lookup({
        "MOCK_S1": _StubV4Match("MOCK_TMPL_S1", "MOCK_FRM_S1", 1, 0.92, "use_as_is", v4_rank=1),
        "MOCK_S2": _StubV4Match("MOCK_TMPL_S2", "MOCK_FRM_S2", 2, 0.81, "light_edit", v4_rank=1),
    })

    out_units, audit = resplit_all_reject_merges(
        units_pre,
        sections,
        lookup,
        _LABEL_TO_STATUS,
        _ALLOWED_STATUSES,
        v4_candidates_lookup_fn=_candidates_lookup_empty,
    )

    assert audit["applied"] is True
    # applied=True path pops the contract-stage skipped_reason (see
    # ``src/phase_z2_composition.py:1260`` — ``audit.pop("skipped_reason",
    # None)`` after the applied branch).
    assert "skipped_reason" not in audit, (
        "applied=True path must not carry a skipped_reason value"
    )
    # Out_units shape: 2 per-section singles, merged removed.
    assert len(out_units) == 2
    assert all(u.merge_type == "single" for u in out_units)
    assert [u.source_section_ids for u in out_units] == [["MOCK_S1"], ["MOCK_S2"]]
    # Audit shape: one split entry, no skips.
    assert len(audit["split_units"]) == 1
    assert audit["skipped_units"] == []
    assert audit["split_units"][0]["merged_source_section_ids"] == ["MOCK_S1", "MOCK_S2"]
    assert audit["split_units"][0]["non_reject_count"] == 2
    assert audit["post_split_unit_count"] == 2


# ─── Case 10 : No frame swap — singles carry OWN evidence ──────────────


def test_split_singles_use_own_section_v4_evidence_no_frame_swap():
    """★ feedback_ai_isolation_contract — each split-produced single's
    frame_template_id / frame_id / frame_number / label come from the
    section's OWN rank-1 V4 lookup.

    The merged parent's ``frame_template_id`` ("MOCK_TMPL_PARENT_REJECT")
    MUST NOT appear on any split-produced single. No frame swap of one
    section's frame onto another section.
    """
    merged = _make_merged_unit(
        merge_type="parent_merged_inferred",
        source_section_ids=["MOCK_S1", "MOCK_S2"],
        label="reject",
        template_id="MOCK_TMPL_PARENT_REJECT",
    )
    units_pre: list[CompositionUnit] = [merged]
    sections = [
        _StubSection("MOCK_S1", raw_content="s1"),
        _StubSection("MOCK_S2", raw_content="s2"),
    ]
    lookup = _make_lookup({
        "MOCK_S1": _StubV4Match("MOCK_TMPL_S1", "MOCK_FRM_S1", 7, 0.88, "use_as_is", v4_rank=1),
        "MOCK_S2": _StubV4Match("MOCK_TMPL_S2", "MOCK_FRM_S2", 11, 0.79, "light_edit", v4_rank=1),
    })

    out_units, audit = resplit_all_reject_merges(
        units_pre,
        sections,
        lookup,
        _LABEL_TO_STATUS,
        _ALLOWED_STATUSES,
        v4_candidates_lookup_fn=_candidates_lookup_empty,
    )

    assert audit["applied"] is True
    # No split-produced single carries the merged parent's template_id /
    # frame_id / frame_number. Each carries its OWN section's V4 evidence.
    parent_template = merged.frame_template_id
    parent_frame_id = merged.frame_id
    parent_frame_number = merged.frame_number
    for single in out_units:
        assert single.frame_template_id != parent_template, (
            f"frame swap detected: single {single.source_section_ids[0]} "
            f"carries merged parent template_id={parent_template}"
        )
        assert single.frame_id != parent_frame_id
        assert single.frame_number != parent_frame_number
    # Each single matches its OWN section's V4 evidence exactly.
    s1, s2 = out_units
    assert (s1.frame_template_id, s1.frame_id, s1.frame_number, s1.label) == (
        "MOCK_TMPL_S1",
        "MOCK_FRM_S1",
        7,
        "use_as_is",
    )
    assert (s2.frame_template_id, s2.frame_id, s2.frame_number, s2.label) == (
        "MOCK_TMPL_S2",
        "MOCK_FRM_S2",
        11,
        "light_edit",
    )


# ─── Case 11 : Raw content preservation (per-section, not merged blob) ──


def test_split_singles_preserve_per_section_raw_content():
    """★ MDX_raw_content_invariant — each split-produced single's
    ``raw_content`` equals the section's original ``raw_content`` (from
    the ``sections`` list), NOT the merged unit's joined ``raw_content``
    blob.

    Locks the Stage 1 invariant that the split path never edits /
    summarizes / discards MDX text.
    """
    merged_raw = "MERGED BLOB — joined from children, must NOT leak to singles"
    merged = CompositionUnit(
        source_section_ids=["MOCK_S1", "MOCK_S2"],
        merge_type="parent_merged",
        frame_template_id="MOCK_TMPL_PARENT_REJECT",
        frame_id="MOCK_FRM_PARENT_REJECT",
        frame_number=99,
        confidence=0.10,
        label="reject",
        phase_z_status=_LABEL_TO_STATUS["reject"],
        raw_content=merged_raw,
        title="MOCK_PARENT",
    )
    units_pre: list[CompositionUnit] = [merged]
    sections = [
        _StubSection("MOCK_S1", title="title-1", raw_content="section S1 ORIGINAL text"),
        _StubSection("MOCK_S2", title="title-2", raw_content="section S2 ORIGINAL text"),
    ]
    lookup = _make_lookup({
        "MOCK_S1": _StubV4Match("MOCK_TMPL_S1", "MOCK_FRM_S1", 1, 0.92, "use_as_is", v4_rank=1),
        "MOCK_S2": _StubV4Match("MOCK_TMPL_S2", "MOCK_FRM_S2", 2, 0.81, "light_edit", v4_rank=1),
    })

    out_units, audit = resplit_all_reject_merges(
        units_pre,
        sections,
        lookup,
        _LABEL_TO_STATUS,
        _ALLOWED_STATUSES,
        v4_candidates_lookup_fn=_candidates_lookup_empty,
    )

    assert audit["applied"] is True
    # Each split-produced single carries its OWN section's raw_content.
    by_sid = {u.source_section_ids[0]: u for u in out_units}
    assert by_sid["MOCK_S1"].raw_content == "section S1 ORIGINAL text"
    assert by_sid["MOCK_S2"].raw_content == "section S2 ORIGINAL text"
    # And title is forwarded from the section (not merged parent title).
    assert by_sid["MOCK_S1"].title == "title-1"
    assert by_sid["MOCK_S2"].title == "title-2"
    # Merged blob MUST NOT appear in any single's raw_content.
    for single in out_units:
        assert merged_raw not in single.raw_content


# ─── Case 12 : selection_path telemetry tag ────────────────────────────
def test_split_singles_tagged_with_resplit_from_merge_selection_path():
    """Stage 1 Q3 YES — every split-produced single has
    ``selection_path == "resplit_from_merge"``. Pre-hook singles that
    surround the merged unit keep their original ``selection_path``
    (additive, non-clobbering).
    """
    pre_single = CompositionUnit(
        source_section_ids=["MOCK_S0"],
        merge_type="single",
        frame_template_id="MOCK_TMPL_S0",
        frame_id="MOCK_FRM_S0",
        frame_number=0,
        confidence=0.95,
        label="use_as_is",
        phase_z_status=_LABEL_TO_STATUS["use_as_is"],
        raw_content="s0",
        title="t0",
        v4_rank=1,
        selection_path="rank_1",
    )
    merged = _make_merged_unit(
        merge_type="parent_merged",
        source_section_ids=["MOCK_S1", "MOCK_S2"],
        label="reject",
    )
    units_pre: list[CompositionUnit] = [pre_single, merged]
    sections = [
        _StubSection("MOCK_S0", raw_content="s0"),
        _StubSection("MOCK_S1", raw_content="s1"),
        _StubSection("MOCK_S2", raw_content="s2"),
    ]
    lookup = _make_lookup({
        "MOCK_S0": _StubV4Match("MOCK_TMPL_S0", "MOCK_FRM_S0", 0, 0.95,
                                "use_as_is", v4_rank=1),
        "MOCK_S1": _StubV4Match("MOCK_TMPL_S1", "MOCK_FRM_S1", 1, 0.92,
                                "use_as_is", v4_rank=1),
        "MOCK_S2": _StubV4Match("MOCK_TMPL_S2", "MOCK_FRM_S2", 2, 0.81,
                                "light_edit", v4_rank=1),
    })

    out_units, audit = resplit_all_reject_merges(
        units_pre, sections, lookup,
        _LABEL_TO_STATUS, _ALLOWED_STATUSES,
        v4_candidates_lookup_fn=_candidates_lookup_empty,
    )

    assert audit["applied"] is True
    assert len(out_units) == 3
    # Pre-existing single keeps its original selection_path (untouched).
    assert out_units[0].source_section_ids == ["MOCK_S0"]
    assert out_units[0].selection_path == "rank_1"
    # Split-produced singles get the IMP-48 telemetry tag.
    assert out_units[1].source_section_ids == ["MOCK_S1"]
    assert out_units[1].selection_path == "resplit_from_merge"
    assert out_units[2].source_section_ids == ["MOCK_S2"]
    assert out_units[2].selection_path == "resplit_from_merge"


# ─── Case 13 : Normal per-section route restoration ────────────────────
def test_split_singles_route_to_normal_phase_z_status_not_fallback():
    """The IMP-48 win: child sections reach the normal auto-renderable
    route via their OWN label → phase_z_status mapping. The merged
    parent's ``phase_z_status="fallback_candidate"`` (from
    ``label="reject"``) MUST NOT propagate to any split-produced single
    whose own label is not reject.

    Each rebuilt single's ``phase_z_status`` is set by
    ``v4_label_to_status.get(match.label, "unknown")`` (see
    ``src/phase_z2_composition.py:1126``) — the OWN label, not the
    parent's.
    """
    merged = _make_merged_unit(
        merge_type="parent_merged",
        source_section_ids=["MOCK_S1", "MOCK_S2", "MOCK_S3"],
        label="reject",
    )
    units_pre: list[CompositionUnit] = [merged]
    sections = [
        _StubSection("MOCK_S1", raw_content="s1"),
        _StubSection("MOCK_S2", raw_content="s2"),
        _StubSection("MOCK_S3", raw_content="s3"),
    ]
    # Each section's OWN rank-1: 3 different non-reject labels.
    lookup = _make_lookup({
        "MOCK_S1": _StubV4Match("MOCK_TMPL_S1", "MOCK_FRM_S1", 1, 0.92,
                                "use_as_is", v4_rank=1),
        "MOCK_S2": _StubV4Match("MOCK_TMPL_S2", "MOCK_FRM_S2", 2, 0.81,
                                "light_edit", v4_rank=1),
        "MOCK_S3": _StubV4Match("MOCK_TMPL_S3", "MOCK_FRM_S3", 3, 0.68,
                                "restructure", v4_rank=1),
    })

    out_units, audit = resplit_all_reject_merges(
        units_pre, sections, lookup,
        _LABEL_TO_STATUS, _ALLOWED_STATUSES,
        v4_candidates_lookup_fn=_candidates_lookup_empty,
    )

    assert audit["applied"] is True
    by_sid = {u.source_section_ids[0]: u for u in out_units}
    # Each single's phase_z_status maps from its OWN label, not "reject".
    assert by_sid["MOCK_S1"].phase_z_status == "matched_zone"
    assert by_sid["MOCK_S2"].phase_z_status == "adapt_matched_zone"
    assert by_sid["MOCK_S3"].phase_z_status == "extract_matched_zone"
    # None of the singles inherit the merged parent's fallback_candidate
    # status. (Merged parent's phase_z_status was "fallback_candidate".)
    assert all(s.phase_z_status != "fallback_candidate" for s in out_units)


# ─── Case 14 : Coverage equality (★ dropped_zero_invariant) ─────────────
def test_split_preserves_full_section_coverage():
    """★ Stage 1 dropped_zero_invariant — the set of section_ids covered
    by out_units equals the set covered by pre-hook units. Pre = 1 merged
    unit with 3 sections, post = 3 singles, ∀ sid preserved.
    """
    merged = _make_merged_unit(
        merge_type="parent_merged",
        source_section_ids=["MOCK_S1", "MOCK_S2", "MOCK_S3"],
        label="reject",
    )
    units_pre: list[CompositionUnit] = [merged]
    sections = [
        _StubSection("MOCK_S1", raw_content="s1"),
        _StubSection("MOCK_S2", raw_content="s2"),
        _StubSection("MOCK_S3", raw_content="s3"),
    ]
    lookup = _make_lookup({
        "MOCK_S1": _StubV4Match("MOCK_TMPL_S1", "MOCK_FRM_S1", 1, 0.92,
                                "use_as_is", v4_rank=1),
        "MOCK_S2": _StubV4Match("MOCK_TMPL_S2", "MOCK_FRM_S2", 2, 0.81,
                                "light_edit", v4_rank=1),
        "MOCK_S3": _StubV4Match("MOCK_TMPL_S3", "MOCK_FRM_S3", 3, 0.78,
                                "light_edit", v4_rank=1),
    })
    pre_sids = {sid for u in units_pre for sid in u.source_section_ids}

    out_units, audit = resplit_all_reject_merges(
        units_pre, sections, lookup,
        _LABEL_TO_STATUS, _ALLOWED_STATUSES,
        v4_candidates_lookup_fn=_candidates_lookup_empty,
    )

    assert audit["applied"] is True
    post_sids = {sid for u in out_units for sid in u.source_section_ids}
    assert pre_sids == post_sids == {"MOCK_S1", "MOCK_S2", "MOCK_S3"}
    # 3 splits → 3 singles, no duplicates, no drops.
    assert len(out_units) == 3
    assert len([sid for u in out_units for sid in u.source_section_ids]) == 3


# ─── Case 15 : layout_preset re-derivation contract (u5 input) ──────────
def test_split_audit_post_split_layout_preset_matches_select_layout_preset():
    """``audit["post_split_layout_preset"]`` is non-None when
    ``applied=True`` and reflects ``select_layout_preset(out_units)``.
    The pipeline's u5 re-derive block at
    ``phase_z2_pipeline.py:3996-4006`` reads exactly this field to decide
    whether to update ``layout_preset`` (when
    ``not layout_override_applied``).
    """
    from src.phase_z2_composition import select_layout_preset  # local import — no top-level side effects on test discovery

    merged = _make_merged_unit(
        merge_type="parent_merged",
        source_section_ids=["MOCK_S1", "MOCK_S2"],
        label="reject",
    )
    units_pre: list[CompositionUnit] = [merged]
    sections = [
        _StubSection("MOCK_S1", raw_content="s1"),
        _StubSection("MOCK_S2", raw_content="s2"),
    ]
    lookup = _make_lookup({
        "MOCK_S1": _StubV4Match("MOCK_TMPL_S1", "MOCK_FRM_S1", 1, 0.92,
                                "use_as_is", v4_rank=1),
        "MOCK_S2": _StubV4Match("MOCK_TMPL_S2", "MOCK_FRM_S2", 2, 0.81,
                                "light_edit", v4_rank=1),
    })

    out_units, audit = resplit_all_reject_merges(
        units_pre, sections, lookup,
        _LABEL_TO_STATUS, _ALLOWED_STATUSES,
        v4_candidates_lookup_fn=_candidates_lookup_empty,
    )

    assert audit["applied"] is True
    assert audit["post_split_layout_preset"] is not None, (
        "applied=True must surface a non-None post_split_layout_preset "
        "for the u5 pipeline re-derive block"
    )
    # Re-derive must match what u5 would compute on the helper-returned units.
    assert audit["post_split_layout_preset"] == select_layout_preset(out_units)
    # post_split_unit_count tracks len(out_units).
    assert audit["post_split_unit_count"] == len(out_units) == 2


# ─── Case 16 : Step 6 artifact serialization for split-help ────────────
def test_step6_artifact_payload_reflects_per_section_singles_after_split():
    """The Step 6 artifact's ``selected_units`` payload (dict-comp at
    ``phase_z2_pipeline.py:4031-4060``) over the post-split out_units
    contains per-section entries — each entry has the section's OWN V4
    evidence (template_id / frame_id / frame_number / label), not the
    merged parent's. Locks the Step 9 / IMP-47B (#76) hand-off shape:
    downstream consumers see per-section units, not the merged blob.
    """
    merged = _make_merged_unit(
        merge_type="parent_merged",
        source_section_ids=["MOCK_S1", "MOCK_S2"],
        label="reject",
    )
    units_pre: list[CompositionUnit] = [merged]
    sections = [
        _StubSection("MOCK_S1", raw_content="s1"),
        _StubSection("MOCK_S2", raw_content="s2"),
    ]
    lookup = _make_lookup({
        "MOCK_S1": _StubV4Match("MOCK_TMPL_S1", "MOCK_FRM_S1", 7, 0.88,
                                "use_as_is", v4_rank=1),
        "MOCK_S2": _StubV4Match("MOCK_TMPL_S2", "MOCK_FRM_S2", 11, 0.79,
                                "light_edit", v4_rank=1),
    })

    out_units, audit = resplit_all_reject_merges(
        units_pre, sections, lookup,
        _LABEL_TO_STATUS, _ALLOWED_STATUSES,
        v4_candidates_lookup_fn=_candidates_lookup_empty,
    )

    assert audit["applied"] is True
    # Step 6 artifact payload mirror.
    payload = _serialize_units_like_step6_artifact(out_units)
    payload_json = json.dumps(payload, sort_keys=True, ensure_ascii=False)

    # Per-section entries reflect each section's OWN evidence.
    assert len(payload) == 2
    by_sid = {entry["source_section_ids"][0]: entry for entry in payload}
    assert by_sid["MOCK_S1"]["merge_type"] == "single"
    assert by_sid["MOCK_S1"]["frame_template_id"] == "MOCK_TMPL_S1"
    assert by_sid["MOCK_S1"]["frame_id"] == "MOCK_FRM_S1"
    assert by_sid["MOCK_S1"]["frame_number"] == 7
    assert by_sid["MOCK_S1"]["label"] == "use_as_is"
    assert by_sid["MOCK_S1"]["phase_z_status"] == "matched_zone"
    assert by_sid["MOCK_S1"]["selection_path"] == "resplit_from_merge"

    assert by_sid["MOCK_S2"]["merge_type"] == "single"
    assert by_sid["MOCK_S2"]["frame_template_id"] == "MOCK_TMPL_S2"
    assert by_sid["MOCK_S2"]["frame_id"] == "MOCK_FRM_S2"
    assert by_sid["MOCK_S2"]["frame_number"] == 11
    assert by_sid["MOCK_S2"]["label"] == "light_edit"
    assert by_sid["MOCK_S2"]["phase_z_status"] == "adapt_matched_zone"
    assert by_sid["MOCK_S2"]["selection_path"] == "resplit_from_merge"

    # Merged parent's identifiers MUST NOT appear in the post-split payload.
    # ★ feedback_ai_isolation_contract — no frame swap from merged parent.
    assert merged.frame_template_id not in payload_json
    assert merged.frame_id not in payload_json

    # imp48_resplit audit is populated for the pipeline's artifact extension.
    assert audit["applied"] is True
    assert len(audit["split_units"]) == 1


# ─── Case 17 : Mixed pre-hook list — order preserved ───────────────────
def test_split_preserves_order_when_merged_is_sandwiched_between_singles():
    """Pre = [single, merged_reject(2 sections), single]. Post should be
    [single, single_resplit, single_resplit, single] in source order —
    the split inserts in place of the merged unit, surrounding singles
    untouched. Total post count = 4 (within the v0 layout cap)."""
    pre_left = CompositionUnit(
        source_section_ids=["MOCK_S0"],
        merge_type="single",
        frame_template_id="MOCK_TMPL_S0",
        frame_id="MOCK_FRM_S0",
        frame_number=0,
        confidence=0.95,
        label="use_as_is",
        phase_z_status=_LABEL_TO_STATUS["use_as_is"],
        raw_content="s0",
        title="t0",
        v4_rank=1,
        selection_path="rank_1",
    )
    merged = _make_merged_unit(
        merge_type="parent_merged",
        source_section_ids=["MOCK_S1", "MOCK_S2"],
        label="reject",
    )
    pre_right = CompositionUnit(
        source_section_ids=["MOCK_S3"],
        merge_type="single",
        frame_template_id="MOCK_TMPL_S3",
        frame_id="MOCK_FRM_S3",
        frame_number=3,
        confidence=0.95,
        label="use_as_is",
        phase_z_status=_LABEL_TO_STATUS["use_as_is"],
        raw_content="s3",
        title="t3",
        v4_rank=1,
        selection_path="rank_1",
    )
    units_pre: list[CompositionUnit] = [pre_left, merged, pre_right]
    sections = [
        _StubSection("MOCK_S0", raw_content="s0"),
        _StubSection("MOCK_S1", raw_content="s1"),
        _StubSection("MOCK_S2", raw_content="s2"),
        _StubSection("MOCK_S3", raw_content="s3"),
    ]
    lookup = _make_lookup({
        "MOCK_S0": _StubV4Match("MOCK_TMPL_S0", "MOCK_FRM_S0", 0, 0.95,
                                "use_as_is", v4_rank=1),
        "MOCK_S1": _StubV4Match("MOCK_TMPL_S1", "MOCK_FRM_S1", 1, 0.92,
                                "use_as_is", v4_rank=1),
        "MOCK_S2": _StubV4Match("MOCK_TMPL_S2", "MOCK_FRM_S2", 2, 0.81,
                                "light_edit", v4_rank=1),
        "MOCK_S3": _StubV4Match("MOCK_TMPL_S3", "MOCK_FRM_S3", 3, 0.95,
                                "use_as_is", v4_rank=1),
    })

    out_units, audit = resplit_all_reject_merges(
        units_pre, sections, lookup,
        _LABEL_TO_STATUS, _ALLOWED_STATUSES,
        v4_candidates_lookup_fn=_candidates_lookup_empty,
    )

    assert audit["applied"] is True
    # Order preserved: S0, S1 (split), S2 (split), S3.
    assert [u.source_section_ids for u in out_units] == [
        ["MOCK_S0"], ["MOCK_S1"], ["MOCK_S2"], ["MOCK_S3"],
    ]
    # Surrounding singles untouched (identity preserved).
    assert out_units[0] is pre_left
    assert out_units[-1] is pre_right
    # Only the inner two are split-produced.
    assert out_units[1].selection_path == "resplit_from_merge"
    assert out_units[2].selection_path == "resplit_from_merge"
    # Audit post-split count matches projected 4 (within layout cap).
    assert audit["post_split_unit_count"] == 4


# ═══════════════════════════════════════════════════════════════════════
# u9 — Pipeline split-then-reject regression (mixed reject + non-reject
# children). Scope-lock from Stage 2: coverage preserved + remaining
# reject singles remain eligible for IMP-47B (#76) handoff.
#
# Differs from u8 (split-help): u8 covers the "all children non-reject"
# case where every split-produced single reaches the normal auto-render
# route. u9 covers the harder case where one or more child sections
# carry their OWN rank-1 V4 reject (the section is genuinely difficult
# even individually). IMP-48 must still split when ≥1 child is
# non-reject (the beneficial-split threshold), preserving full coverage
# and letting IMP-47B see PER-SECTION reject singles instead of one
# merged blob.
#
# When ALL children carry own-reject V4, the merged unit is preserved
# (no_beneficial_split): IMP-48 is a no-op, so the existing
# IMP-47B-on-merged-blob behavior is not regressed. This is the cleanest
# split-then-reject contract.
#
# All identifiers MOCK_ prefixed (★ RULE_7_no_hardcoding). No real
# catalog template_id / frame_id / MDX sample identifier leaks.
# ═══════════════════════════════════════════════════════════════════════


# ─── Case 18 : Split applied with mixed reject + non-reject children ─────
def test_split_applied_with_mixed_reject_and_non_reject_children():
    """Merged_reject(MOCK_S1, MOCK_S2) where MOCK_S1's OWN rank-1 V4 =
    use_as_is (non-reject) and MOCK_S2's OWN rank-1 V4 = reject.
    Beneficial-split threshold (≥1 non-reject) IS met →
    ``audit["applied"] is True``, out = 2 singles in source order,
    ``non_reject_count == 1``."""
    merged = _make_merged_unit(
        merge_type="parent_merged",
        source_section_ids=["MOCK_S1", "MOCK_S2"],
        label="reject",
    )
    units_pre: list[CompositionUnit] = [merged]
    sections = [
        _StubSection("MOCK_S1", title="t1", raw_content="raw S1"),
        _StubSection("MOCK_S2", title="t2", raw_content="raw S2"),
    ]
    # MOCK_S1 own rank-1 = use_as_is (auto-renderable).
    # MOCK_S2 own rank-1 = reject (section is genuinely hard even alone).
    lookup = _make_lookup({
        "MOCK_S1": _StubV4Match("MOCK_TMPL_S1", "MOCK_FRM_S1", 1, 0.92,
                                "use_as_is", v4_rank=1),
        "MOCK_S2": _StubV4Match("MOCK_TMPL_S2_REJECT", "MOCK_FRM_S2_REJECT",
                                2, 0.45, "reject", v4_rank=1),
    })

    out_units, audit = resplit_all_reject_merges(
        units_pre, sections, lookup,
        _LABEL_TO_STATUS, _ALLOWED_STATUSES,
        v4_candidates_lookup_fn=_candidates_lookup_empty,
    )

    # Beneficial-split threshold met by ≥1 non-reject child.
    assert audit["applied"] is True
    assert "skipped_reason" not in audit, (
        "applied=True path must not carry a skipped_reason value"
    )
    # 2 per-section singles, in source order, merged removed.
    assert len(out_units) == 2
    assert all(u.merge_type == "single" for u in out_units)
    assert [u.source_section_ids for u in out_units] == [["MOCK_S1"], ["MOCK_S2"]]
    # Audit split entry shows mixed count: 1 non-reject, 1 reject.
    assert len(audit["split_units"]) == 1
    assert audit["skipped_units"] == []
    assert audit["split_units"][0]["non_reject_count"] == 1
    assert audit["post_split_unit_count"] == 2
    # Split entry's split_singles audit records each child's resolved label.
    by_sid = {entry["section_id"]: entry
              for entry in audit["split_units"][0]["split_singles"]}
    assert by_sid["MOCK_S1"]["label"] == "use_as_is"
    assert by_sid["MOCK_S2"]["label"] == "reject"


# ─── Case 19 : Reject single routes to IMP-47B handoff via fallback ──────
def test_reject_split_single_carries_fallback_candidate_phase_z_status():
    """The split-produced single for MOCK_S2 (own rank-1 reject) carries
    ``label="reject"`` AND ``phase_z_status="fallback_candidate"``. The
    non-reject sibling MOCK_S1 routes via its OWN ``matched_zone``.

    The IMP-48 win: IMP-47B (#76) sees PER-SECTION reject singles instead
    of one merged blob containing mixed sections. IMP-47B's router reads
    ``phase_z_status="fallback_candidate"`` (mapped from
    ``label="reject"`` via ``V4_LABEL_TO_PHASE_Z_STATUS`` at
    ``src/phase_z2_pipeline.py:97-103``) to decide
    ``ai_adaptation_required`` (see ``_RECONSTRUCTION_BY_HINT`` at
    ``src/phase_z2_pipeline.py:582``).
    The handoff contract is per-unit: each reject single is an
    independent IMP-47B input."""
    merged = _make_merged_unit(
        merge_type="parent_merged",
        source_section_ids=["MOCK_S1", "MOCK_S2"],
        label="reject",
    )
    units_pre: list[CompositionUnit] = [merged]
    sections = [
        _StubSection("MOCK_S1", raw_content="raw S1"),
        _StubSection("MOCK_S2", raw_content="raw S2"),
    ]
    lookup = _make_lookup({
        "MOCK_S1": _StubV4Match("MOCK_TMPL_S1", "MOCK_FRM_S1", 1, 0.92,
                                "use_as_is", v4_rank=1),
        "MOCK_S2": _StubV4Match("MOCK_TMPL_S2_REJECT", "MOCK_FRM_S2_REJECT",
                                2, 0.45, "reject", v4_rank=1),
    })

    out_units, audit = resplit_all_reject_merges(
        units_pre, sections, lookup,
        _LABEL_TO_STATUS, _ALLOWED_STATUSES,
        v4_candidates_lookup_fn=_candidates_lookup_empty,
    )

    assert audit["applied"] is True
    by_sid = {u.source_section_ids[0]: u for u in out_units}
    # Non-reject sibling routes via its OWN label, NOT fallback_candidate.
    assert by_sid["MOCK_S1"].label == "use_as_is"
    assert by_sid["MOCK_S1"].phase_z_status == "matched_zone"
    # Reject single carries label=reject + phase_z_status=fallback_candidate.
    # This is the per-section handoff signal to IMP-47B (#76).
    assert by_sid["MOCK_S2"].label == "reject"
    assert by_sid["MOCK_S2"].phase_z_status == "fallback_candidate"


# ─── Case 20 : All-children-reject merge — no_beneficial_split skip ──────
def test_all_children_reject_merge_keeps_merged_no_beneficial_split():
    """Both MOCK_S1 and MOCK_S2 have OWN rank-1 V4 with
    ``label="reject"``. Beneficial-split threshold (≥1 non-reject) is NOT
    met → IMP-48 must NOT split. Merged unit preserved → IMP-47B (#76)
    sees the merged blob (existing behavior). IMP-48 is a no-op for this
    shape — coverage is trivially preserved because the merged unit is
    kept whole.

    Audit fingerprint:
      * ``audit["applied"] is False``
      * ``audit["skipped_reason"] == "no_split_applied"``
      * ``audit["skipped_units"][0]["reason"] == "no_beneficial_split"``
      * ``audit["post_split_layout_preset"] is None`` (u5 re-derive gate
        short-circuits — see ``phase_z2_pipeline.py:3996``)
    """
    merged = _make_merged_unit(
        merge_type="parent_merged",
        source_section_ids=["MOCK_S1", "MOCK_S2"],
        label="reject",
    )
    units_pre: list[CompositionUnit] = [merged]
    sections = [
        _StubSection("MOCK_S1", raw_content="raw S1"),
        _StubSection("MOCK_S2", raw_content="raw S2"),
    ]
    # Both children carry OWN rank-1 reject — no auto-renderable child.
    lookup = _make_lookup({
        "MOCK_S1": _StubV4Match("MOCK_TMPL_S1_REJECT", "MOCK_FRM_S1_REJECT",
                                1, 0.45, "reject", v4_rank=1),
        "MOCK_S2": _StubV4Match("MOCK_TMPL_S2_REJECT", "MOCK_FRM_S2_REJECT",
                                2, 0.40, "reject", v4_rank=1),
    })

    out_units, audit = resplit_all_reject_merges(
        units_pre, sections, lookup,
        _LABEL_TO_STATUS, _ALLOWED_STATUSES,
        v4_candidates_lookup_fn=_candidates_lookup_empty,
    )

    # No-op for all-reject merge — merged unit kept, IMP-47B sees it.
    assert audit["applied"] is False
    assert audit["skipped_reason"] == "no_split_applied"
    assert audit["split_units"] == []
    assert len(audit["skipped_units"]) == 1
    assert audit["skipped_units"][0]["reason"] == "no_beneficial_split"
    assert audit["skipped_units"][0]["merged_source_section_ids"] == [
        "MOCK_S1", "MOCK_S2",
    ]
    # u5 re-derive gate short-circuits because applied=False.
    assert audit["post_split_layout_preset"] is None
    # Merged unit preserved whole — existing IMP-47B-on-merged-blob behavior.
    assert out_units == [merged]
    assert out_units[0] is merged


# ─── Case 21 : Coverage preserved across mixed children (3-section) ──────
def test_coverage_preserved_when_split_includes_reject_child():
    """★ Stage 1 dropped_zero_invariant — pre =
    [merged_reject(MOCK_S1, MOCK_S2, MOCK_S3)] with 2 non-reject + 1
    reject child. Post = 3 singles (the reject child IS NOT dropped).
    Set of section_ids preserved across pre/post.
    """
    merged = _make_merged_unit(
        merge_type="parent_merged",
        source_section_ids=["MOCK_S1", "MOCK_S2", "MOCK_S3"],
        label="reject",
    )
    units_pre: list[CompositionUnit] = [merged]
    sections = [
        _StubSection("MOCK_S1", raw_content="raw S1"),
        _StubSection("MOCK_S2", raw_content="raw S2"),
        _StubSection("MOCK_S3", raw_content="raw S3"),
    ]
    lookup = _make_lookup({
        "MOCK_S1": _StubV4Match("MOCK_TMPL_S1", "MOCK_FRM_S1", 1, 0.92,
                                "use_as_is", v4_rank=1),
        "MOCK_S2": _StubV4Match("MOCK_TMPL_S2", "MOCK_FRM_S2", 2, 0.81,
                                "light_edit", v4_rank=1),
        "MOCK_S3": _StubV4Match("MOCK_TMPL_S3_REJECT", "MOCK_FRM_S3_REJECT",
                                3, 0.40, "reject", v4_rank=1),
    })
    pre_sids = {sid for u in units_pre for sid in u.source_section_ids}

    out_units, audit = resplit_all_reject_merges(
        units_pre, sections, lookup,
        _LABEL_TO_STATUS, _ALLOWED_STATUSES,
        v4_candidates_lookup_fn=_candidates_lookup_empty,
    )

    assert audit["applied"] is True
    # Reject child IS NOT dropped. All 3 sections present post-split.
    post_sids = {sid for u in out_units for sid in u.source_section_ids}
    assert pre_sids == post_sids == {"MOCK_S1", "MOCK_S2", "MOCK_S3"}
    assert len(out_units) == 3
    # No duplicate / no drop — each section appears in exactly one single.
    assert len([sid for u in out_units for sid in u.source_section_ids]) == 3
    # Audit: 2 non-reject + 1 reject, applied=True.
    assert audit["split_units"][0]["non_reject_count"] == 2
    assert audit["post_split_unit_count"] == 3


# ─── Case 22 : No frame swap on reject single ────────────────────────────
def test_reject_split_single_uses_own_v4_evidence_no_frame_swap():
    """★ feedback_ai_isolation_contract — the reject split-produced
    single's frame_template_id / frame_id / frame_number come from its
    OWN ``v4_lookup_fn(sid)`` result (reject-labelled V4 evidence), NOT
    the merged parent's reject template_id and NOT the non-reject
    sibling's template_id.
    """
    merged = _make_merged_unit(
        merge_type="parent_merged",
        source_section_ids=["MOCK_S1", "MOCK_S2"],
        label="reject",
        template_id="MOCK_TMPL_PARENT_REJECT",
    )
    units_pre: list[CompositionUnit] = [merged]
    sections = [
        _StubSection("MOCK_S1", raw_content="raw S1"),
        _StubSection("MOCK_S2", raw_content="raw S2"),
    ]
    lookup = _make_lookup({
        "MOCK_S1": _StubV4Match("MOCK_TMPL_S1", "MOCK_FRM_S1", 7, 0.92,
                                "use_as_is", v4_rank=1),
        "MOCK_S2": _StubV4Match("MOCK_TMPL_S2_REJECT", "MOCK_FRM_S2_REJECT",
                                13, 0.45, "reject", v4_rank=1),
    })

    out_units, audit = resplit_all_reject_merges(
        units_pre, sections, lookup,
        _LABEL_TO_STATUS, _ALLOWED_STATUSES,
        v4_candidates_lookup_fn=_candidates_lookup_empty,
    )

    assert audit["applied"] is True
    by_sid = {u.source_section_ids[0]: u for u in out_units}
    # Reject single carries its OWN V4 reject evidence, NOT merged parent's
    # template_id and NOT the non-reject sibling's template_id.
    reject_single = by_sid["MOCK_S2"]
    assert reject_single.frame_template_id == "MOCK_TMPL_S2_REJECT"
    assert reject_single.frame_id == "MOCK_FRM_S2_REJECT"
    assert reject_single.frame_number == 13
    # No swap from merged parent.
    assert reject_single.frame_template_id != merged.frame_template_id
    assert reject_single.frame_id != merged.frame_id
    assert reject_single.frame_number != merged.frame_number
    # No swap from non-reject sibling.
    non_reject_single = by_sid["MOCK_S1"]
    assert reject_single.frame_template_id != non_reject_single.frame_template_id
    assert reject_single.frame_id != non_reject_single.frame_id


# ─── Case 23 : selection_path tagging covers reject singles too ──────────
def test_selection_path_tag_applies_to_reject_split_singles_too():
    """Stage 1 Q3 YES — every split-produced single, INCLUDING the one
    with own-reject label, has ``selection_path == "resplit_from_merge"``.
    The telemetry tag is uniform across mixed-children splits."""
    merged = _make_merged_unit(
        merge_type="parent_merged",
        source_section_ids=["MOCK_S1", "MOCK_S2"],
        label="reject",
    )
    units_pre: list[CompositionUnit] = [merged]
    sections = [
        _StubSection("MOCK_S1", raw_content="raw S1"),
        _StubSection("MOCK_S2", raw_content="raw S2"),
    ]
    lookup = _make_lookup({
        "MOCK_S1": _StubV4Match("MOCK_TMPL_S1", "MOCK_FRM_S1", 1, 0.92,
                                "use_as_is", v4_rank=1),
        "MOCK_S2": _StubV4Match("MOCK_TMPL_S2_REJECT", "MOCK_FRM_S2_REJECT",
                                2, 0.45, "reject", v4_rank=1),
    })

    out_units, audit = resplit_all_reject_merges(
        units_pre, sections, lookup,
        _LABEL_TO_STATUS, _ALLOWED_STATUSES,
        v4_candidates_lookup_fn=_candidates_lookup_empty,
    )

    assert audit["applied"] is True
    # Both split-produced singles carry the IMP-48 telemetry tag — uniform.
    by_sid = {u.source_section_ids[0]: u for u in out_units}
    assert by_sid["MOCK_S1"].selection_path == "resplit_from_merge"
    assert by_sid["MOCK_S2"].selection_path == "resplit_from_merge"


# ─── Case 24 : Raw content preservation across reject + non-reject ───────
def test_raw_content_preserved_across_reject_and_non_reject_split_singles():
    """★ MDX_raw_content_invariant — both the reject single and the
    non-reject single carry their OWN section's raw_content (from
    ``sections[sid]``), NOT the merged parent's joined blob. The reject
    single's raw_content is the input IMP-47B (#76) AI restructure reads
    — per-section, not merged blob.
    """
    merged_raw = "MERGED BLOB — joined from children, must NOT leak to singles"
    merged = CompositionUnit(
        source_section_ids=["MOCK_S1", "MOCK_S2"],
        merge_type="parent_merged",
        frame_template_id="MOCK_TMPL_PARENT_REJECT",
        frame_id="MOCK_FRM_PARENT_REJECT",
        frame_number=99,
        confidence=0.10,
        label="reject",
        phase_z_status=_LABEL_TO_STATUS["reject"],
        raw_content=merged_raw,
        title="MOCK_PARENT",
    )
    units_pre: list[CompositionUnit] = [merged]
    sections = [
        _StubSection("MOCK_S1", title="title-1",
                     raw_content="section S1 ORIGINAL text"),
        _StubSection("MOCK_S2", title="title-2",
                     raw_content="section S2 ORIGINAL text"),
    ]
    lookup = _make_lookup({
        "MOCK_S1": _StubV4Match("MOCK_TMPL_S1", "MOCK_FRM_S1", 1, 0.92,
                                "use_as_is", v4_rank=1),
        "MOCK_S2": _StubV4Match("MOCK_TMPL_S2_REJECT", "MOCK_FRM_S2_REJECT",
                                2, 0.45, "reject", v4_rank=1),
    })

    out_units, audit = resplit_all_reject_merges(
        units_pre, sections, lookup,
        _LABEL_TO_STATUS, _ALLOWED_STATUSES,
        v4_candidates_lookup_fn=_candidates_lookup_empty,
    )

    assert audit["applied"] is True
    by_sid = {u.source_section_ids[0]: u for u in out_units}
    # Non-reject single keeps its OWN section's raw_content.
    assert by_sid["MOCK_S1"].raw_content == "section S1 ORIGINAL text"
    assert by_sid["MOCK_S1"].title == "title-1"
    # Reject single ALSO keeps its OWN section's raw_content — per-section
    # input for IMP-47B (#76), NOT merged blob.
    assert by_sid["MOCK_S2"].raw_content == "section S2 ORIGINAL text"
    assert by_sid["MOCK_S2"].title == "title-2"
    # Merged blob MUST NOT appear in any single's raw_content.
    for single in out_units:
        assert merged_raw not in single.raw_content


# ─── Case 25 : Step 6 artifact payload for split-then-reject ─────────────
def test_step6_artifact_payload_shows_per_section_handoff_for_split_then_reject():
    """The Step 6 artifact's ``selected_units`` payload (dict-comp at
    ``phase_z2_pipeline.py:4031-4060``) over the post-split out_units
    contains per-section entries.
    The reject single's payload entry has ``label="reject"`` +
    ``phase_z_status="fallback_candidate"`` (per-section IMP-47B handoff
    signal). The non-reject single's entry has its OWN ``matched_zone`` /
    ``adapt_matched_zone``. The merged parent's identifiers MUST NOT
    appear in the payload.
    """
    merged = _make_merged_unit(
        merge_type="parent_merged",
        source_section_ids=["MOCK_S1", "MOCK_S2"],
        label="reject",
    )
    units_pre: list[CompositionUnit] = [merged]
    sections = [
        _StubSection("MOCK_S1", raw_content="raw S1"),
        _StubSection("MOCK_S2", raw_content="raw S2"),
    ]
    lookup = _make_lookup({
        "MOCK_S1": _StubV4Match("MOCK_TMPL_S1", "MOCK_FRM_S1", 7, 0.88,
                                "use_as_is", v4_rank=1),
        "MOCK_S2": _StubV4Match("MOCK_TMPL_S2_REJECT", "MOCK_FRM_S2_REJECT",
                                13, 0.45, "reject", v4_rank=1),
    })

    out_units, audit = resplit_all_reject_merges(
        units_pre, sections, lookup,
        _LABEL_TO_STATUS, _ALLOWED_STATUSES,
        v4_candidates_lookup_fn=_candidates_lookup_empty,
    )

    assert audit["applied"] is True
    # Step 6 artifact payload mirror.
    payload = _serialize_units_like_step6_artifact(out_units)
    payload_json = json.dumps(payload, sort_keys=True, ensure_ascii=False)

    assert len(payload) == 2
    by_sid = {entry["source_section_ids"][0]: entry for entry in payload}

    # Non-reject single's payload — matched_zone (auto-renderable).
    assert by_sid["MOCK_S1"]["merge_type"] == "single"
    assert by_sid["MOCK_S1"]["frame_template_id"] == "MOCK_TMPL_S1"
    assert by_sid["MOCK_S1"]["label"] == "use_as_is"
    assert by_sid["MOCK_S1"]["phase_z_status"] == "matched_zone"
    assert by_sid["MOCK_S1"]["selection_path"] == "resplit_from_merge"

    # Reject single's payload — fallback_candidate (IMP-47B handoff target).
    assert by_sid["MOCK_S2"]["merge_type"] == "single"
    assert by_sid["MOCK_S2"]["frame_template_id"] == "MOCK_TMPL_S2_REJECT"
    assert by_sid["MOCK_S2"]["label"] == "reject"
    assert by_sid["MOCK_S2"]["phase_z_status"] == "fallback_candidate"
    assert by_sid["MOCK_S2"]["selection_path"] == "resplit_from_merge"

    # Merged parent's identifiers MUST NOT appear in the payload.
    assert merged.frame_template_id not in payload_json
    assert merged.frame_id not in payload_json

    # Audit reflects the mixed-children split.
    assert audit["split_units"][0]["non_reject_count"] == 1
    assert audit["post_split_unit_count"] == 2
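

# ─── Sketch : consolidated coverage check (hypothetical helper) ────────
# Cases 14 and 21 each pair a set-equality check on ``source_section_ids``
# with a flat-length check that rules out duplicates. One way to fold both
# into a single reusable assertion is a multiset (``collections.Counter``)
# comparison. On failure it also names the offending section ids.
# This is a sketch and is not part of the Stage 2 plan. The helper name
# ``_assert_coverage_preserved`` is illustrative, and no test above calls
# it. It assumes only that each unit exposes a ``source_section_ids``
# sequence, as ``CompositionUnit`` does. Example use inside Case 21:
#     _assert_coverage_preserved(units_pre, out_units)
def _assert_coverage_preserved(pre_units, post_units):
    """Fail unless ``post_units`` covers exactly the section ids of
    ``pre_units``, with each id appearing once across ``post_units``."""
    from collections import Counter  # local import, same style as Case 15

    pre = Counter(sid for u in pre_units for sid in u.source_section_ids)
    post = Counter(sid for u in post_units for sid in u.source_section_ids)
    dropped = sorted(set(pre) - set(post))  # covered pre-hook, lost post-hook
    added = sorted(set(post) - set(pre))  # appears post-hook from nowhere
    duplicated = sorted(sid for sid, n in post.items() if n > 1)
    assert not (dropped or added or duplicated), (
        f"coverage drift: dropped={dropped} added={added} "
        f"duplicated={duplicated}"
    )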