fix(IMP-08): Stage 5 R2 — aligner force-drill on sub-id override targets

Codex #1 (Stage 5) reproduced a smoke regression on the actual checkout :
when V4 carries the parent exact key (e.g., `04-2`) AND the drag/drop
override targets a sub-id (`primary=04-2-sub-1`), the aligner kept the
section at parent granularity and emitted `['04-1', '04-2']`, so the
override flag failed with `unknown section_id(s) ['04-2-sub-1']`.

Fix : `align_sections_to_v4_granularity` gains an optional
`override_target_section_ids` keyword. From each canonical
`${parent}-sub-N` target it derives the parent id and adds it to a
`force_drill_parents` set. Sections in that set are drilled into
sub-sections regardless of whether V4 carries the parent exact key.
Top-level override targets (no derived parent) do not trigger
force-drill, so backward-compat is preserved for parent-granularity
overrides.
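
A minimal sketch of the parent derivation described above, for illustration
only. The real `derive_parent_id` lives in `phase_z2_composition` and is not
shown in this commit, so `derive_parent_id_sketch` and
`collect_force_drill_parents` are hypothetical stand-ins that assume the
canonical `X-sub-N` shape :

```python
import re
from typing import Iterable, Optional

# Assumption: canonical sub-ids look like "<parent>-sub-<ordinal>".
_SUB_ID = re.compile(r"^(?P<parent>.+)-sub-\d+$")


def derive_parent_id_sketch(section_id: str) -> Optional[str]:
    """Hypothetical stand-in: return the parent id, or None for top-level ids."""
    match = _SUB_ID.match(section_id)
    return match.group("parent") if match else None


def collect_force_drill_parents(
    override_targets: Optional[Iterable[str]],
) -> set[str]:
    """Collect the parents whose sub-ids are targeted by an override."""
    parents: set[str] = set()
    for sid in override_targets or ():
        parent = derive_parent_id_sketch(sid)
        # Top-level targets have no derived parent and are skipped.
        if parent and parent != sid:
            parents.add(parent)
    return parents
```

Under that assumption, a `04-2-sub-1` target marks `04-2` for force-drill,
while a top-level `03-1` target marks nothing.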

The call site in `run_phase_z2_mvp1` collects sub-ids from
`override_section_assignments` and forwards them to the aligner.

Generalization (RULE 0) :
- Trigger is the override schema (`X-sub-N`), not a specific MDX / section /
  frame id. Applies to all 32-frame MDX uniformly.
- Decision is deterministic on the override target shape, independent of
  V4 yaml content.
- Default (no override) path is unchanged byte-for-byte.
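
The determinism claim can be sketched as a single predicate. This is an
illustration, not the shipped code : `keeps_parent_granularity` is a
hypothetical helper that mirrors only the first branch of the aligner loop
(whether a section with H3 sub-sections is actually drilled is decided
later), and the two key sets are taken from the smoke cases in this message :

```python
def keeps_parent_granularity(
    section_id: str,
    v4_keys: set[str],
    force_drill_parents: set[str],
) -> bool:
    """True when the section stays at parent granularity (no drill attempt)."""
    return section_id in v4_keys and section_id not in force_drill_parents


# Two V4 yaml shapes from the smoke run: HEAD (no parent key) and the
# stress case that also carries the parent exact key "04-2".
head_v4_keys = {"04-1", "04-2.1", "04-2.2"}
stress_v4_keys = head_v4_keys | {"04-2"}

# With a sub-id override on "04-2", both shapes reach the same decision.
forced = {"04-2"}
same_decision = (
    keeps_parent_granularity("04-2", head_v4_keys, forced)
    == keeps_parent_granularity("04-2", stress_v4_keys, forced)
)
```

Without an override (`force_drill_parents` empty) the predicate reduces to
the previous `section_id in v4_keys` check, which is why the default path
is unchanged.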

Side fixes (forward-only RULE 1 cleanup, no history rewrite) :
- `align_sections_to_v4_granularity` docstring rewritten in English
  (overwrites the Korean docstring committed in 5191aca).
- Step 9 diagnostic comment quoted-string rewritten in English
  (overwrites `"V4 entry 없음"` committed in a422d72).

Tests : 3 new cases in `test_phase_z2_subsection_schema.py` —
`test_align_parent_v4_exact_keeps_section_when_no_override_targets_sub`
(backward-compat axis), `test_align_force_drills_when_override_targets_sub_id_with_parent_in_v4`
(blocker regression), `test_align_top_level_override_target_does_not_force_drill_other_sections`
(force-drill scope guard). Pytest scope-qualified result :
`test_phase_z2_subsection_schema.py` + `_section_assignment_override.py` +
`_v4_fallback.py` = 40 / 40 PASS.

Smoke (axis = sub-id override -> aligner -> assignment plan, both V4 yaml
shapes) :
- HEAD V4 yaml (`04-1`, `04-2.1`, `04-2.2` only) :
  `--override-section-assignment primary=04-2-sub-1` ->
  `aligned_section_ids=['04-1', '04-2-sub-1', '04-2-sub-2']`,
  `plan[0].assignment_source='cli_override'`,
  `plan[0].source_section_ids=['04-2-sub-1']`.
- V4 yaml with `04-2` exact key (Codex's stress case) : identical
  aligned output and identical assignment plan.

Downstream `composition_planner` abort
(`phase_z_status_not_allowed:extract_matched_zone`) is IMP-05 territory,
unchanged in both shapes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 02:28:46 +09:00
parent ab2764c8d0
commit 8f6cffc2a7
2 changed files with 132 additions and 26 deletions

@@ -41,6 +41,7 @@ from jinja2 import Environment, FileSystemLoader, select_autoescape
 from phase_z2_composition import (
     LAYOUT_PRESETS,
     CompositionUnit,
+    derive_parent_id,
     plan_composition,
     select_display_strategy_candidates,
     select_layout_candidates,
@@ -372,47 +373,80 @@ def load_v4_result() -> dict:
     return yaml.safe_load(V4_RESULT_PATH.read_text(encoding="utf-8"))
 
 
-def align_sections_to_v4_granularity(sections: list[MdxSection], v4: dict) -> list[MdxSection]:
-    """V4 section granularity 에 맞춰 sections 조정.
-
-    IMP-08 B-3 : canonical sub-section id ``${section_id}-sub-${ordinal}``
-    (예 : ``04-2-sub-1``) 를 emit 하고, legacy V4 키 (``04-2.1``) 는
-    ``v4_alias_keys`` 로 보존하여 ``_resolve_v4_section_key`` 가 alias 경로로
-    매칭한다. canonical ordinal id 는 frontend drag/drop override 와 동일
-    schema (`section_id-sub-N`).
-
-    N-R5 alias guard : heading_number 가 decimal (``2.1``) 일 때만 alias
-    emit. integer-only (``1``) / non-numeric heading 은 alias 0 — sibling
-    parent V4 evidence 로 잘못 promote 되는 collision 방지 (RULE 0).
-
-    각 section 에 대해 :
-    - V4 에 section.section_id 키 있음 → 그대로 유지 (## level 매칭)
-    - V4 에 키 없고 raw_content 에 ### sub-section 존재 → ### 로 drill
-    - V4 에 키 없고 ### 도 없음 → 원본 그대로 (V4 lookup 단계에서 자연스럽게 abort)
-
-    설계 원칙 :
-    - parser (parse_mdx) = MDX 만 앎 (V4 무관)
-    - aligner (이 함수) = V4 키 기준 granularity 결정
-    - runtime parser 가 matching artifact 의 granularity 를 *따라가는* 구조
+def align_sections_to_v4_granularity(
+    sections: list[MdxSection],
+    v4: dict,
+    *,
+    override_target_section_ids: Optional[list[str]] = None,
+) -> list[MdxSection]:
+    """Align MDX sections to canonical sub-section granularity.
+
+    Default behaviour (V4-driven granularity, backward compatible) :
+    - V4 has section_id exact key -> keep section unchanged (parent
+      granularity rendering, parent-level V4 evidence applies).
+    - V4 missing + H3 sub-sections -> drill into sub-sections, emit
+      canonical ids ``${section_id}-sub-${ordinal}`` with optional
+      decimal alias for legacy V4 keys (e.g. ``04-2.1``).
+    - V4 missing + no H3 -> pass through (downstream V4 lookup
+      will naturally abort with no_v4_section).
+
+    IMP-08 B-3 / Stage 5 R2 blocker-fix — ``override_target_section_ids``
+    is the list of section ids that drag/drop override CLI flags target.
+    When any override target matches ``${section_id}-sub-N`` for a section
+    whose parent is otherwise V4-aligned, that section is force-drilled so
+    sub-section ids become addressable. This keeps the default rendering
+    path on V4 granularity while making drag/drop deterministic regardless
+    of whether V4 carries a parent exact key.
+
+    Each drilled sub-section carries :
+    - heading_number : decimal "2.1" / integer "1" / None (bare H3 title).
+    - v4_alias_keys : legacy V4 keys to try when the canonical ordinal
+      id misses. Populated only when ``heading_number`` matches the
+      decimal pattern ``\\d+\\.\\d+`` (N-R5 guard) — integer-only or
+      bare H3 produces no alias to avoid sibling-parent V4 collisions.
+
+    Design boundary :
+    - parser (``parse_mdx``) = MDX-only knowledge (V4-agnostic).
+    - aligner (this function) = canonical sub-id schema, MDX-driven on
+      force_drill, V4-driven otherwise.
+    - resolver (``_resolve_v4_section_key``) = exact > alias > None,
+      never auto-promotes to parent/sibling (axis 7 hybrid lock).
     """
     v4_keys = set(v4.get("mdx_sections", {}).keys())
+
+    # Build the set of parent ids whose sub-ids are explicitly targeted by
+    # an override. These sections must be drilled even if V4 also carries
+    # the parent key exactly. Parents derived from canonical "X-sub-N" ids
+    # only — non-sub ids (top-level overrides) do not trigger drilling.
+    force_drill_parents: set[str] = set()
+    if override_target_section_ids:
+        for sid in override_target_section_ids:
+            parent = derive_parent_id(sid)
+            if parent and sid != parent:
+                force_drill_parents.add(parent)
+
     aligned: list[MdxSection] = []
-    # IMP-08 B-3 : capture optional heading-number prefix (decimal "2.1" or
-    # integer "1") + heading title. None group = bare "### Title".
+    # Capture optional heading-number prefix (decimal "2.1" or integer "1")
+    # plus the heading title. None group = bare "### Title".
     sub_pattern = re.compile(
         r"^###\s+(?:(\d+(?:\.\d+)?)\s+)?(.+?)$", re.MULTILINE
     )
     decimal_re = re.compile(r"\d+\.\d+")
 
     for section in sections:
-        if section.section_id in v4_keys:
+        force_drill = section.section_id in force_drill_parents
+        if section.section_id in v4_keys and not force_drill:
+            # V4 carries this section exactly and no override targets a
+            # sub-id under it: keep parent granularity (backward compat).
             aligned.append(section)
             continue
 
         sub_matches = list(sub_pattern.finditer(section.raw_content))
         if not sub_matches:
-            aligned.append(section)  # drill 불가, V4 lookup 에서 abort 됨
+            # No H3 sub-sections: cannot drill. Pass section through;
+            # downstream V4 lookup aborts with no_v4_section when needed.
+            aligned.append(section)
             continue
 
         mdx_id = section.section_id.split("-")[0]  # e.g., "04"
@@ -2076,8 +2110,21 @@ def run_phase_z2_mvp1(
     # 2. Load V4
     v4 = load_v4_result()
 
-    # 3. Align sections to V4 granularity (### drill if needed)
-    sections = align_sections_to_v4_granularity(sections, v4)
+    # 3. Align sections to V4 granularity (### drill if needed).
+    # IMP-08 B-3 / Stage 5 R2 : forward override target ids so sub-id
+    # drag/drop targets force-drill their parent section even when V4
+    # carries the parent exact key (deterministic drag/drop addressing).
+    _override_target_sids: list[str] = []
+    if override_section_assignments:
+        for _sids in override_section_assignments.values():
+            for _sid in _sids:
+                if isinstance(_sid, str) and _sid:
+                    _override_target_sids.append(_sid)
+    sections = align_sections_to_v4_granularity(
+        sections,
+        v4,
+        override_target_section_ids=_override_target_sids or None,
+    )
     print(f" aligned : sections={len(sections)} ({[s.section_id for s in sections]})")
 
     # ─── Step 5: V4 매칭 evidence (non-reject max-6 후보 list — 사용자 lock 2026-05-08) ───
@@ -2871,7 +2918,7 @@ def run_phase_z2_mvp1(
     # reporting only. Runtime selection goes through _resolve_v4_section_key
     # (4 sites). Direct dict lookup here is intentional — debug_zones carries
     # dict-shape entries without v4_alias_keys plumbing, and a miss here only
-    # yields a "V4 entry 없음" report line (runtime impact zero).
+    # yields a "V4 entry missing" report line (runtime impact zero).
     try:
         with open(V4_RESULT_PATH, encoding="utf-8") as _vf:
             _v4_full = yaml.safe_load(_vf)

@@ -111,7 +111,9 @@ def test_mdx_section_default_construction_preserves_4_positional_callers():
 def test_align_passthrough_when_v4_key_exact_match():
-    # Section already aligned to V4 key — aligner keeps it untouched.
+    # Section already aligned to V4 key (no override target): aligner
+    # keeps it untouched. Parent-level V4 evidence flows via exact-match
+    # lookup.
     sections = [_section("04-1", 1, "1. Top", "body")]
     v4 = {"mdx_sections": {"04-1": {"judgments_full32": []}}}
     out = align_sections_to_v4_granularity(sections, v4)
@@ -119,6 +121,63 @@ def test_align_passthrough_when_v4_key_exact_match():
     assert out[0].section_id == "04-1"
 
 
+def test_align_parent_v4_exact_keeps_section_when_no_override_targets_sub():
+    # Backward-compat axis: when V4 carries the parent exact key and no
+    # drag/drop override targets a sub-id of this section, the aligner
+    # MUST keep the parent (preserves V4 evidence at parent granularity).
+    raw = "### 2.1 First\nbody1\n### 2.2 Second\nbody2\n"
+    sections = [_section("03-2", 2, "2. Parent", raw)]
+    v4 = {"mdx_sections": {"03-2": {"judgments_full32": []}}}
+    out = align_sections_to_v4_granularity(sections, v4)
+    assert [s.section_id for s in out] == ["03-2"]
+
+
+def test_align_force_drills_when_override_targets_sub_id_with_parent_in_v4():
+    # Stage 5 R2 blocker-fix regression: when V4 has the parent exact key
+    # AND an override targets a sub-id of that section, the aligner MUST
+    # drill regardless of V4 parent presence. This makes drag/drop
+    # addressing deterministic across all V4 yaml shapes.
+    raw = "### 2.1 First\nbody1\n### 2.2 Second\nbody2\n"
+    sections = [_section("04-2", 2, "2. Parent", raw)]
+    v4 = {
+        "mdx_sections": {
+            "04-2": {"judgments_full32": []},  # parent V4 entry present
+            "04-2.1": {"judgments_full32": []},  # plus decimal sub entries
+            "04-2.2": {"judgments_full32": []},
+        }
+    }
+    out = align_sections_to_v4_granularity(
+        sections, v4, override_target_section_ids=["04-2-sub-1"]
+    )
+    # Force-drill: parent id MUST be replaced by canonical sub-ids.
+    assert [s.section_id for s in out] == ["04-2-sub-1", "04-2-sub-2"]
+    # Decimal aliases preserved (N-R5: decimal heading_number).
+    assert out[0].v4_alias_keys == ["04-2.1"]
+    assert out[1].v4_alias_keys == ["04-2.2"]
+
+
+def test_align_top_level_override_target_does_not_force_drill_other_sections():
+    # Top-level override target ("primary=03-1") has no derive_parent_id,
+    # so it MUST NOT force-drill any section. Only "X-sub-N" targets
+    # trigger force-drill on parent X.
+    raw = "### 2.1 First\nbody1\n"
+    sections = [
+        _section("03-1", 1, "1. Top", "body"),
+        _section("03-2", 2, "2. Parent", raw),
+    ]
+    v4 = {
+        "mdx_sections": {
+            "03-1": {"judgments_full32": []},
+            "03-2": {"judgments_full32": []},
+        }
+    }
+    out = align_sections_to_v4_granularity(
+        sections, v4, override_target_section_ids=["03-1"]
+    )
+    # No sub-id target -> both sections kept at parent granularity.
+    assert [s.section_id for s in out] == ["03-1", "03-2"]
+
+
 def test_align_drill_emits_canonical_ordinal_id_with_decimal_alias():
     # Decimal H3 headings -> canonical ordinal id + decimal alias (legacy V4 key).
     raw = "### 2.1 First\nbody1\n### 2.2 Second\nbody2\n"