C.E.L_Slide_test2/src/region_marker_stamper.py
kyeongmin 5484077a53
Some checks failed
Multi-MDX Regression (IMP-91) / multi-mdx-regression (push) Failing after 21s
feat(#94): IMP-94 u1~u6 Layer A region/content marker injection (stamper + render_slide chain + 4 zones_data.append placement_markers + 35 parity tests)
u1 (src/region_marker_stamper.py): deterministic root-div stamper injecting data-region-id + data-content-unit-id onto each family-partial root div anchored by data-template-id. Idempotent (re-stamp = no-op), AI=0, additive only, empty/None markers no-op, F9/F29 frame-slot axis preserved.

u2 (src/phase_z2_pipeline.py render_slide chain): _stamp_region_markers chained after IMP-56 u9 _stamp_zone_html. Marker source = zone.get("placement_markers") or [] — Codex #16 P4b crash risk closed via the or-[] call-site fallback.
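
The call-site fallback described in u2 can be sketched as below. This is a minimal illustration rather than the pipeline code: `stamp_stub`, `markers_for`, and the zone dicts are hypothetical stand-ins, and only the `zone.get("placement_markers") or []` expression is taken from the commit message.

```python
from typing import Any, Iterable, List, Mapping


def stamp_stub(zone_html: str, markers: Iterable[Mapping[str, Any]]) -> str:
    """Hypothetical stand-in for a stamper that iterates its markers eagerly."""
    marker_list = list(markers)  # list(None) would raise TypeError here
    if not marker_list:
        return zone_html
    return f"<!-- {marker_list[0]['region_id']} -->{zone_html}"


def markers_for(zone: Mapping[str, Any]) -> List[Mapping[str, Any]]:
    # `or []` covers a missing key, an explicit None, and an empty list alike.
    return zone.get("placement_markers") or []


zones = [
    {"html": "<div>a</div>"},                              # key absent
    {"html": "<div>b</div>", "placement_markers": None},   # explicit None
    {"html": "<div>c</div>", "placement_markers": [{"region_id": "r1"}]},
]
rendered = [stamp_stub(z["html"], markers_for(z)) for z in zones]
```

Normalising at the call site means every zone shape reaches the stamper as a list, so the stamper never has to be the only guard against `None`.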

u3 (_derive_placement_markers helper): projects PlacementPlan.slot_assignments[] → list[dict] carrying region_id + content_unit_id + frame_slot_id (frame_slot_id reserved for #96 89-d). Live B4 path emits at primary zones_data.append.
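
The projection in u3 might look roughly like the sketch below. The real `PlacementPlan` type is not shown on this page, so the plan is assumed here to be a plain mapping with a `slot_assignments` list of dicts. The function name `derive_placement_markers` and the sample ids are illustrative only.

```python
from typing import Any, Dict, List, Mapping, Optional


def derive_placement_markers(
    plan: Optional[Mapping[str, Any]],
) -> List[Dict[str, Any]]:
    """Project each slot assignment onto the three marker keys (assumed shape)."""
    if not plan:
        return []
    markers: List[Dict[str, Any]] = []
    for slot in plan.get("slot_assignments") or []:
        markers.append(
            {
                "region_id": slot.get("region_id"),
                "content_unit_id": slot.get("content_unit_id"),
                # Carried along but not stamped; reserved for #96 per the commit.
                "frame_slot_id": slot.get("frame_slot_id"),
            }
        )
    return markers


plan = {
    "slot_assignments": [
        {
            "region_id": "region-1",
            "content_unit_id": "cu-1",
            "frame_slot_id": "slot-a",
            "score": 0.9,  # extra keys are dropped by the projection
        }
    ]
}
markers = derive_placement_markers(plan)
```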

u4 (3 non-live zones_data.append defaults): placement_markers: [] at IMP-30 u4 empty-shell, IMP-86 u1 adapter_needed, post-loop unrenderable plan-record paths — uniform zone shape, stamper no-op surface.

u5/u6 (tests/test_phase_z2_imp94_marker_parity.py): 33 hard tests + 2 cross-axis skip-if-anchor-absent (Emergency P4/P4b future axis). Coverage: 13 family-partial root anchors, F29 + F9 frame-slot preservation, idempotence, live render_slide stamping, P4b empty-marker no-crash, MDX 01 strip-attr parity, trace-to-DOM parity.
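
A trace-to-DOM parity check of the kind listed above could read the stamped attributes back with the standard library's `html.parser`. The sketch below is not taken from the test file: `RootMarkerReader`, `read_root_markers`, `markers_match`, and the sample class, frame, and family values are all assumptions made for illustration.

```python
from html.parser import HTMLParser
from typing import Dict, List, Mapping, Optional, Tuple


class RootMarkerReader(HTMLParser):
    """Records marker attributes from the first tag carrying data-template-id."""

    def __init__(self) -> None:
        super().__init__()
        self.found: Optional[Dict[str, Optional[str]]] = None

    def handle_starttag(
        self, tag: str, attrs: List[Tuple[str, Optional[str]]]
    ) -> None:
        if self.found is not None:
            return  # only the first anchored tag counts
        attr_map = dict(attrs)
        if "data-template-id" in attr_map:
            self.found = {
                "region_id": attr_map.get("data-region-id"),
                "content_unit_id": attr_map.get("data-content-unit-id"),
            }


def read_root_markers(zone_html: str) -> Optional[Dict[str, Optional[str]]]:
    reader = RootMarkerReader()
    reader.feed(zone_html)
    return reader.found


def markers_match(zone_html: str, trace_entry: Mapping[str, str]) -> bool:
    """True when the DOM markers equal the trace entry's ids."""
    found = read_root_markers(zone_html)
    return (
        found is not None
        and found["region_id"] == trace_entry.get("region_id")
        and found["content_unit_id"] == trace_entry.get("content_unit_id")
    )


stamped = (
    '<div data-region-id="region-1" data-content-unit-id="cu-1" '
    'class="f9b" data-frame-id="F9" data-template-id="family-x">'
    "<p>body</p></div>"
)
trace_entry = {"region_id": "region-1", "content_unit_id": "cu-1"}
```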

Disjoint from #96 (data-frame-slot-id) by attribute name. SPEC anchor: docs/architecture/PHASE-Z-CONTENT-OBJECT-SUBZONE-SPEC.md §6.4 + §7.2 (Layer A read targets + render-path activation).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 08:15:08 +09:00

138 lines
5.4 KiB
Python

"""IMP-94 (#94) u1 — region/content marker stamper for Phase Z final.html.

Annotates each rendered family-partial root ``<div>`` with stable
``data-region-id="..."`` and ``data-content-unit-id="..."`` attributes so
downstream Layer A telemetry (placement_trace ↔ DOM parity, Step 21
self-report, fit_classifier read targets §6.4) can resolve a rendered zone
back to its PlacementPlan ``slot_assignments[]`` entry.

DOM contract (single point of truth, mirrored verbatim across the axis) ::

    <div class="..." data-region-id="{region_id}" data-content-unit-id="{cuid}" ...
         data-frame-id="..." data-template-id="...">

The anchor is the uniform root-div emitted by every Phase Z family
partial under ``templates/phase_z2/families/`` (13 partials, evidence
confirmed via ``grep -l data-template-id`` = 13/13). All 13 partials
carry the pattern::

    <div class="<fNb>" data-frame-id="..." data-template-id="<family>">

The stamper finds the FIRST such opening tag with a permissive regex
and injects ``data-region-id`` + ``data-content-unit-id`` as new
attributes. Existing attributes (class, data-frame-id, data-template-id,
etc.) are preserved verbatim. The injection is idempotent: a zone that
already carries ``data-region-id`` on its root div is left alone.

Source of marker values : ``PlacementPlan.slot_assignments[].region_id``
and ``.content_unit_id`` (see ``src/phase_z2_placement_planner.py``
L253-258). u3 wires the live B4 path; u4 ensures non-live append paths
default to ``placement_markers=[]`` so this stamper safely no-ops.

Forward-compat / safety :
  - Empty / None ``markers`` → passthrough (returns ``zone_html`` unchanged).
  - Non-str / empty ``zone_html`` → passthrough.
  - Re-stamping (idempotent) preserves the first stamp.
  - Only the FIRST data-template-id root div is stamped (one per zone).
  - Markers with empty / missing ``region_id`` AND ``content_unit_id`` →
    passthrough (no attribute injection).

Guardrails (refs : Stage 1 binding contract, Stage 2 unit u1) :
  - AI-isolation : pure deterministic Python; no LLM calls.
  - Additive only : never edits / removes existing attributes.
  - Idempotent : ``data-region-id`` probe short-circuits before re-inject.
  - Disjoint from #96 (``data-frame-slot-id`` is a separate axis / attr).
"""
from __future__ import annotations

import re
from typing import Any, Iterable, Mapping

REGION_ID_ATTR: str = "data-region-id"
CONTENT_UNIT_ID_ATTR: str = "data-content-unit-id"

# Matches the FIRST ``<div ... data-template-id="...">`` opening tag.
# Group 1 captures the inner attribute string verbatim (incl. leading
# whitespace) so the rewriter can re-emit it unchanged after injection.
_ROOT_DIV_TAG_RE = re.compile(
    r'<div\b((?=[^>]*\bdata-template-id\s*=\s*"[^"]+")[^>]*?)>',
    flags=re.IGNORECASE | re.DOTALL,
)

# Probe for an existing ``data-region-id`` attribute (any value, any
# quote) so re-stamping is idempotent.
_HAS_REGION_ID_RE = re.compile(r"""\bdata-region-id\s*=""", flags=re.IGNORECASE)


def _coerce_marker_value(value: Any) -> str:
    """Return a safe attribute-value string for ``value``.

    Non-str / None → ''. Strings are returned verbatim (caller responsible
    for not embedding ``"`` since marker ids derive from
    PlacementPlan.slot_assignments which are deterministic identifiers).
    """
    if value is None:
        return ""
    if not isinstance(value, str):
        return ""
    return value


def stamp_zone_html(
    zone_html: str,
    markers: Iterable[Mapping[str, Any]] | None,
) -> str:
    """Stamp the root family-partial ``<div>`` with region / content-unit ids.

    ``markers`` is an iterable of mapping objects shaped as ::

        {
            "region_id": "<region_id>",
            "content_unit_id": "<content_unit_id>",
            # optional, ignored here; reserved for #96 (89-d):
            "frame_slot_id": "<frame_slot_id>",
        }

    Only ``markers[0]`` is consumed (one root div per zone). Excess
    markers are reserved for a future per-slot stamper (#96) and are
    silently ignored by this module.

    Returns ``zone_html`` unchanged when:
      - ``zone_html`` is not a non-empty string,
      - ``markers`` is None / empty,
      - no ``data-template-id`` root div is found,
      - the root div already carries ``data-region-id`` (idempotent),
      - the first marker carries neither ``region_id`` nor ``content_unit_id``.
    """
    if not isinstance(zone_html, str) or not zone_html:
        return zone_html
    if markers is None:
        return zone_html
    marker_list = list(markers)
    if not marker_list:
        return zone_html
    first = marker_list[0]
    if not isinstance(first, Mapping):
        return zone_html
    region_id = _coerce_marker_value(first.get("region_id"))
    content_unit_id = _coerce_marker_value(first.get("content_unit_id"))
    if not region_id and not content_unit_id:
        return zone_html

    # Mutable flag shared with the closure so only the first match is handled.
    stamped = {"done": False}

    def _replace(match: re.Match[str]) -> str:
        if stamped["done"]:
            return match.group(0)
        attrs = match.group(1) or ""
        if _HAS_REGION_ID_RE.search(attrs):
            # Already stamped: leave the tag untouched (idempotent).
            stamped["done"] = True
            return match.group(0)
        stamped["done"] = True
        injected = (
            f' {REGION_ID_ATTR}="{region_id}"'
            f' {CONTENT_UNIT_ID_ATTR}="{content_unit_id}"'
        )
        # New attributes go first; the original attribute string follows verbatim.
        return f"<div{injected}{attrs}>"

    return _ROOT_DIV_TAG_RE.sub(_replace, zone_html, count=1)
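
`_coerce_marker_value` returns strings verbatim and leaves quoting to the caller. If marker ids could ever contain `"`, `<`, or `&`, one possible hardening is to escape them with the standard library's `html.escape`. The sketch below is an assumption-laden illustration, not part of this module: the helper name `coerce_marker_value_escaped` is hypothetical.

```python
import html
from typing import Any


def coerce_marker_value_escaped(value: Any) -> str:
    """Like the module's coercion, but HTML-escapes the string for attribute use."""
    if not isinstance(value, str):
        return ""  # None and non-str values collapse to the empty string
    # quote=True also escapes double and single quotes, not just & < >.
    return html.escape(value, quote=True)


safe = coerce_marker_value_escaped('r"1')
tag = f'<div data-region-id="{safe}">'
```

Plain identifiers such as `region-1` pass through unchanged, so this variant would only alter output for ids that contain markup-significant characters.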