docs(#54): F-4 legacy annotation + F-5 fixture convention -- AUDIT-01 housekeeping

INTEGRATION-AUDIT-01 (#50) §10.4 / §10.5 housekeeping carry-over.

F-4: annotate the 14 remaining legacy Phase R'/Q sample-text hits across 10 src/ files with the inline marker `# [legacy Phase R'/Q example -- INTEGRATION-AUDIT-01 §10.4]`. Annotation-only: every marker is a `#` comment except the one in block_assembler_b2.py, which is appended inside the module docstring. No regex, sample dict value, or other runtime string literal is mutated. The fit_verifier.py L612 marker keeps the Phase Z partial-live import graph (FitAnalysis / RoleFit / redistribute / salvage) byte-precise.

F-5: docs-only addendum, consisting of §10.5.1 in INTEGRATION-AUDIT-01-REPORT.md plus a tests/CLAUDE.md fixture convention note. No root tests/fixtures/ dir is created; the existing tests/phase_z2/fixtures/ convention is preserved. Documents the test-only sample-reference allowance vs the src/** runtime prohibition.

Out of scope: the 11 Phase Z source hits (phase_z2_content_extractor / failure_router / mapper / retry), production behavior change, #19 work.

Verified: pytest -q tests/phase_z2/ = 157 PASS. git diff +210/-0 (35 src/docs lines + 175 new tests/CLAUDE.md). No behavioral delta.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 20:23:36 +09:00
parent 8f06a4c99f
commit 02e2ae0afb
12 changed files with 210 additions and 0 deletions
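
A minimal sketch of one way to spot-check the F-4 "comment-only" claim, assuming Python 3.8 or later: parse the before and after source of a file and compare the AST dumps, which ignore `#` comments and layout. The helper name `same_code_ignoring_comments` and the sample strings are hypothetical and not part of this commit. A marker placed inside a docstring, rather than in a `#` comment, changes a string constant and would be reported as a difference.

```python
import ast


def same_code_ignoring_comments(before: str, after: str) -> bool:
    """Return True when two Python sources parse to the same AST.

    ast.parse drops `#` comments, and ast.dump omits line/column
    attributes by default, so comment-only edits compare equal.
    """
    return ast.dump(ast.parse(before)) == ast.dump(ast.parse(after))


# Hypothetical before/after pair: a comment added between adjacent
# string literals inside a parenthesized prompt definition.
BEFORE = 'PROMPT = (\n    "rule one\\n"\n    "rule two\\n"\n)\n'
AFTER = (
    'PROMPT = (\n'
    '    "rule one\\n"\n'
    '    # [legacy example marker]\n'
    '    "rule two\\n"\n'
    ')\n'
)

if __name__ == "__main__":
    print(same_code_ignoring_comments(BEFORE, AFTER))
```

In practice the two strings would come from `git show <rev>:<path>` for the parent and the commit; that wiring is left out here.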
--- a/docs/architecture/INTEGRATION-AUDIT-01-REPORT.md
+++ b/docs/architecture/INTEGRATION-AUDIT-01-REPORT.md
@@ -521,6 +521,17 @@ Five candidates were produced by Axes 1-4. F-3 + F-2 + F-1 are blocking conditio
  - REPORT §8.2 fourth bullet (`tests/fixtures/` not yet established).
 - **priority / gating** : **optional, very low priority**. Filing is only justified when sample inventory grows; the current state is already aligned with the spirit of the rule.

+#### 10.5.1 F-5 docs-only resolution addendum (#54 Stage 3 u5, 2026-05-19)
+
+Per issue #54 Stage 2 plan, F-5 is closed as **docs-only**; no root `tests/fixtures/` directory is created in this work. The current fixture inventory does not justify migration, and the existing convention is sufficient. The convention is recorded here so future anti-hardcoding audits can distinguish fixture / test-only paths from production paths without re-discovering the §8 G6 PASS-WITH-NOTE baseline.
+
+- **Existing convention (DO NOT CHANGE)** : `tests/phase_z2/fixtures/` exists as a YAML regression fixture root (loaded by `tests/phase_z2/test_fixtures_loader.py`). Subdirectories present at audit time : `tests/phase_z2/fixtures/build_layout_css/`, `tests/phase_z2/fixtures/retry_gate/`. This is the canonical home for Phase Z regression fixtures.
+- **Root `tests/fixtures/` (ABSENT)** : not created in #54. If a future change requires a non-Phase-Z, non-YAML fixture corpus (for example, multi-file MDX golden inputs that grow beyond what `tests/phase_z2/test_*.py` can hold inline), the migration must be filed as its own Gitea issue with its own scope-lock per §10.5.
+- **Allowed sample references** : `samples/mdx_batch/**` and `samples/mdx/**` may be referenced from `tests/**` (test-only paths) for integration smoke -- e.g. the existing `samples/mdx_batch/02.mdx` references in `tests/phase_z2/test_pz2_vu_integration.py`. These do not violate the §8 anti-hardcoding rule because the spirit of the rule targets production pipeline code, not test runners.
+- **Forbidden sample references** : production pipeline code (`src/**` runtime path) must NOT hardcode sample-specific MDX filenames or content (e.g. `02.mdx`, `03.mdx`, frame-specific labels keyed to a sample). The 20 legacy Phase R'/Q hits annotated under F-4 (#54 Stage 3 u1-u4) are intentional documented examples in docstrings / comments / glossary regex / sample-data dicts, not runtime input pins; they are out of scope for this rule by §10.4 verdict.
+- **AI-isolation contract** : this addendum is text-only. No production behavior change, no runtime sample-path mutation, no new fixture file. Compatible with PZ-1 (AI = 0 on normal path) and [[feedback_ai_isolation_contract]].
+- **Cross-reference** : `tests/CLAUDE.md` fixture convention note (#54 Stage 3 u5) mirrors the test-only / production rule split documented here.
+
 ### 10.6 Follow-up summary

 | candidate | source axis | doc-only? | gates #19? | priority |
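
The allowed / forbidden split in §10.5.1 above could be checked with a small scanner. The following is only a sketch under stated assumptions: the regex `SAMPLE_PIN`, the helper names `find_sample_pins` and `scan_tree`, and the choice to scan `*.py` files are all hypothetical, and the pattern covers only path-like or filename-like pins (for example `02.mdx`), not the glossary-style literals annotated under F-4.

```python
from __future__ import annotations

import re
from pathlib import Path
from typing import Iterator

# Assumed patterns: a samples/mdx or samples/mdx_batch path, or a
# two-digit MDX filename such as 02.mdx. Adjust to the real inventory.
SAMPLE_PIN = re.compile(r"samples/mdx(?:_batch)?/|\b\d{2}\.mdx\b")


def find_sample_pins(text: str) -> list[int]:
    """Return 1-based line numbers whose text matches SAMPLE_PIN."""
    return [
        lineno
        for lineno, line in enumerate(text.splitlines(), start=1)
        if SAMPLE_PIN.search(line)
    ]


def scan_tree(root: Path) -> Iterator[tuple[Path, int]]:
    """Yield (file, line number) for every match in *.py files under root."""
    for path in sorted(root.rglob("*.py")):
        for lineno in find_sample_pins(path.read_text(encoding="utf-8")):
            yield path, lineno


if __name__ == "__main__":
    # Only src/ is scanned; tests/** may reference samples by convention.
    for path, lineno in scan_tree(Path("src")):
        print(f"{path}:{lineno}: possible sample pin")
```

Scanning only `src/` mirrors the convention: references under `tests/**` are allowed, so they are never reported.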
--- a/src/block_assembler_b2.py
+++ b/src/block_assembler_b2.py
@@ -8,6 +8,7 @@
 - 블록 CSS의 글씨 크기를 font_hierarchy에 맞게 조정 (프로세스 내 조정)
 - 콘텐츠는 PipelineContext에서 가져옴 (하드코딩 아님)
 - 블록은 콘텐츠에 맞게 재구성 (items 수 동적)
+  [legacy Phase R'/Q example — INTEGRATION-AUDIT-01 §10.4]
 """
 from __future__ import annotations

--- a/src/block_matcher_tfidf.py
+++ b/src/block_matcher_tfidf.py
@@ -107,6 +107,7 @@ class TfidfBlockMatcher:
        text = text.replace("S/W", "SW 소프트웨어")
        text = text.replace("H/W", "HW 하드웨어")
        text = re.sub(r'\bDX\b', 'DX 디지털전환', text)
+        # [legacy Phase R'/Q example — INTEGRATION-AUDIT-01 §10.4]
        text = re.sub(r'\bBIM\b', 'BIM 건설정보모델링', text)
        text = text.replace("(", " ").replace(")", " ")
        text = text.replace("[", " ").replace("]", " ")
--- a/src/block_reference.py
+++ b/src/block_reference.py
@@ -399,6 +399,7 @@ _SAMPLE_DATA: dict[str, dict[str, Any]] = {
        "center_label": "DX",
        "center_sub": "디지털 전환",
        "items": [
+            # [legacy Phase R'/Q example — INTEGRATION-AUDIT-01 §10.4]
            {"label": "BIM", "color": "#ff6b35"},
            {"label": "GIS", "color": "#00d4aa"},
            {"label": "DT", "color": "#ffd700"},
@@ -406,6 +407,7 @@ _SAMPLE_DATA: dict[str, dict[str, Any]] = {
    },
    "keyword-circle-row": {
        "keywords": [
+            # [legacy Phase R'/Q example — INTEGRATION-AUDIT-01 §10.4]
            {"letter": "B", "label": "BIM", "description": "건물정보모델링"},
            {"letter": "G", "label": "GIS", "description": "지리정보시스템"},
            {"letter": "D", "label": "DX", "description": "디지털 전환"},
@@ -432,6 +434,7 @@ _SAMPLE_DATA: dict[str, dict[str, Any]] = {
        "right_title": "개선",
        "rows": [
            {"left": "수작업", "center": "프로세스", "right": "자동화"},
+            # [legacy Phase R'/Q example — INTEGRATION-AUDIT-01 §10.4]
            {"left": "2D 도면", "center": "설계 도구", "right": "3D BIM"},
        ],
    },
--- a/src/content_editor.py
+++ b/src/content_editor.py
@@ -22,6 +22,9 @@ from src.sse_utils import stream_sse_tokens

 logger = logging.getLogger(__name__)

+# [legacy Phase R'/Q examples — INTEGRATION-AUDIT-01 §10.4]
+# (sample-text literals at L43-L44 / L67 inside the EDITOR_PROMPT string below
+#  — "건설산업 디지털화", "BIM 전면 도입", "DX와 BIM 개념" preserved verbatim)
 EDITOR_PROMPT = """당신은 도메인 전문가이자 콘텐츠 편집자이다.
 원본 콘텐츠의 핵심 내용을 유지하면서 각 블록의 슬롯에 맞게 텍스트를 정리한다.

--- a/src/design_director.py
+++ b/src/design_director.py
@@ -29,6 +29,7 @@ BLOCK_SLOTS = {
        "slot_desc": {
            "title_ko": "한글 메인 타이틀",
            "title_en": "영문 서브 타이틀 (없으면 생략)",
+            # [legacy Phase R'/Q example — INTEGRATION-AUDIT-01 §10.4]
            "breadcrumb": "상위 카테고리 경로 (예: 디지털전환 > BIM)",
            "bg_image": "배경 이미지 경로",
        },
@@ -965,6 +966,7 @@ def _validate_height_budget(blocks: list[dict], preset: dict) -> list[dict]:
    for block in blocks_to_remove:
        blocks.remove(block)

+    # [legacy Phase R'/Q example — INTEGRATION-AUDIT-01 §10.4]
    # 삭제 후 zone_blocks 재구성 (후속 pill-pair/높이 체크에 반영)
    zone_blocks.clear()
    for block in blocks:
--- a/src/design_tokens.py
+++ b/src/design_tokens.py
@@ -84,6 +84,9 @@ border-radius: 8px, padding: 14px 30px, text-align: center

 def get_layout_rules() -> str:
    """Phase S 검증 결과 기반 레이아웃 규칙."""
+    # [legacy Phase R'/Q example — INTEGRATION-AUDIT-01 §10.4]
+    # (sample-text literal "DX와 BIM의 상세 비교" at ~L109 inside the return
+    #  string below is preserved verbatim as a documented intentional example)
    return """
 ## 레이아웃 규칙 (검증 결과 기반 — 반드시 따를 것)

--- a/src/fit_verifier.py
+++ b/src/fit_verifier.py
@@ -609,6 +609,7 @@ class SupplementBlock:
    role: str
    block_id: str
    variant: str
+    # [legacy Phase R'/Q example — INTEGRATION-AUDIT-01 §10.4]
    content_source: str      # "popup:DX와 BIM의 구분" 등
    estimated_height_px: float
    available_px: float
--- a/src/frame_extractor.py
+++ b/src/frame_extractor.py
@@ -164,6 +164,7 @@ def _preprocess_text(text: str) -> str:
    text = text.replace("S/W", "SW 소프트웨어")
    text = text.replace("H/W", "HW 하드웨어")
    text = re.sub(r'\bDX\b', 'DX 디지털전환', text)
+    # [legacy Phase R'/Q example — INTEGRATION-AUDIT-01 §10.4]
    text = re.sub(r'\bBIM\b', 'BIM 건설정보모델링', text)

    # 괄호 내용 유지하되 괄호 제거
--- a/src/kei_client.py
+++ b/src/kei_client.py
@@ -53,6 +53,7 @@ KEI_PROMPT = (
    "  문장을 재작성하지 마라. 원본 문장을 그대로 가져와라.\n"
    "- **결론 텍스트도 원본 그대로.** 임의로 만들지 마라.\n"
    "- 원본에 있는 내용을 임의로 제거하거나 다른 의미로 바꾸지 마라.\n"
+    # [legacy Phase R'/Q example — INTEGRATION-AUDIT-01 §10.4]
    "- 텍스트 재구성이 허용되는 경우는 **빈 공간에 채울 요약(표, 팝업 요약)만**.\n"
    "- 각 꼭지의 source_hint에 원본의 어떤 부분이 가는지 명시.\n\n"
    "## 배치 규칙\n"
@@ -162,6 +163,7 @@ KEI_PROMPT_B = (
    "   - 원본에 이미지가 참조되면 반드시 [이미지: 제목] 마커를 포함하라.\n"
    "   - 출처가 있으면 포함하라.\n"
    "   - '활용 필요', '구체화 필요' 같은 지시사항을 쓰지 마라. 실제 콘텐츠 항목만 쓰라.\n"
+    # [legacy Phase R'/Q examples — INTEGRATION-AUDIT-01 §10.4]
    "   - 예시: '건설산업(종합산업, 기술 통합 융합), BIM(정보관리 도구, 출처: 국토교통부 2020)'\n"
    "   - 예시: '[이미지: DX와 핵심기술간 상호관계] 다이어그램, GIS 역할(공간 분석). [팝업: DX와 BIM의 구분] 비교표'\n\n"
    "## 출력 형식 (JSON만)\n"
@@ -789,6 +791,10 @@ async def call_kei_final_review(
 # I-9: Kei 넘침 판단 호출
 # ──────────────────────────────────────

+# [legacy Phase R'/Q example — INTEGRATION-AUDIT-01 §10.4]
+# (sample-text literal "Option 2 (핵심 재구성 + 팝업 분리)" inside the
+#  KEI_OVERFLOW_PROMPT triple-quoted string below is preserved verbatim
+#  as a documented intentional example of overflow-judgment output)
 KEI_OVERFLOW_PROMPT = """당신은 슬라이드 콘텐츠 전문가이다.
 디자인 팀장이 배치한 블록들이 컨테이너(zone)의 높이 예산을 초과했다.
 콘텐츠의 중요도와 전달 메시지를 기준으로 어떻게 처리할지 판단하라.
--- a/src/pipeline.py
+++ b/src/pipeline.py
@@ -1182,6 +1182,7 @@ async def generate_slide(
        yield {"event": "progress", "data": "3/7 슬라이드 HTML 생성 중..."}

        async def stage_2(context: PipelineContext) -> dict:
+            # [legacy Phase R'/Q example — INTEGRATION-AUDIT-01 §10.4]
            # Phase X-BX': Type B는 code_assembled 직접 사용, Sonnet 재구성 스킵
            if context.analysis.layout_template in ("B", "B'", "B''"):
                from src.block_assembler import assemble_slide_html_final
@@ -1190,6 +1191,7 @@ async def generate_slide(
                logger.info(f"[Stage 2] Type B: slide-base + 블록 (font_scale={fs:.1f})")
                return {"generated_html": generated}

+            # [legacy Phase R'/Q example — INTEGRATION-AUDIT-01 §10.4]
            # Type A: 기존 Sonnet 재구성 코드 그대로
            from src.content_verifier import generate_with_retry

@@ -1998,6 +2000,7 @@ async def _apply_adjustments(
                    block["detail_target"] = True
                    if "data" in block:
                        del block["data"]
+                    # [legacy Phase R'/Q example — INTEGRATION-AUDIT-01 §10.4]
                    block["reason"] = f"재구성: {detail}"
                    logger.info(
                        f"조정: {area} → kei_restructure (detail_target)"
--- a/tests/CLAUDE.md
+++ b/tests/CLAUDE.md
@@ -0,0 +1,175 @@
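+# CLAUDE.md: working context for the matching system
+
+This file is the context that Claude (AI) consults when working in the `tests/` directory.
+Use it together with [../CLAUDE.md](../CLAUDE.md) at the project root.
+
+## Working directory
+
+- Main working directory: `tests/matching/`
+- Data and report files live in the same location
+- Always run from `tests/matching/` (the scripts depend on relative paths internally)
+
+## System overview
+
+A 4-stage pipeline (V1 to V4) that matches MDX content against 32 Figma frames.
+See `README.md` / `PLAN.md` / `PROGRESS.md` for details.
+
+## Absolute rules
+
+### 1. No hardcoding (emphasized by the user)
+- Do not patch the output directly; **fix the process/code**
+- Do not insert arbitrary data (e.g. no placeholders such as "title / A label / B label / row data" in DECK 04)
+- Every displayed value must come from actual code results (yaml / function output)
+
+### 2. Preserve the user's direct edits
+- When the user has edited an HTML file by hand, **always port the edit into the pipeline code** and then regenerate
+- Fixing only the code and rerunning wipes out the user's edits
+- When changing: port all 8 user edits into the code, then rerun
+
+### 3. Show code behavior honestly
+- Even in a deck for executive reporting, **state the code's limitations plainly**
+- e.g. a note such as "MDX auto-analysis result: policy/requirements (a human would read it as a matrix comparison)"
+- Ablation findings such as "this axis actually has no effect on frame matching" are left out of the executive version but stated explicitly in internal documents (PROGRESS.md)
+
+### 4. Executive-report tone
+- Do not expose English enum codes (`policy_requirements`) directly; prefer the Korean label (정책/요구사항, "policy/requirements")
+- Abbreviate matching keywords to 5~10 items plus "등 N 개" ("and N more")
+- Design: no gradients or flashy cards. Plain tables, black and white, plus 1~2 accent colors
+- Keep definitions and side explanations minimal (one line per definition; do not elaborate)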
+
+## Naming conventions
+
+### Pipeline scripts
+```
+pipeline_<number>_<name>.py
+```
+- 01~07: input extraction + preprocessing + keywords
+- 08: V2 (semantic) / V3 (structure r2~r5) / V4 (template_fit, r1/r2)
+- 09: V2 diagnostics
+- 10: holdout labeling / evaluation
+- 11: templates_v1 audit
+- 12: templates_v2 generation (r1, r2, r3, final, final_r2, promote_frame13)
+- 13: meeting docs / samples
+- 14: single sample
+- 15: bm25 / idf / logistic regression comparison
+- 16: deck page generation (DECK 1~7)
+- 17: V4 full32 (evaluation over all 32 frames)
+- 18: V4 slot-axis ablation
+
+### Result files
+```
+<stage>_<description>_result.yaml
+```
+- `mdx_matching_result.yaml`: V1
+- `v2_semantic_rerank_result.yaml`: V2
+- `v3_structure_rerank_r5_result.yaml`: V3 (final r5)
+- `v4_full32_result.yaml`: V4 (all 32 frames)
+- `structure_ontology_v2_final_r2.yaml`: Frame 32 DB
+
+### Reports
+```
+DECK_<number>_<name>.html : A4 page for executive reporting
+ATTACH_<number>_<name>.html : supplementary material
+<NAME>_REPORT.html / .md : analysis report
+```
+
+## Frequently used commands
+
+```bash
+cd tests/matching/
+
+# Rerun the entire matching system
+python pipeline_06_2_mdx_matching.py
+python pipeline_08_v2_semantic_rerank.py
+python pipeline_08_v3_r5_structure_rerank.py
+python pipeline_17_v4_full32.py
+
+# Regenerate the reports
+python pipeline_16_deck_4pages.py    # DECK 1~7
+
+# Ablation / verification
+python pipeline_15_logistic_regression.py
+python pipeline_18_slot_axis_ablation.py
+```
+
+## Core pipeline weights
+
+### V1 keyword matching (learned via logistic regression)
+```
+matching_score = 0.414 × core + 0.320 × set + 0.265 × related
+```
+
+### V3 structure matching
+```
+total = 0.40 × layout match + 0.35 × content character + 0.25 × visual intent
+```
+
+### V4 overall verdict
+```
+confidence = 0.25 × anchor + 0.20 × cardinality + 0.20 × relation
+           + 0.15 × slot + 0.20 × content − penalty
+
+Label thresholds:
+  ≥ 0.90 → use_as_is (use as-is)
+  ≥ 0.75 → light_edit (light editing)
+  ≥ 0.60 → restructure (structural rearrangement)
+  < 0.60 → reject (unusable)
+```
+
+## Data sources
+
+| Data | Location | Purpose |
+|---|---|---|
+| Figma text | `figma_to_html_agent/blocks/*/texts.md` | Text extraction for the 32 frames |
+| BEPS master | (separate location) | Keyword enrichment |
+| MDX validation sections | (`MDX_SECTIONS` in `pipeline_01_extract_nodes.py`) | Ground-truth match validation |
+| Frame images | `data/figma_previews/<frame_number>.png` | DECK visualization |
+
+## Test fixture convention (F-5, INTEGRATION-AUDIT-01 §10.5.1)
+
+The canonical location rules for test data and sample references. They apply only inside `tests/` and do not apply to the `src/**` production path.
+
+| Path | Status | Purpose | Notes |
+|---|---|---|---|
+| `tests/phase_z2/fixtures/` | **exists (canonical)** | Phase Z regression YAML fixtures | Loaded by `test_fixtures_loader.py`. Subdirectories : `build_layout_css/`, `retry_gate/`. |
+| `tests/fixtures/` (root) | **absent (not created yet)** | Future candidate for non-Phase-Z / non-YAML fixtures | Create via a separate issue only when the sample inventory outgrows what `tests/phase_z2/test_*.py` can hold inline. |
+| `samples/mdx_batch/**` , `samples/mdx/**` | exists | Integration smoke inputs | May be referenced only from `tests/**`. Hardcoding in the `src/**` runtime path is forbidden. |
+
+Rules :
+
+- Test code may reference sample MDX such as `samples/mdx_batch/02.mdx` directly (e.g. `tests/phase_z2/test_pz2_vu_integration.py`). `src/**` runtime inputs must never pin sample filenames or content.
+- Add new YAML regression fixtures as a new subdirectory under `tests/phase_z2/fixtures/`. Creating a root `tests/fixtures/` is forbidden (it requires a separate issue).
+- Sample-like literals appearing in `src/**`, such as "BIM" / "건설산업 DX" / "재구성", were classified in INTEGRATION-AUDIT-01 §10.4 (F-4) as intentional docstring / glossary / example-dict content. A literal that carries the annotation marker is an intentional example. Do not introduce new sample literals into `src/**`.
+- The anchor definition of this convention is §10.5.1 of `docs/architecture/INTEGRATION-AUDIT-01-REPORT.md`. When changing the convention, update the anchor first.
+
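+## Common points of confusion
+
+### English enum vs Korean mapping
+- Code / yaml: English enums (`comparative_matrix`, `cycle_interrelation`)
+- Report display: Korean (`행렬형 비교`, `순환/상호 관계`)
+- The keyword dictionary table in DECK 05 shows both (so the user can match them up)
+
+### Item count vs number of slot candidates
+- **Identical**: `item_count = len(slot_candidates)` (whatever the form: table / subsections / bullets)
+- The V4 cardinality axis and the slot.within component **weight the same signal twice** (confirmed by ablation)
+
+### V3 vs V4 structure scores
+- V3 = layout family + content_affinity + structure_intent (3 axes)
+- V4 = anchor + cardinality + relation + slot + content (5 axes)
+- **Different models**. The V3 score and the V4 confidence are computed separately
+
+## Feedback the user emphasized
+
+- "These are outputs produced by running the code, not arbitrary data": display everything honestly
+- "This is for executive reporting": drop negative asides and detailed formulas
+- "Do just one thing": change only one thing at a time
+- "Every change goes into the pipeline code": direct HTML edits are temporary
+
+## Weaknesses found so far
+
+See the "발견된 약점" ("Weaknesses found") table in `PROGRESS.md`. All 8 are targets of the Phase E work.
+
+Most urgent:
+1. **02-2.2 matching failure** (E.5)
+2. **Moving MDX analysis to an LLM** (E.1, E.2)
+3. **Slot semantic mapping** (E.3, E.4)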
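
The V4 confidence formula and label thresholds listed in the tests/CLAUDE.md weights section can be sketched as follows. The weights, thresholds, and label names are taken from that section; the function names, the dictionary layout, and the assumption that each axis score lies in [0, 1] with `penalty` on the same scale are hypothetical, since the source does not state them.

```python
from __future__ import annotations

# Axis weights and thresholds as listed in the V4 section of tests/CLAUDE.md.
V4_WEIGHTS = {
    "anchor": 0.25,
    "cardinality": 0.20,
    "relation": 0.20,
    "slot": 0.15,
    "content": 0.20,
}
V4_THRESHOLDS = [(0.90, "use_as_is"), (0.75, "light_edit"), (0.60, "restructure")]


def v4_confidence(axis_scores: dict[str, float], penalty: float = 0.0) -> float:
    """Weighted sum of the five axis scores minus the penalty.

    Assumes every axis score is in [0, 1]; the source does not say so.
    """
    return sum(w * axis_scores[axis] for axis, w in V4_WEIGHTS.items()) - penalty


def v4_label(confidence: float) -> str:
    """Map a confidence value to its label, checking thresholds high to low."""
    for threshold, label in V4_THRESHOLDS:
        if confidence >= threshold:
            return label
    return "reject"
```

Keeping the thresholds in a descending list makes the boundary behavior explicit: a value exactly on a threshold receives the higher label, matching the `≥` comparisons in the listing.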