docs(#54): F-4 legacy annotation + F-5 fixture convention -- AUDIT-01 housekeeping

INTEGRATION-AUDIT-01 (#50) §10.4 / §10.5 housekeeping carry-over.

F-4: annotate 14 remaining legacy Phase R'/Q sample-text hits across 10 src/ files with inline marker `# [legacy Phase R'/Q example -- INTEGRATION-AUDIT-01 §10.4]`. Comment-only. No string-literal / regex / sample dict value mutated. fit_verifier.py L612 marker keeps Phase Z partial-live import graph (FitAnalysis / RoleFit / redistribute / salvage) byte-precise.

F-5: docs-only addendum -- §10.5.1 in INTEGRATION-AUDIT-01-REPORT.md + tests/CLAUDE.md fixture convention note. No root tests/fixtures/ dir created; existing tests/phase_z2/fixtures/ convention preserved. Documents test-only sample-reference allowance vs src/** runtime prohibition.

Out of scope: Phase Z source 11 hits (phase_z2_content_extractor / failure_router / mapper / retry), production behavior change, #19 work.

Verified: pytest -q tests/phase_z2/ = 157 PASS. git diff +210/-0 (35 src/docs lines + 175 new tests/CLAUDE.md). No behavioral delta.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 20:23:36 +09:00
parent 8f06a4c99f
commit 02e2ae0afb
12 changed files with 210 additions and 0 deletions
--- a/docs/architecture/INTEGRATION-AUDIT-01-REPORT.md
+++ b/docs/architecture/INTEGRATION-AUDIT-01-REPORT.md
@@ -521,6 +521,17 @@ Five candidates were produced by Axes 1-4. F-3 + F-2 + F-1 are blocking conditio
  - REPORT §8.2 fourth bullet (`tests/fixtures/` not yet established).
 - **priority / gating** : **optional, very low priority**. Filing is only justified when sample inventory grows; the current state is already aligned with the spirit of the rule.
 #### 10.5.1 F-5 docs-only resolution addendum (#54 Stage 3 u5, 2026-05-19)
 Per issue #54 Stage 2 plan, F-5 is closed as **docs-only**; no root `tests/fixtures/` directory is created in this work. The current fixture inventory does not justify migration, and the existing convention is sufficient. The convention is recorded here so future anti-hardcoding audits can distinguish fixture / test-only paths from production paths without re-discovering the §8 G6 PASS-WITH-NOTE baseline.
 - **Existing convention (DO NOT CHANGE)** : `tests/phase_z2/fixtures/` exists as a YAML regression fixture root (loaded by `tests/phase_z2/test_fixtures_loader.py`). Subdirectories present at audit time : `tests/phase_z2/fixtures/build_layout_css/`, `tests/phase_z2/fixtures/retry_gate/`. This is the canonical home for Phase Z regression fixtures.
 - **Root `tests/fixtures/` (ABSENT)** : not created in #54. If a future change requires a non-Phase-Z, non-YAML fixture corpus (for example, multi-file MDX golden inputs that grow beyond what `tests/phase_z2/test_*.py` can hold inline), the migration must be filed as its own Gitea issue with its own scope-lock per §10.5.
 - **Allowed sample references** : `samples/mdx_batch/**` and `samples/mdx/**` may be referenced from `tests/**` (test-only paths) for integration smoke -- e.g. the existing `samples/mdx_batch/02.mdx` references in `tests/phase_z2/test_pz2_vu_integration.py`. These do not violate the §8 anti-hardcoding rule because the spirit of the rule targets production pipeline code, not test runners.
 - **Forbidden sample references** : production pipeline code (`src/**` runtime path) must NOT hardcode sample-specific MDX filenames or content (e.g. `02.mdx`, `03.mdx`, frame-specific labels keyed to a sample). The 20 legacy Phase R'/Q hits annotated under F-4 (#54 Stage 3 u1-u4) are intentional documented examples in docstrings / comments / glossary regex / sample-data dicts, not runtime input pins; they are out of scope for this rule by §10.4 verdict.
 - **AI-isolation contract** : this addendum is text-only. No production behavior change, no runtime sample-path mutation, no new fixture file. Compatible with PZ-1 (AI = 0 on normal path) and [[feedback_ai_isolation_contract]].
 - **Cross-reference** : `tests/CLAUDE.md` fixture convention note (#54 Stage 3 u5) mirrors the test-only / production rule split documented here.
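A future audit could count the F-4 annotation markers mechanically instead of re-reading each file. The following is a minimal sketch under stated assumptions, not part of this commit: `MARKER` and `count_markers` are hypothetical names, and the pattern assumes the marker is spelled either with `--` (as in the commit message) or with an em dash (as in the annotated files), with `example` or `examples`.

```python
import re

# Assumption: the separator is either "--" or an em dash (U+2014), and the
# marker may say "example" or "examples". Adjust if the real spelling differs.
MARKER = re.compile(
    r"\[legacy Phase R'/Q examples? (?:--|\u2014) INTEGRATION-AUDIT-01 §10\.4\]"
)


def count_markers(source: str) -> int:
    """Count F-4 annotation markers in the text of one source file."""
    return len(MARKER.findall(source))


if __name__ == "__main__":
    sample = (
        "    # [legacy Phase R'/Q example -- INTEGRATION-AUDIT-01 §10.4]\n"
        "    text = re.sub(r'\\bBIM\\b', '...', text)\n"
    )
    print(count_markers(sample))
```

Summing `count_markers` over the annotated files gives a marker total that can be compared against the hit count recorded for F-4.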
 ### 10.6 Follow-up summary
 | candidate | source axis | doc-only? | gates #19? | priority |
--- a/src/block_assembler_b2.py
+++ b/src/block_assembler_b2.py
@@ -8,6 +8,7 @@
 - 블록 CSS의 글씨 크기를 font_hierarchy에 맞게 조정 (프로세스 내 조정)
 - 콘텐츠는 PipelineContext에서 가져옴 (하드코딩 아님)
 - 블록은 콘텐츠에 맞게 재구성 (items 수 동적)
  [legacy Phase R'/Q example — INTEGRATION-AUDIT-01 §10.4]
 """
 from __future__ import annotations
--- a/src/block_matcher_tfidf.py
+++ b/src/block_matcher_tfidf.py
@@ -107,6 +107,7 @@ class TfidfBlockMatcher:
        text = text.replace("S/W", "SW 소프트웨어")
        text = text.replace("H/W", "HW 하드웨어")
        text = re.sub(r'\bDX\b', 'DX 디지털전환', text)
        # [legacy Phase R'/Q example — INTEGRATION-AUDIT-01 §10.4]
        text = re.sub(r'\bBIM\b', 'BIM 건설정보모델링', text)
        text = text.replace("(", " ").replace(")", " ")
        text = text.replace("[", " ").replace("]", " ")
--- a/src/block_reference.py
+++ b/src/block_reference.py
@@ -399,6 +399,7 @@ _SAMPLE_DATA: dict[str, dict[str, Any]] = {
        "center_label": "DX",
        "center_sub": "디지털 전환",
        "items": [
            # [legacy Phase R'/Q example — INTEGRATION-AUDIT-01 §10.4]
            {"label": "BIM", "color": "#ff6b35"},
            {"label": "GIS", "color": "#00d4aa"},
            {"label": "DT", "color": "#ffd700"},
@@ -406,6 +407,7 @@ _SAMPLE_DATA: dict[str, dict[str, Any]] = {
    },
    "keyword-circle-row": {
        "keywords": [
            # [legacy Phase R'/Q example — INTEGRATION-AUDIT-01 §10.4]
            {"letter": "B", "label": "BIM", "description": "건물정보모델링"},
            {"letter": "G", "label": "GIS", "description": "지리정보시스템"},
            {"letter": "D", "label": "DX", "description": "디지털 전환"},
@@ -432,6 +434,7 @@ _SAMPLE_DATA: dict[str, dict[str, Any]] = {
        "right_title": "개선",
        "rows": [
            {"left": "수작업", "center": "프로세스", "right": "자동화"},
            # [legacy Phase R'/Q example — INTEGRATION-AUDIT-01 §10.4]
            {"left": "2D 도면", "center": "설계 도구", "right": "3D BIM"},
        ],
    },
--- a/src/content_editor.py
+++ b/src/content_editor.py
@@ -22,6 +22,9 @@ from src.sse_utils import stream_sse_tokens
 logger = logging.getLogger(__name__)
 # [legacy Phase R'/Q examples — INTEGRATION-AUDIT-01 §10.4]
 # (sample-text literals at L43-L44 / L67 inside the EDITOR_PROMPT string below
 #  — "건설산업 디지털화", "BIM 전면 도입", "DX와 BIM 개념" preserved verbatim)
 EDITOR_PROMPT = """당신은 도메인 전문가이자 콘텐츠 편집자이다.
 원본 콘텐츠의 핵심 내용을 유지하면서 각 블록의 슬롯에 맞게 텍스트를 정리한다.
--- a/src/design_director.py
+++ b/src/design_director.py
@@ -29,6 +29,7 @@ BLOCK_SLOTS = {
        "slot_desc": {
            "title_ko": "한글 메인 타이틀",
            "title_en": "영문 서브 타이틀 (없으면 생략)",
            # [legacy Phase R'/Q example — INTEGRATION-AUDIT-01 §10.4]
            "breadcrumb": "상위 카테고리 경로 (예: 디지털전환 > BIM)",
            "bg_image": "배경 이미지 경로",
        },
@@ -965,6 +966,7 @@ def _validate_height_budget(blocks: list[dict], preset: dict) -> list[dict]:
    for block in blocks_to_remove:
        blocks.remove(block)
    # [legacy Phase R'/Q example — INTEGRATION-AUDIT-01 §10.4]
    # 삭제 후 zone_blocks 재구성 (후속 pill-pair/높이 체크에 반영)
    zone_blocks.clear()
    for block in blocks:
--- a/src/design_tokens.py
+++ b/src/design_tokens.py
@@ -84,6 +84,9 @@ border-radius: 8px, padding: 14px 30px, text-align: center
 def get_layout_rules() -> str:
    """Phase S 검증 결과 기반 레이아웃 규칙."""
    # [legacy Phase R'/Q example — INTEGRATION-AUDIT-01 §10.4]
    # (sample-text literal "DX와 BIM의 상세 비교" at ~L109 inside the return
    #  string below is preserved verbatim as a documented intentional example)
    return """
 ## 레이아웃 규칙 (검증 결과 기반 — 반드시 따를 것)
--- a/src/fit_verifier.py
+++ b/src/fit_verifier.py
@@ -609,6 +609,7 @@ class SupplementBlock:
    role: str
    block_id: str
    variant: str
    # [legacy Phase R'/Q example — INTEGRATION-AUDIT-01 §10.4]
    content_source: str      # "popup:DX와 BIM의 구분" 등
    estimated_height_px: float
    available_px: float
--- a/src/frame_extractor.py
+++ b/src/frame_extractor.py
@@ -164,6 +164,7 @@ def _preprocess_text(text: str) -> str:
    text = text.replace("S/W", "SW 소프트웨어")
    text = text.replace("H/W", "HW 하드웨어")
    text = re.sub(r'\bDX\b', 'DX 디지털전환', text)
    # [legacy Phase R'/Q example — INTEGRATION-AUDIT-01 §10.4]
    text = re.sub(r'\bBIM\b', 'BIM 건설정보모델링', text)
    # 괄호 내용 유지하되 괄호 제거
--- a/src/kei_client.py
+++ b/src/kei_client.py
@@ -53,6 +53,7 @@ KEI_PROMPT = (
    "  문장을 재작성하지 마라. 원본 문장을 그대로 가져와라.\n"
    "- **결론 텍스트도 원본 그대로.** 임의로 만들지 마라.\n"
    "- 원본에 있는 내용을 임의로 제거하거나 다른 의미로 바꾸지 마라.\n"
    # [legacy Phase R'/Q example — INTEGRATION-AUDIT-01 §10.4]
    "- 텍스트 재구성이 허용되는 경우는 **빈 공간에 채울 요약(표, 팝업 요약)만**.\n"
    "- 각 꼭지의 source_hint에 원본의 어떤 부분이 가는지 명시.\n\n"
    "## 배치 규칙\n"
@@ -162,6 +163,7 @@ KEI_PROMPT_B = (
    "   - 원본에 이미지가 참조되면 반드시 [이미지: 제목] 마커를 포함하라.\n"
    "   - 출처가 있으면 포함하라.\n"
    "   - '활용 필요', '구체화 필요' 같은 지시사항을 쓰지 마라. 실제 콘텐츠 항목만 쓰라.\n"
    # [legacy Phase R'/Q examples — INTEGRATION-AUDIT-01 §10.4]
    "   - 예시: '건설산업(종합산업, 기술 통합 융합), BIM(정보관리 도구, 출처: 국토교통부 2020)'\n"
    "   - 예시: '[이미지: DX와 핵심기술간 상호관계] 다이어그램, GIS 역할(공간 분석). [팝업: DX와 BIM의 구분] 비교표'\n\n"
    "## 출력 형식 (JSON만)\n"
@@ -789,6 +791,10 @@ async def call_kei_final_review(
 # I-9: Kei 넘침 판단 호출
 # ──────────────────────────────────────
 # [legacy Phase R'/Q example — INTEGRATION-AUDIT-01 §10.4]
 # (sample-text literal "Option 2 (핵심 재구성 + 팝업 분리)" inside the
 #  KEI_OVERFLOW_PROMPT triple-quoted string below is preserved verbatim
 #  as a documented intentional example of overflow-judgment output)
 KEI_OVERFLOW_PROMPT = """당신은 슬라이드 콘텐츠 전문가이다.
 디자인 팀장이 배치한 블록들이 컨테이너(zone)의 높이 예산을 초과했다.
 콘텐츠의 중요도와 전달 메시지를 기준으로 어떻게 처리할지 판단하라.
--- a/src/pipeline.py
+++ b/src/pipeline.py
@@ -1182,6 +1182,7 @@ async def generate_slide(
        yield {"event": "progress", "data": "3/7 슬라이드 HTML 생성 중..."}
        async def stage_2(context: PipelineContext) -> dict:
            # [legacy Phase R'/Q example — INTEGRATION-AUDIT-01 §10.4]
            # Phase X-BX': Type B는 code_assembled 직접 사용, Sonnet 재구성 스킵
            if context.analysis.layout_template in ("B", "B'", "B''"):
                from src.block_assembler import assemble_slide_html_final
@@ -1190,6 +1191,7 @@ async def generate_slide(
                logger.info(f"[Stage 2] Type B: slide-base + 블록 (font_scale={fs:.1f})")
                return {"generated_html": generated}
            # [legacy Phase R'/Q example — INTEGRATION-AUDIT-01 §10.4]
            # Type A: 기존 Sonnet 재구성 코드 그대로
            from src.content_verifier import generate_with_retry
@@ -1998,6 +2000,7 @@ async def _apply_adjustments(
                    block["detail_target"] = True
                    if "data" in block:
                        del block["data"]
                    # [legacy Phase R'/Q example — INTEGRATION-AUDIT-01 §10.4]
                    block["reason"] = f"재구성: {detail}"
                    logger.info(
                        f"조정: {area} → kei_restructure (detail_target)"
--- a/tests/CLAUDE.md
+++ b/tests/CLAUDE.md
@@ -0,0 +1,175 @@
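 # CLAUDE.md: Working Context for the Matching System
 This file is the context that Claude (AI) refers to when working in the `tests/` directory.
 Use it together with [../CLAUDE.md](../CLAUDE.md) in the project root.
 ## Working directory
 - Main working directory: `tests/matching/`
 - Data and report files live in the same location
 - Always run from `tests/matching/` (the scripts depend on relative paths)
 ## System overview
 A 4-stage pipeline (V1~V4) that matches MDX content against 32 Figma frames.
 See `README.md` / `PLAN.md` / `PROGRESS.md` for details.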
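 ## Absolute rules
 ### 1. No hardcoding (emphasized by the user)
 - Do not patch the output directly; **fix the process/code**
 - No arbitrary data insertion (e.g. do not use the "title / A label / B label / row data" placeholders from DECK 04)
 - Every displayed value must come from actual code results (yaml / function output)
 ### 2. Preserve the user's direct edits
 - If the user edited an HTML file directly, **always reflect the edit in the pipeline code** before regenerating
 - If you only fix the code and re-run, the user's edits are lost
 - When changing: reflect all 8 user edits in the code → re-run
 ### 3. Show code behavior honestly
 - Even in a deck for executive reporting, **state the code's limitations candidly**
 - e.g. a label such as "MDX automatic analysis result: policy/requirements (a human would read it as a matrix comparison)"
 - Ablation results such as "this axis actually has no effect on frame matching" are left out of the executive version but stated explicitly in internal docs (PROGRESS.md)
 ### 4. Tone for executive reporting
 - Do not expose English enum codes (`policy_requirements`) directly; prefer the Korean label (정책/요구사항, "policy/requirements")
 - Abbreviate matching keywords to 5~10 items plus "and N more" ("등 N 개")
 - Design: no gradients or flashy cards. Plain tables + black and white + 1~2 accent colors
 - Minimize definitions and elaboration (a definition is one line; do not spell it out at length)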
 ## Naming conventions
 ### Pipeline scripts
 ```
 pipeline_<number>_<name>.py
 ```
 - 01~07: input extraction + preprocessing + keywords
 - 08: V2 (semantic) / V3 (structure r2~r5) / V4 (template_fit, r1/r2)
 - 09: V2 diagnostics
 - 10: Holdout labeling / evaluation
 - 11: templates_v1 audit
 - 12: templates_v2 generation (r1, r2, r3, final, final_r2, promote_frame13)
 - 13: meeting docs / samples
 - 14: single sample
 - 15: bm25 / idf / logistic regression comparison
 - 16: deck page generation (DECK 1~7)
 - 17: V4 full32 (evaluation over all 32 frames)
 - 18: V4 slot-axis ablation
 ### Result files
 ```
 <stage>_<description>_result.yaml
 ```
 - `mdx_matching_result.yaml`: V1
 - `v2_semantic_rerank_result.yaml`: V2
 - `v3_structure_rerank_r5_result.yaml`: V3 (final r5)
 - `v4_full32_result.yaml`: V4 (all 32 frames)
 - `structure_ontology_v2_final_r2.yaml`: Frame 32 DB
 ### Reports
 ```
 DECK_<number>_<name>.html: A4 page for executive reporting
 ATTACH_<number>_<name>.html: supplementary material
 <NAME>_REPORT.html / .md: analysis report
 ```
 ## Frequently used commands
 ```bash
 cd tests/matching/
 # Re-run the whole matching system
 python pipeline_06_2_mdx_matching.py
 python pipeline_08_v2_semantic_rerank.py
 python pipeline_08_v3_r5_structure_rerank.py
 python pipeline_17_v4_full32.py
 # Regenerate reports
 python pipeline_16_deck_4pages.py    # DECK 1~7
 # Ablation / verification
 python pipeline_15_logistic_regression.py
 python pipeline_18_slot_axis_ablation.py
 ```
 ## Core pipeline weights
 ### V1 keyword matching (learned with Logistic Regression)
 ```
 matching_score = 0.414 × core + 0.320 × set + 0.265 × related
 ```
 ### V3 structure matching
 ```
 total = 0.40 × layout match + 0.35 × content nature + 0.25 × visual intent
 ```
 ### V4 overall verdict
 ```
 confidence = 0.25 × anchor + 0.20 × cardinality + 0.20 × relation
            + 0.15 × slot + 0.20 × content − penalty
 Label thresholds:
  ≥ 0.90 → use_as_is (use as is)
  ≥ 0.75 → light_edit (light edit)
  ≥ 0.60 → restructure (structural rearrangement)
  < 0.60 → reject (unusable)
 ```
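The V4 formula and thresholds can be read as a weighted sum followed by a cascade of comparisons. The sketch below is only an illustration under assumptions: the names `WEIGHTS`, `v4_confidence` and `v4_label` are hypothetical, each axis score is assumed to lie in [0, 1], and the real pipeline scripts may compute the axes and the penalty differently.

```python
# Weights copied from the V4 formula; using the axis names as dict keys
# is an assumption made for this sketch.
WEIGHTS = {
    "anchor": 0.25,
    "cardinality": 0.20,
    "relation": 0.20,
    "slot": 0.15,
    "content": 0.20,
}


def v4_confidence(scores: dict[str, float], penalty: float = 0.0) -> float:
    """Weighted sum of the five axis scores minus the penalty."""
    return sum(WEIGHTS[axis] * scores[axis] for axis in WEIGHTS) - penalty


def v4_label(confidence: float) -> str:
    """Map a confidence value to a label using the documented thresholds."""
    if confidence >= 0.90:
        return "use_as_is"
    if confidence >= 0.75:
        return "light_edit"
    if confidence >= 0.60:
        return "restructure"
    return "reject"


if __name__ == "__main__":
    scores = {axis: 1.0 for axis in WEIGHTS}  # hypothetical perfect match
    conf = v4_confidence(scores, penalty=0.3)
    print(round(conf, 2), v4_label(conf))
```

Checking the thresholds from highest to lowest means each branch only needs its lower bound.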
 ## Data sources
 | Data | Location | Purpose |
 |---|---|---|
 | Figma text | `figma_to_html_agent/blocks/*/texts.md` | Text extraction for the 32 frames |
 | BEPS master | (separate location) | Keyword enrichment |
 | MDX validation sections | (`MDX_SECTIONS` in `pipeline_01_extract_nodes.py`) | Ground-truth matching verification |
 | Frame images | `data/figma_previews/<frame number>.png` | DECK visualization |
 ## Test fixture convention (F-5, INTEGRATION-AUDIT-01 §10.5.1)
 The canonical location rules for test data and sample references. They apply only inside `tests/`, not to the `src/**` production path.
 | Path | Status | Purpose | Notes |
 |---|---|---|---|
 | `tests/phase_z2/fixtures/` | **exists (canonical)** | Phase Z regression YAML fixtures | Loaded by `test_fixtures_loader.py`. Subdirectories: `build_layout_css/`, `retry_gate/`. |
 | `tests/fixtures/` (root) | **absent (not created yet)** | Future candidate for non-Phase-Z / non-YAML fixtures | Create via a separate issue only when the sample inventory outgrows what `tests/phase_z2/test_*.py` can hold inline. |
 | `samples/mdx_batch/**` , `samples/mdx/**` | exists | Integration smoke input | May be referenced only from `tests/**`. Hardcoding in the `src/**` runtime path is forbidden. |
 Rules:
 - Test code may reference sample MDX such as `samples/mdx_batch/02.mdx` directly (e.g. `tests/phase_z2/test_pz2_vu_integration.py`). `src/**` runtime input must never pin sample filenames or content.
 - Add new YAML regression fixtures as a new subdirectory under `tests/phase_z2/fixtures/`. Creating a root `tests/fixtures/` is forbidden (requires a separate issue).
 - Sample-like literals that appear in `src/**`, such as "BIM" / "건설산업 DX" / "재구성", were classified in INTEGRATION-AUDIT-01 §10.4 (F-4) as intentional docstring / glossary / example-dict content. A literal that carries the annotation marker is an intentional example. Do not introduce new sample literals into `src/**`.
 - The anchor definition of this convention is `docs/architecture/INTEGRATION-AUDIT-01-REPORT.md` §10.5.1. When changing the convention, update the anchor first.
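The rule that `src/**` must not pin sample files could be checked with a small scan. This is a hedged sketch, not an existing tool in this repository: `SAMPLE_PIN`, `find_sample_pins` and `scan_tree` are hypothetical names, and the pattern assumes sample pins look like a `samples/mdx/` or `samples/mdx_batch/` path or a bare two-digit file name such as `02.mdx`.

```python
import re
from pathlib import Path

# Assumption: a "sample pin" is a samples/mdx or samples/mdx_batch path,
# or a bare two-digit MDX file name such as 02.mdx.
SAMPLE_PIN = re.compile(r"samples/mdx(?:_batch)?/|\b\d{2}\.mdx\b")


def find_sample_pins(source: str) -> list[tuple[int, str]]:
    """Return (line number, stripped line) for each line that mentions a sample."""
    return [
        (number, line.strip())
        for number, line in enumerate(source.splitlines(), start=1)
        if SAMPLE_PIN.search(line)
    ]


def scan_tree(root: Path) -> dict[str, list[tuple[int, str]]]:
    """Scan every .py file under root and collect the files that contain pins."""
    hits: dict[str, list[tuple[int, str]]] = {}
    for path in sorted(root.rglob("*.py")):
        found = find_sample_pins(path.read_text(encoding="utf-8"))
        if found:
            hits[str(path)] = found
    return hits
```

Running `scan_tree(Path("src"))` should come back empty if the rule holds; the same scan over `tests/` is expected to report hits, which the convention allows.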
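 ## Common points of confusion
 ### English enum vs Korean mapping
 - Code / yaml: English enum (`comparative_matrix`, `cycle_interrelation`)
 - Report display: Korean (`행렬형 비교` "matrix comparison", `순환/상호 관계` "cycle / interrelation")
 - The keyword dictionary table in DECK 05 shows both (so the user can map between them)
 ### Item count vs number of slot candidates
 - **Identical**: `item_count = len(slot_candidates)` (whatever the form: table / subsections / bullets)
 - V4's cardinality axis and the slot.within part **weight the same signal twice** (confirmed by ablation)
 ### V3 vs V4 structure scores
 - V3 = layout family + content_affinity + structure_intent (3 axes)
 - V4 = anchor + cardinality + relation + slot + content (5 axes)
 - **Different models**. The V3 score and the V4 confidence are computed separately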
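 ## Feedback the user emphasized
 - "These are results produced by running the code, not arbitrary data": keep every displayed value honest
 - "This is for executive reporting": leave out negative asides and detailed formulas
 - "Do just one thing": change only one thing at a time
 - "Reflect every change in the pipeline code": direct HTML edits are temporary
 ## Weaknesses found so far
 See the "발견된 약점" ("weaknesses found") table in `PROGRESS.md`. All 8 are Phase E work items.
 Most urgent:
 1. **02-2.2 matching failure** (E.5)
 2. **Moving MDX analysis to an LLM** (E.1, E.2)
 3. **Slot semantic mapping** (E.3, E.4)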