docs + V4 catalog + samples + preserve Phase Q legacy

26 files in total (20 added + 6 modified), 10507 insertions.

Phase Z docs:
- docs/architecture/PHASE-Z-CHANGE-LOG.md (new): axis-by-axis decision history
  (newest-on-top). Six entries recorded starting from Step 7-A, plus 2026-05-08 and
  2026-05-08 #2 (compat matrix dropped / 6-B dropped / F14 wording corrected /
  label gate policy split out).
- docs/architecture/PHASE-Z-PIPELINE-OVERVIEW.md (modified): Step 5/6/9 gap notes
  appended (structure unchanged, append-only), recording that 6-B was dropped,
  plus Refinement F.
- docs/architecture/PHASE-Z-PIPELINE-STATUS-BOARD.md (modified): snapshot date
  updated to 2026-05-08. §3 key missing item 5 (Step 5/6/9 boundary axis breakdown
  + record of dropped items). §6 one-line update: next axis candidates A~F.

Project root docs:
- PLAN.md / PROGRESS.md / README.md (modified): reflect the token scheme, folder
  structure, design docs, and separation of roles.
- IMPROVEMENT-REDESIGN.md (new): core Phase Z design document.
- PROCESS_OVERVIEW.html (new): visual overview of the pipeline.
- docs/tasks/* (new): Phase Z task documents.

V4 catalog (required Phase Z runtime dependency):
- tests/matching/v4_full32_result.yaml (new, 4888 lines): V4 match results for
  32 frames × 10 MDX sections. lookup_v4_match() / lookup_v4_candidates() read this
  file. The Phase Z runtime *aborts immediately if it is missing*, so committing it
  guarantees the pipeline works right after a clone.
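A minimal sketch of a fail-fast load plus a label-gated lookup, assuming the catalog layout that the preview script in this commit reads (`mdx_sections` → section id → `judgments_full32`, each entry carrying `frame_number`, `label`, `v4_full_rank`). The helper names `require_v4_catalog` and `lookup_candidates`, and the set of labels treated as usable, are hypothetical; this is not the actual `lookup_v4_match()` / `lookup_v4_candidates()` code. Parsing the YAML (the preview uses `yaml.safe_load`) is left to the caller so the sketch stays stdlib-only.

```python
from pathlib import Path

# Assumption: every label except "reject" passes the gate (the preview in this
# commit treats "restructure" as usable and "reject" as not).
USABLE_LABELS = frozenset({"use_as_is", "light_edit", "restructure"})


def require_v4_catalog(path):
    """Abort immediately when the V4 catalog file is missing (hypothetical helper)."""
    path = Path(path)
    if not path.is_file():
        raise SystemExit(f"ERROR: V4 catalog not found at {path}")
    return path


def lookup_candidates(v4, section_id):
    """Return the usable judgments for one MDX section, best V4 rank first.

    `v4` is the already-parsed catalog; unknown sections yield an empty list.
    """
    sec = v4.get("mdx_sections", {}).get(section_id)
    if not sec:
        return []
    usable = [j for j in sec.get("judgments_full32", [])
              if j.get("label") in USABLE_LABELS]
    return sorted(usable, key=lambda j: j["v4_full_rank"])
```

Raising `SystemExit` with a message mirrors how the preview script's `main()` exits with status 1 when `v4_full32_result.yaml` is absent.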

Samples:
- samples/mdx_batch/04.mdx (new): base MDX04 sample.
- samples/mdx/04. DX 지연 요인.mdx (new): original MDX04 source.

Preserved Phase Q legacy (scope of the separate "Phase Q audit & salvage" axis):
- src/block_matcher_tfidf.py / catalog_blocks.py / frame_extractor.py /
  pipeline_v2.py: Phase Q (old pipeline) source files that were previously
  untracked. No dependency on the Phase Z runtime. To be reviewed on the Phase Q
  audit axis.
- scripts/eval_block_matcher.py / fetch_all_frame_screenshots.py /
  match_17_units_my_matcher.py / match_mdx_strict.py / match_mdx_to_frames_tfidf.py /
  ocr_augment_texts.py / run_pipeline_v2.py / previews/: old scripts used during
  Phase Q work, preserved alongside.
- run_mdx03_pipeline.py (modified): a single wrapper for both the Phase Q entry
  point (no flag) and the Phase Z entry point (--phase-z2 flag). To use only
  Phase Z, call it directly:
  `python -m src.phase_z2_pipeline samples/mdx_batch/03.mdx <run_id>`.
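A sketch of how a flag-based dispatch like this wrapper's could look with `argparse`. Only the `--phase-z2` flag and the `python -m src.phase_z2_pipeline <mdx> <run_id>` command come from this commit; the `parse_entry` name, the positional arguments and their defaults are hypothetical, and the Phase Q branch returns no command because its entry point is not described here.

```python
import argparse
import sys


def parse_entry(argv):
    """Pick the pipeline for a run_mdx03_pipeline.py-style wrapper (sketch).

    Returns (mode, command). For Phase Z the command mirrors
    `python -m src.phase_z2_pipeline <mdx> <run_id>`; the Phase Q command is
    not known here, so it is returned as None.
    """
    parser = argparse.ArgumentParser(prog="run_mdx03_pipeline.py")
    parser.add_argument("--phase-z2", action="store_true",
                        help="run the Phase Z pipeline instead of Phase Q")
    # Hypothetical positionals and defaults; the real wrapper's arguments are not shown.
    parser.add_argument("mdx", nargs="?", default="samples/mdx_batch/03.mdx")
    parser.add_argument("run_id", nargs="?", default="manual")
    args = parser.parse_args(argv)
    if args.phase_z2:
        cmd = [sys.executable, "-m", "src.phase_z2_pipeline", args.mdx, args.run_id]
        return "phase_z", cmd
    return "phase_q", None
```

The returned Phase Z command list can be handed to `subprocess.run(cmd, check=True)`; keeping dispatch separate from execution makes the flag routing easy to check without launching either pipeline.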

Out of scope:
- tests/matching/ (~63MB besides v4_full32_result.yaml): V4 evolution history /
  reports / DECK / ATTACH. To be reviewed on the Phase Q audit axis.
- tests/pipeline/ (~15MB): pipeline data. Phase Q audit scope.
- templates/catalog/blocks.yaml: old block catalog. Phase Q audit.
- templates/phase_z2/frames/: old location of the frame partials. Phase Q audit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 09:47:58 +09:00
parent ec83405770
commit 85c680f02a
26 changed files with 10507 additions and 46 deletions


@@ -0,0 +1,404 @@
"""MDX04 partial preview — diagnostic only, NOT a Phase Z final.
목적: F16 (`bim_issues_quadrant_four`) 가 04-2.1 / 04-2.2 의 4 항목 구조와
시각적으로 정합하는지 사용자가 눈으로 확인.
방식:
- V4 runtime 우회 (정식 Phase Z 아님)
- F16 figma 원본 HTML 을 iframe 으로 임베드 (디자인 형태 그대로)
- 04-2.1 / 04-2.2 의 MDX 4 항목을 옆에 시각화 (구조 비교)
- 04-1 = frame library gap (5-card 구조, 매칭 frame 부재) placeholder
- diagnostic banner + V4 metadata + debug.json
출력:
data/runs/mdx04_partial_preview/index.html
data/runs/mdx04_partial_preview/debug.json
data/runs/mdx04_partial_preview/f16_original/ (figma 원본 + assets)
"""
import json
import re
import sys
from datetime import datetime
from html import escape
from pathlib import Path
import yaml
ROOT = Path(__file__).resolve().parents[2]
MDX_PATH = ROOT / "samples" / "mdx_batch" / "04.mdx"
V4_RESULT = ROOT / "tests" / "matching" / "v4_full32_result.yaml"
RUN_DIR = ROOT / "data" / "runs" / "mdx04_partial_preview"

# ─── Extract the 04-2.1 / 04-2.2 subsections of MDX 04 (### heading + bullets) ───
RE_SUBSECTION_HEAD = re.compile(r'^###\s+(\d+\.\d+)\s+(.+)$', re.MULTILINE)
RE_TOP_BULLET = re.compile(r'^-\s+\*\*([^*]+)\*\*\s*$')


def extract_subsection(text, num_label):
    """Extract from `### {num_label} ...` up to (not including) the next ### heading or `---`."""
    lines = text.split('\n')
    start = None
    for i, ln in enumerate(lines):
        m = RE_SUBSECTION_HEAD.match(ln.strip())
        if m and m.group(1) == num_label:
            start = i
            break
    if start is None:
        return None, []
    end = len(lines)
    for j in range(start + 1, len(lines)):
        s = lines[j].strip()
        if RE_SUBSECTION_HEAD.match(s) or s == '---':
            end = j
            break
    section_title = lines[start].lstrip('# ').strip()
    body_lines = lines[start + 1:end]
    # Collect items: each bold top-level bullet opens an item, and the plain
    # bullets that follow it become that item's sub-points.
    items = []
    cur = None
    for ln in body_lines:
        stripped = ln.strip()
        m = RE_TOP_BULLET.match(stripped)
        if m:
            if cur is not None:
                items.append(cur)
            cur = {'headline': m.group(1).strip(), 'subs': []}
            continue
        m2 = re.match(r'^-\s+(.+)$', stripped)
        if m2 and cur is not None and not stripped.startswith('- **'):
            cur['subs'].append(m2.group(1).strip())
    if cur is not None:
        items.append(cur)
    return section_title, items


def extract_section_04_1(text):
    """04-1 = `## 1. DX에 대한 인식`: five <h3> cards, each with a quote and 3 bullets.

    Only the card labels are extracted; the quotes and bullets are not parsed.
    """
    lines = text.split('\n')
    start = None
    for i, ln in enumerate(lines):
        if ln.strip() == '## 1. DX에 대한 인식':
            start = i
            break
    if start is None:
        return None, []
    end = len(lines)
    for j in range(start + 1, len(lines)):
        s = lines[j].strip()
        if s.startswith('## ') and s != '## 1. DX에 대한 인식':
            end = j
            break
    body = '\n'.join(lines[start:end])
    # Collect the <h3> card labels (the following <p> quote and <ul><li> bullets are skipped).
    cards = []
    for m in re.finditer(r'<h3[^>]*>([^<]+)</h3>', body):
        cards.append({'label': m.group(1).strip()})
    return lines[start].lstrip('# ').strip(), cards


# ─── V4 metadata lookup ──────────────────────────────────────────
def get_f16_judgment(v4, section_id):
    """Return the V4 judgment for frame 16 in the given section, or None."""
    sec = v4['mdx_sections'].get(section_id)
    if not sec:
        return None
    for e in sec.get('judgments_full32', []):
        if e['frame_number'] == 16:
            return e
    return None


def get_top1(v4, section_id):
    """Return the first (top-ranked) V4 judgment for the section, or None."""
    sec = v4['mdx_sections'].get(section_id)
    if not sec:
        return None
    j = sec.get('judgments_full32', [])
    return j[0] if j else None


# ─── HTML rendering ──────────────────────────────────────────────
def render_items_html(items):
    """Render the extracted items as numbered headlines with their sub-bullets."""
    parts = ['<div class="items-list">']
    for i, it in enumerate(items, 1):
        parts.append('<div class="item">')
        parts.append(f'<div class="item-headline">{i}. {escape(it["headline"])}</div>')
        if it['subs']:
            parts.append('<ul class="item-subs">')
            for s in it['subs']:
                parts.append(f'<li>{escape(s)}</li>')
            parts.append('</ul>')
        parts.append('</div>')
    parts.append('</div>')
    return '\n'.join(parts)


def render_cards_html(cards):
    """Render card labels as a numbered grid."""
    parts = ['<div class="cards-list">']
    for i, c in enumerate(cards, 1):
        parts.append(f'<div class="card">{i}. {escape(c["label"])}</div>')
    parts.append('</div>')
    return '\n'.join(parts)


def render_v4_metadata_html(j, label_note=''):
    """Render V4 rank / confidence / label / axis scores (plus an optional note) for one judgment."""
    if j is None:
        return '<div class="v4-meta v4-meta-missing">V4 entry not found</div>'
    axes = j.get('axes', {})
    return f'''<div class="v4-meta">
<div class="v4-meta-row">
<span class="v4-meta-key">V4 rank:</span><span class="v4-meta-val">{j["v4_full_rank"]}</span>
<span class="v4-meta-key">conf:</span><span class="v4-meta-val">{j["confidence"]:.4f}</span>
<span class="v4-meta-key">label:</span><span class="v4-meta-val v4-label-{j["label"]}">{j["label"]}</span>
</div>
<div class="v4-meta-row v4-axes">
<span class="v4-meta-key">axes:</span>
anchor={axes.get("anchor", 0):.2f} ·
cardinality={axes.get("cardinality", 0):.2f} ·
relation={axes.get("relation", 0):.2f} ·
slot={axes.get("slot", 0):.2f} ·
content={axes.get("content", 0):.4f}
</div>
{f'<div class="v4-meta-row v4-note">{escape(label_note)}</div>' if label_note else ''}
</div>'''


def render_section(section_id, mdx_title, items_html, j_f16, top1, note):
    """Left: F16 Figma original (iframe) / right: the 4 MDX text items / bottom: V4 metadata."""
    # Guard against a missing V4 section or MDX heading instead of crashing.
    if top1 is None:
        top1_desc = 'top1 = n/a'
    else:
        top1_desc = (f'top1 = F{top1["frame_number"]} '
                     f'({top1["label"]}, conf {top1["confidence"]:.4f})')
    title = mdx_title or '(section not found)'
    return f'''<section class="preview-section" id="sec-{section_id}">
<header class="section-head">
<h2>{escape(section_id)} · {escape(title)}</h2>
<div class="section-sub">
F16 (bim_issues_quadrant_four) candidate · {top1_desc}
</div>
</header>
<div class="section-body">
<div class="col col-figma">
<div class="col-label">F16 Figma original (design as-is)</div>
<div class="iframe-frame">
<iframe src="f16_original/index.html" frameborder="0" scrolling="no"></iframe>
</div>
</div>
<div class="col col-mdx">
<div class="col-label">MDX 04 {escape(section_id)} body (4 items)</div>
{items_html}
</div>
</div>
{render_v4_metadata_html(j_f16, note)}
</section>'''


def render_04_1_placeholder(top1):
    """Render the 04-1 gap section (no 5-card frame exists in the library)."""
    if top1 is None:
        top1_desc = 'V4 top1 = n/a'
    else:
        top1_desc = (f'V4 top1 = F{top1["frame_number"]} '
                     f'(conf {top1["confidence"]:.4f}, {top1["label"]})')
    return f'''<section class="preview-section preview-gap" id="sec-04-1">
<header class="section-head">
<h2>04-1 · DX에 대한 인식</h2>
<div class="section-sub">
Frame library gap: 5-card structure, and the 32-frame DB has no frame with cardinality.ideal=5 (excluded from this preview)
</div>
</header>
<div class="gap-note">
<strong>Why excluded</strong>: 04-1 has 5 cards (technology / effects / workforce / economics / practice), and detection
correctly recognizes h3_cards=5. However, none of the 32 frames supports a 5-card layout, so no frame can pass the
V4 multi-constraint check (0/32 usable; all reject).
This is not a detection bug but a <strong>frame library readiness problem</strong>.
<br><br>
{top1_desc}; F16 is also a reject (rank 15, conf 0.361).
</div>
</section>'''


# ─── Main ────────────────────────────────────────────────────────
def main():
    if not V4_RESULT.exists():
        print(f"ERROR: V4 result not found at {V4_RESULT}", file=sys.stderr)
        sys.exit(1)
    if not MDX_PATH.exists():
        print(f"ERROR: MDX 04 not found at {MDX_PATH}", file=sys.stderr)
        sys.exit(1)
    mdx_text = MDX_PATH.read_text(encoding='utf-8')
    v4 = yaml.safe_load(V4_RESULT.read_text(encoding='utf-8'))
    # 04-2.1
    title_2_1, items_2_1 = extract_subsection(mdx_text, '2.1')
    j16_2_1 = get_f16_judgment(v4, '04-2.1')
    top1_2_1 = get_top1(v4, '04-2.1')
    # 04-2.2
    title_2_2, items_2_2 = extract_subsection(mdx_text, '2.2')
    j16_2_2 = get_f16_judgment(v4, '04-2.2')
    top1_2_2 = get_top1(v4, '04-2.2')
    # 04-1
    title_1, cards_1 = extract_section_04_1(mdx_text)
    top1_1 = get_top1(v4, '04-1')
    # Assemble the HTML
    section_2_1_html = render_section(
        '04-2.1', title_2_1,
        render_items_html(items_2_1),
        j16_2_1, top1_2_1,
        note=('F16 candidate: V4 label=reject (anchor=0). Semantic matching recovered, '
              'but with no anchor terms present it fails the multi-constraint check. '
              'Preview purpose: visually check that the F16 design is consistent with the 04-2.1 body.'),
    )
    section_2_2_html = render_section(
        '04-2.2', title_2_2,
        render_items_html(items_2_2),
        j16_2_2, top1_2_2,
        note=('F16 restructure: passes with V4 label=restructure (a usable label). '
              'Preview purpose: check that the F16 design visually fits the 04-2.2 body.'),
    )
    section_1_html = render_04_1_placeholder(top1_1)
    timestamp = datetime.now().isoformat(timespec='seconds')
    page_html = f'''<!DOCTYPE html>
<html lang="ko">
<head>
<meta charset="UTF-8">
<title>MDX04 Partial Preview · diagnostic only</title>
<style>
* {{ margin: 0; padding: 0; box-sizing: border-box; }}
body {{
font-family: -apple-system, "Pretendard", "Apple SD Gothic Neo", sans-serif;
background: #f5f6f8; color: #111; line-height: 1.5;
padding: 24px;
}}
.banner {{
background: #fff7ed; border: 2px solid #f59e0b; border-radius: 8px;
padding: 16px 20px; margin: 0 auto 24px; max-width: 1400px;
}}
.banner h1 {{ font-size: 18px; color: #92400e; margin-bottom: 6px; }}
.banner p {{ font-size: 13px; color: #78350f; }}
.banner .timestamp {{ font-size: 11px; color: #b45309; margin-top: 8px; font-family: monospace; }}
.preview-section {{
max-width: 1400px; margin: 0 auto 32px;
background: #fff; border: 1px solid #d1d5db; border-radius: 8px;
overflow: hidden;
}}
.section-head {{ padding: 16px 20px; border-bottom: 1px solid #e5e7eb; background: #f9fafb; }}
.section-head h2 {{ font-size: 18px; color: #111827; margin-bottom: 4px; }}
.section-sub {{ font-size: 13px; color: #6b7280; }}
.section-body {{
display: grid; grid-template-columns: 1fr 1fr; gap: 0;
border-bottom: 1px solid #e5e7eb;
}}
.col {{ padding: 16px 20px; }}
.col-figma {{ border-right: 1px solid #e5e7eb; background: #fafafa; }}
.col-label {{
font-size: 12px; color: #6b7280; text-transform: uppercase; letter-spacing: 0.5px;
margin-bottom: 12px; font-weight: 600;
}}
.iframe-frame {{
width: 100%; height: 380px;
background: #fff; border: 1px solid #d1d5db; border-radius: 4px;
overflow: hidden; position: relative;
}}
.iframe-frame iframe {{
width: 1280px; height: 720px;
transform: scale(0.5); transform-origin: top left;
}}
.items-list {{ display: flex; flex-direction: column; gap: 14px; }}
.item {{ padding: 12px 14px; background: #f3f4f6; border-left: 3px solid #2563eb; border-radius: 4px; }}
.item-headline {{ font-weight: 700; color: #111; font-size: 14px; margin-bottom: 6px; }}
.item-subs {{ list-style: disc; padding-left: 20px; font-size: 13px; color: #374151; }}
.item-subs li {{ margin-bottom: 3px; }}
.cards-list {{ display: grid; grid-template-columns: 1fr 1fr; gap: 8px; }}
.card {{ padding: 10px 12px; background: #f3f4f6; border-radius: 4px; font-size: 13px; }}
.v4-meta {{ padding: 12px 20px; background: #f9fafb; font-family: monospace; font-size: 12px; color: #374151; }}
.v4-meta-row {{ margin-bottom: 4px; }}
.v4-meta-key {{ color: #6b7280; margin-right: 4px; }}
.v4-meta-val {{ color: #111; margin-right: 12px; font-weight: 600; }}
.v4-axes {{ font-size: 11px; color: #6b7280; }}
.v4-note {{ font-size: 12px; color: #6b7280; margin-top: 6px; line-height: 1.5; font-family: inherit; }}
.v4-label-use_as_is {{ color: #059669; }}
.v4-label-light_edit {{ color: #2563eb; }}
.v4-label-restructure {{ color: #d97706; }}
.v4-label-reject {{ color: #dc2626; }}
.preview-gap {{ background: #fef2f2; }}
.preview-gap .section-head {{ background: #fee2e2; border-bottom-color: #fecaca; }}
.gap-note {{ padding: 16px 20px; font-size: 13px; color: #7f1d1d; }}
</style>
</head>
<body>
<div class="banner">
<h1>MDX04 Partial Preview · diagnostic only</h1>
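<p>This output is not an official Phase Z final. It is a diagnostic preview for checking whether F16
(<code>bim_issues_quadrant_four</code>) visually matches the 4-item structure of 04-2.1 / 04-2.2. The V4 runtime,
mapper, and partials are bypassed. 04-1 is excluded from this preview because of a frame library gap.</p>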
<div class="timestamp">generated: {timestamp}</div>
</div>
{section_2_1_html}
{section_2_2_html}
{section_1_html}
</body>
</html>'''
out_html = RUN_DIR / "index.html"
out_html.write_text(page_html, encoding='utf-8')
debug = {
'kind': 'mdx04_partial_preview',
'is_phase_z_final': False,
'is_diagnostic': True,
'purpose': 'F16 디자인 / 04-2.* 4 항목 구조 시각 정합성 확인',
'generated_at': timestamp,
'v4_source': str(V4_RESULT.relative_to(ROOT)),
'mdx_source': str(MDX_PATH.relative_to(ROOT)),
'sections': {
'04-2.1': {
'mdx_title': title_2_1,
'item_count': len(items_2_1),
'top1': top1_2_1,
'f16_judgment': j16_2_1,
'preview_label': 'F16 candidate (V4 label = reject, conf 0.648, anchor=0)',
},
'04-2.2': {
'mdx_title': title_2_2,
'item_count': len(items_2_2),
'top1': top1_2_2,
'f16_judgment': j16_2_2,
'preview_label': 'F16 restructure (V4 label = restructure, 사용 가능 통과)',
},
'04-1': {
'mdx_title': title_1,
'card_count': len(cards_1),
'top1': top1_1,
'preview_label': 'EXCLUDED — frame library gap (5-card structure, no matching frame in 32 DB)',
},
},
'caveats': [
'정식 Phase Z final 아님 — V4 runtime / mapper / partial 모두 우회',
'F16 figma 원본 HTML 을 그대로 임베드 — 디자인 형태만 시각화 (텍스트 슬롯 매핑 X)',
'04-2.1 의 F16 V4 label = reject (anchor=0) — 의미 매칭 회복했으나 anchor terms 부재',
'04-2.2 의 F16 V4 label = restructure — 사용 가능 라벨, 단 정식 partial 미작성',
'04-1 = frame library readiness 문제 (detect bug 아님)',
],
}
out_debug = RUN_DIR / "debug.json"
out_debug.write_text(json.dumps(debug, ensure_ascii=False, indent=2), encoding='utf-8')
print(f"[mdx04_partial_preview] generated:")
print(f" html : {out_html}")
print(f" debug : {out_debug}")
print(f" figma : {RUN_DIR / 'f16_original' / 'index.html'}")
if __name__ == "__main__":
main()