Phase X-B in progress: type B assembly + stronger text preservation + original MDX restoration

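X-B-3 to X-B-5 done:
- space_allocator: add build_containers_type_b()
- assemble_stage2: add _assemble_type_b() (subheading card layout)
- pipeline.py: branch on layout_template (A/B)
- pipeline_context: add Analysis.layout_template field
- validators: relax validation for type B

Stronger text preservation:
- KEI_PROMPT: keep titles verbatim, no rewriting of text
- KEI_STRUCTURED_TEXT_PROMPT: keep subheadings, keep original sentences verbatim

Original MDX restoration:
- samples/mdx_batch/02.mdx: fix missing table data (re-copied from the original)

Unresolved (next session):
- Indentation: hierarchy of main heading → section heading → subheading → body
- Image captions: [figure title] format (brackets included)
- Top container: pull content up to close the empty space
- Card design: improve the Safety and Quality / Productivity Improvement / Communication and Trust cards
- Titles: Kei still changes the original titles

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>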
2026-04-06 11:28:03 +09:00
parent bc7829b08b
commit a8fe20e08e
7 changed files with 545 additions and 32 deletions
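
The height arithmetic in build_containers_type_b() can be followed with a small standalone sketch. The token values below (pad=40, gap_block=16, gap_small=8) are assumptions for illustration only, since the design token file is not part of this diff; header_h=66 and line_height=1.7 are the fallback defaults that appear in the diff. The function name type_b_heights is hypothetical.

```python
def type_b_heights(
    slide_height: int,
    pad: int,
    header_h: int,
    gap_block: int,
    gap_small: int,
    top_weight: float,
    bottom_weight: float,
    footer_weight: float,
    line_height: float = 1.7,
) -> dict[str, int]:
    """Mirror the vertical split used for type B (top / bottom / footer)."""
    # Height left after page padding, the header, and one block gap.
    total_available = slide_height - pad * 2 - header_h - gap_block

    # Footer takes its weight share, but never less than one 14px line plus padding.
    footer_min = int(14 * line_height + pad)
    footer_h = max(footer_min, int(total_available * footer_weight))

    # The middle area is what remains above the footer and its gap.
    middle_h = total_available - footer_h - gap_block

    # Top and bottom share the middle area by weight; guard against a zero sum.
    total_mid_weight = top_weight + bottom_weight
    if total_mid_weight <= 0:
        total_mid_weight = 1
    top_h = int(middle_h * top_weight / total_mid_weight)
    bottom_h = middle_h - top_h - gap_small

    return {"top": top_h, "bottom": bottom_h, "footer": footer_h}


# Assumed token values; only header_h=66 and line_height=1.7 come from the diff.
heights = type_b_heights(
    slide_height=720, pad=40, header_h=66, gap_block=16, gap_small=8,
    top_weight=0.45, bottom_weight=0.45, footer_weight=0.10,
)
print(heights)
```

Under these assumed tokens the footer is lifted to its minimum (63px instead of the 55px its weight would give), and top + gap_small + bottom + gap_block + footer adds back up to the available height.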
--- a/src/kei_client.py
+++ b/src/kei_client.py
@@ -57,11 +57,16 @@ KEI_PROMPT = (
    "각 역할에 해당하는 topic_ids와 **공간 비중(weight, 합계 1.0)**을 결정하라.\n"
    "**콘텐츠에 따라 비중은 매번 달라진다. 고정값이 아니다.**\n"
    "page_structure 필드에 기록.\n\n"
-    "## 원본 텍스트 보존 원칙\n"
-    "- 원본의 논리 흐름과 정보를 빠뜨리지 마라\n"
-    "- 원본 텍스트는 최대한 보존. 약간의 편집만.\n"
-    "- 원본에 있는 내용을 임의로 제거하거나 다른 의미로 바꾸지 마라\n"
-    "- 각 꼭지의 source_hint에 원본의 어떤 부분이 가는지 명시\n\n"
+    "## 원본 텍스트 보존 원칙 (절대 규칙)\n"
+    "- **제목(##, ###)은 원본 그대로 사용하라. 절대 바꾸지 마라.**\n"
+    "  원본이 '## 1. DX의 궁극적 목표'이면 꼭지 제목도 'DX의 궁극적 목표'.\n"
+    "  임의로 '핵심 목표', '전략 방향' 등으로 바꾸지 마라.\n"
+    "- **원본 텍스트(불릿, 설명)는 85% 이상 그대로 사용하라.**\n"
+    "  문장을 재작성하지 마라. 원본 문장을 그대로 가져와라.\n"
+    "- **결론 텍스트도 원본 그대로.** 임의로 만들지 마라.\n"
+    "- 원본에 있는 내용을 임의로 제거하거나 다른 의미로 바꾸지 마라.\n"
+    "- 텍스트 재구성이 허용되는 경우는 **빈 공간에 채울 요약(표, 팝업 요약)만**.\n"
+    "- 각 꼭지의 source_hint에 원본의 어떤 부분이 가는지 명시.\n\n"
    "## 배치 규칙\n"
    "- 참조 정보(용어 정의 등)는 role: 'reference'로 표시 → 사이드바 배치\n"
    "- 본문 흐름은 role: 'flow' → 메인 영역 배치\n"
@@ -258,14 +263,20 @@ async def refine_concepts(
 KEI_STRUCTURED_TEXT_PROMPT = (
    "아래는 슬라이드 스토리라인의 꼭지 목록과 원본 콘텐츠이다.\n"
    "각 꼭지에 해당하는 원본 텍스트를 **슬라이드에 넣을 형태로 구조화**하라.\n\n"
-    "## 규칙\n"
-    "1. 원본 내용의 85% 이상을 보존하라. 축약하지 마라.\n"
-    "2. 각 문장을 불릿(•)으로 구분하라.\n"
-    "3. 하위 항목이 있으면 들여쓰기 불릿(  •)으로 구분하라.\n"
-    "4. 출처가 있으면 반드시 포함하라 (출처: ...).\n"
-    "5. 개조식 어미로 변환하라 (~있다→~있음, ~한다→~함, ~이다→삭제).\n"
-    "6. 팝업 참조([팝업: ...])는 그대로 유지하라.\n"
-    "7. 이미지 참조([이미지: ...])는 그대로 유지하라.\n\n"
+    "## 절대 규칙\n"
+    "1. **원본 문장을 그대로 가져와라. 재작성하지 마라.**\n"
+    "   원본: '시설물의 요구 성능을 설계·시공·운영 전 과정에서 디지털로 검증하여 안전성 확보'\n"
+    "   → 그대로: '• 시설물의 요구 성능을 설계·시공·운영 전 과정에서 디지털로 검증하여 안전성 확보'\n"
+    "   ❌ 재작성 금지: '디지털 검증으로 안전성을 확보함'\n"
+    "2. 원본 내용의 85% 이상을 보존하라. 축약하지 마라.\n"
+    "3. **소제목(###)이 있으면 그대로 유지하라.** 삭제하거나 합치지 마라.\n"
+    "   원본: '### 안전과 품질' → structured_text에 '안전과 품질' 소제목 유지\n"
+    "4. 각 문장을 불릿(•)으로 구분하라.\n"
+    "5. 하위 항목이 있으면 들여쓰기 불릿(  •)으로 구분하라.\n"
+    "6. 출처가 있으면 반드시 포함하라 (출처: ...).\n"
+    "7. 개조식 어미로 변환하라 (~있다→~있음, ~한다→~함, ~이다→삭제).\n"
+    "8. 팝업 참조([팝업: ...])는 그대로 유지하라.\n"
+    "9. 이미지 참조([이미지: ...])는 그대로 유지하라.\n\n"
    "## 출력 형식 (JSON만. 설명 없이.)\n"
    "```json\n"
    '{"structured_texts": ['
--- a/src/pipeline.py
+++ b/src/pipeline.py
@@ -176,6 +176,7 @@ async def generate_slide(
                core_message=analysis_raw.get("core_message", ""),
                title=analysis_raw.get("title", ""),
                total_pages=analysis_raw.get("total_pages", 1),
+                layout_template=analysis_raw.get("layout_template", "A"),
            )

            # I-6: 슬라이드 제목 ↔ 첫 꼭지 제목 중복 검증
@@ -248,6 +249,7 @@ async def generate_slide(
                [t.model_dump() for t in updated_topics],
                context.normalized.clean_text,
                raw_content=context.raw_content,
+                layout_template=context.analysis.layout_template,
            )
            if validation_errors:
                return {"_errors": validation_errors}
@@ -327,14 +329,27 @@ async def generate_slide(
                f"비율: body:sidebar={container_ratio[0]}:{container_ratio[1]}"
            )

-            # 컨테이너 스펙 계산 (기존 space_allocator 활용)
-            container_specs = calculate_container_specs(
-                page_structure=context.page_structure.roles,
-                topics=[t.model_dump() for t in context.topics],
-                preset=preset,
-                slide_width=settings.slide_width,
-                slide_height=settings.slide_height,
-            )
+            # Phase X-B: branch container creation on the layout type
+            if context.analysis.layout_template == "B":
+                from src.space_allocator import build_containers_type_b
+                container_specs = build_containers_type_b(
+                    page_structure=context.page_structure.roles,
+                    slide_width=settings.slide_width,
+                    slide_height=settings.slide_height,
+                    image_sizes=image_sizes if isinstance(image_sizes, list) else (
+                        [{**v, "key": k} for k, v in image_sizes.items()] if image_sizes else None
+                    ),
+                )
+                logger.info("[X-B] 유형 B 컨테이너 생성")
+            else:
+                # Type A: existing code path, unchanged
+                container_specs = calculate_container_specs(
+                    page_structure=context.page_structure.roles,
+                    topics=[t.model_dump() for t in context.topics],
+                    preset=preset,
+                    slide_width=settings.slide_width,
+                    slide_height=settings.slide_height,
+                )

            # ContainerSpec → ContainerInfo 변환
            containers = {}
--- a/src/pipeline_context.py
+++ b/src/pipeline_context.py
@@ -63,6 +63,7 @@ class Analysis(BaseModel):
    core_message: str = ""
    title: str = ""
    total_pages: int = 1
+    layout_template: str = "A"  # Phase X-B: layout type chosen by Kei (A or B)
    image_sizes: dict[str, dict[str, Any]] = Field(default_factory=dict)
    # topics와 page_structure는 PipelineContext 최상위에 위치

--- a/src/space_allocator.py
+++ b/src/space_allocator.py
@@ -439,6 +439,143 @@ def calculate_container_specs(
    return specs


+# ══════════════════════════════════════
+# Phase X-B: type B container creation
+# ══════════════════════════════════════
+def build_containers_type_b(
+    page_structure: dict[str, Any],
+    slide_width: int = 1280,
+    slide_height: int = 720,
+    image_sizes: list[dict] | None = None,
+) -> dict[str, ContainerSpec]:
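+    """Type B: top + bottom split in two (bottom_left/right) + conclusion (footer).
+
+    A separate function that leaves the existing type A path (calculate_container_specs) untouched.
+    All sizes are computed dynamically from the slide size, weight, and zone; no container size is hardcoded.
+
+    Args:
+        page_structure: Kei's decision, e.g. {"핵심목표": {"zone": "top", "topic_ids": [1], "weight": 0.45}, ...}
+        slide_width: slide width in px
+        slide_height: slide height in px
+        image_sizes: image info (used for the aspect-ratio calculation)
+    """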
+    from src.fit_verifier import _load_design_tokens
+    tokens = _load_design_tokens()
+    pad = tokens["spacing_page"]
+    header_h = tokens.get("header_height", 66)
+    gap_block = tokens["spacing_block"]
+    gap_small = tokens["spacing_small"]
+    inner_w = slide_width - pad * 2
+
+    # Group roles by zone
+    top_roles = []      # zone=top
+    bottom_roles = []   # zone=bottom_left, bottom_right
+    footer_role = None  # zone=footer
+
+    for role_name, info in page_structure.items():
+        if not isinstance(info, dict):
+            continue
+        zone = info.get("zone", "")
+        if zone == "top":
+            top_roles.append((role_name, info))
+        elif zone in ("bottom_left", "bottom_right"):
+            bottom_roles.append((role_name, info))
+        elif zone == "footer":
+            footer_role = (role_name, info)
+
+    # Total available height: slide - padding*2 - header - gap
+    total_available = slide_height - pad * 2 - header_h - gap_block
+
+    # Footer height: proportional to weight, with a guaranteed minimum
+    footer_weight = footer_role[1].get("weight", 0.1) if footer_role else 0.1
+    footer_h_raw = int(total_available * footer_weight)
+    _footer_min = int(14 * tokens.get("line_height_ko", 1.7) + pad)
+    footer_h = max(_footer_min, footer_h_raw)
+
+    # Middle area: excludes footer + gap
+    middle_h = total_available - footer_h - gap_block
+
+    # Top/bottom heights: split by weight ratio
+    top_weight = sum(info.get("weight", 0) for _, info in top_roles)
+    bottom_weight = sum(info.get("weight", 0) for _, info in bottom_roles)
+    total_mid_weight = top_weight + bottom_weight
+    if total_mid_weight <= 0:
+        total_mid_weight = 1
+
+    top_h = int(middle_h * top_weight / total_mid_weight)
+    bottom_h = middle_h - top_h - gap_small  # gap_small: between top and bottom
+
+    # Top: if there is an image, place text left and image right side by side, so split the width
+    img_ratio = 0
+    if image_sizes:
+        for img in image_sizes:
+            r = img.get("ratio", 0)
+            if r > 0:
+                img_ratio = r
+                break
+
+    if img_ratio > 0:
+        # Image height = top_h, image width = top_h * ratio
+        img_w = min(int(top_h * img_ratio), int(inner_w * 0.45))  # capped at 45% of inner width
+        text_w = inner_w - img_w - gap_block
+    else:
+        text_w = inner_w
+        img_w = 0
+
+    specs = {}
+
+    # Top roles
+    for role_name, info in top_roles:
+        specs[role_name] = ContainerSpec(
+            role=role_name,
+            zone="top",
+            topic_ids=info.get("topic_ids", []),
+            weight=info.get("weight", 0),
+            height_px=top_h,
+            width_px=text_w if img_w > 0 else inner_w,  # text width only when there is an image
+            max_height_cost=_max_allowed_height_cost(top_h),
+            block_constraints={
+                "img_width_px": img_w,
+                "img_height_px": top_h if img_w > 0 else 0,
+                "has_image": img_w > 0,
+            },
+        )
+
+    # Bottom roles: two columns
+    bottom_col_w = (inner_w - gap_block) // 2
+    for role_name, info in bottom_roles:
+        specs[role_name] = ContainerSpec(
+            role=role_name,
+            zone=info.get("zone", "bottom_left"),
+            topic_ids=info.get("topic_ids", []),
+            weight=info.get("weight", 0),
+            height_px=bottom_h,
+            width_px=bottom_col_w,
+            max_height_cost=_max_allowed_height_cost(bottom_h),
+            block_constraints={},
+        )
+
+    # Conclusion
+    if footer_role:
+        rn, info = footer_role
+        specs[rn] = ContainerSpec(
+            role=rn,
+            zone="footer",
+            topic_ids=info.get("topic_ids", []),
+            weight=info.get("weight", 0),
+            height_px=footer_h,
+            width_px=inner_w,
+            max_height_cost="low",
+            block_constraints={},
+        )
+
+    logger.info(
+        f"[X-B-3] 유형 B 컨테이너: "
+        + ", ".join(f"{r}={s.height_px}px(w={s.width_px})" for r, s in specs.items())
+    )
+    return specs
+
+
 def _max_allowed_height_cost(container_height_px: int) -> str:
    """컨테이너 높이에서 허용되는 최대 height_cost.

--- a/src/validators.py
+++ b/src/validators.py
@@ -291,6 +291,7 @@ def validate_stage_1b(
    topics: list[dict[str, Any]],
    clean_text: str,
    raw_content: str = "",
+    layout_template: str = "A",
 ) -> list[dict]:
    """Stage 1B(컨셉 구체화) 결과 검증.

@@ -384,15 +385,18 @@ def validate_stage_1b(
            claimed_count = evidence.get(relation_type, 0)

            if claimed_count == 0:
-                # 주장한 관계의 증거가 0개
-                alternatives = [(k, v) for k, v in evidence.items() if v >= 2]
-                alt_str = ", ".join(f"{k}({v}개)" for k, v in alternatives[:3])
-                errors.append({
-                    "severity": "RETRYABLE",
-                    "field": f"topics[{tid}].relation_type",
-                    "localization": f"topic {tid}: '{relation_type}' 증거 0개",
-                    "evidence": f"원본에서 '{relation_type}' 패턴 없음. 대안: {alt_str}" if alt_str else f"원본에서 '{relation_type}' 패턴 없음",
-                    "instruction": f"원본 텍스트에 '{relation_type}' 관계를 나타내는 표현이 없음. 재판단하라.",
-                })
+                if layout_template == "B":
+                    # Type B: missing relation_type evidence is only a warning (role structure is free-form)
+                    logger.warning(f"[Stage 1B] topic {tid}: '{relation_type}' 증거 0개 — 유형 B warning")
+                else:
+                    alternatives = [(k, v) for k, v in evidence.items() if v >= 2]
+                    alt_str = ", ".join(f"{k}({v}개)" for k, v in alternatives[:3])
+                    errors.append({
+                        "severity": "RETRYABLE",
+                        "field": f"topics[{tid}].relation_type",
+                        "localization": f"topic {tid}: '{relation_type}' 증거 0개",
+                        "evidence": f"원본에서 '{relation_type}' 패턴 없음. 대안: {alt_str}" if alt_str else f"원본에서 '{relation_type}' 패턴 없음",
+                        "instruction": f"원본 텍스트에 '{relation_type}' 관계를 나타내는 표현이 없음. 재판단하라.",
+                    })

    return errors