
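kyeongmin 688ddbbb17 04. Add design_agent: a slide generator that visually structures content

5-stage AI pipeline:
1. Kei director (Opus via Kei API): extract key topics + identify the information structure
2. Design team lead: FAISS block search + Opus recommendation + Sonnet block mapping
3. Kei editor (Kei API): domain-expert text cleanup
4. Design practitioner (Sonnet + Jinja2): CSS variable tuning + HTML assembly
5. Design team lead (Sonnet): balance re-review (loops up to 2 times)

Block library: 46 blocks (6 categories) + 13 in _legacy
FAISS block search (bge-m3, 1024 dimensions)
Dynamic placement of N SVG items (cos/sin coordinate calculation)
Pillow image size measurement + base64 inlining
Container-budget-based block placement (height in px per zone)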

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-03-25 18:47:13 +09:00

26 KiB


Design Agent - Technology Research Report

Date: 2026-03-24

1. CSS Grid for Slide Layouts

1.1 Fixed-Viewport Approach (16:9, 1280x720)

Recommended technique: Fixed container + CSS transform scaling.

The slide container should be authored at a fixed "normal" size (1280x720), then scaled to fit any viewport using transform: scale(). This is the same approach used by reveal.js, the dominant HTML presentation framework.

.slide {
    width: 1280px;
    height: 720px;
    aspect-ratio: 16 / 9;
    overflow: hidden;
    position: relative;
}

For preview/embedding, wrap in a container that calculates a scale factor:

.slide-wrapper {
    width: 100%;
    max-width: 1280px;
    aspect-ratio: 16 / 9;
    overflow: hidden;
}
.slide-wrapper > .slide {
    transform-origin: top left;
    transform: scale(var(--slide-scale, 1));
}

The scale factor can be computed with minimal JS: containerWidth / 1280.
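
As a rough illustration of that arithmetic, the sketch below computes the factor in Python, the way a server-side renderer might when it embeds a preview at a known width. The function names and the optional height argument are assumptions made for this example and are not part of the project.

```python
from typing import Optional

SLIDE_WIDTH = 1280
SLIDE_HEIGHT = 720


def slide_scale(container_width: float, container_height: Optional[float] = None) -> float:
    """Uniform scale that fits the 1280x720 slide inside the container."""
    scale = container_width / SLIDE_WIDTH
    if container_height is not None:
        # Fit both dimensions: the tighter one decides the factor.
        scale = min(scale, container_height / SLIDE_HEIGHT)
    return scale


def wrapper_style(container_width: float) -> str:
    """Inline style that sets the --slide-scale custom property on the wrapper."""
    return f"--slide-scale: {slide_scale(container_width):.4f};"
```

Because custom properties inherit, setting --slide-scale on .slide-wrapper is enough for the .slide rule above to pick it up.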

Key insight from reveal.js: All presentations have a "normal" size at which they are authored. The framework automatically scales uniformly to fit different resolutions without changing aspect ratio or layout. Default is 960x700; for our use case, 1280x720 is the standard 16:9 HD dimension.

Why this works for Design Agent: The renderer produces HTML at exactly 1280x720. It never needs to be "responsive" -- it's a fixed-format document like a PDF page. Scaling is only for preview purposes.

1.2 Grid-Template-Areas for Block Combinations

grid-template-areas provides named regions that map directly to the block composition concept in the CLAUDE.md:

.layout-quote-compare-cards-conclusion {
    display: grid;
    grid-template-areas:
        "quote   quote"
        "compare cards"
        "diagram diagram"
        "conclusion conclusion";
    grid-template-columns: 1fr 1fr;
    grid-template-rows: auto 1fr auto auto;
    gap: var(--spacing-block);
    padding: var(--spacing-page);
}

Best practice: Define each layout as a separate CSS class with its own grid-template-areas. The Sonnet agent selects which layout class to apply based on the block combination it decides. This keeps the renderer deterministic -- it just applies the class.

Reusability: CSS variables allow the same grid template to adapt:

Column count: grid-template-columns: repeat(var(--cols, 3), 1fr)
Gap: gap: var(--spacing-block)
Row sizing: grid-template-rows can mix auto (content-sized) and 1fr (fill remaining)
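
One way to keep the layout classes consistent is to generate them from a small row description instead of writing each by hand. The following is a minimal sketch under that assumption; grid_layout_css and its arguments are hypothetical names, and the gap and padding tokens are the ones used in the example above.

```python
from typing import List


def grid_layout_css(class_name: str, rows: List[List[str]], row_sizes: List[str]) -> str:
    """Build one layout class with grid-template-areas from named rows."""
    if not rows or len(rows) != len(row_sizes):
        raise ValueError("need one row size per row of areas")
    cols = len(rows[0])
    if any(len(row) != cols for row in rows):
        raise ValueError("every row must name the same number of columns")
    # Each row becomes one quoted string, e.g. "compare cards".
    areas = "\n".join(f'        "{" ".join(row)}"' for row in rows)
    return (
        f".{class_name} {{\n"
        "    display: grid;\n"
        f"    grid-template-areas:\n{areas};\n"
        f"    grid-template-columns: repeat({cols}, 1fr);\n"
        f"    grid-template-rows: {' '.join(row_sizes)};\n"
        "    gap: var(--spacing-block);\n"
        "    padding: var(--spacing-page);\n"
        "}"
    )


css = grid_layout_css(
    "layout-quote-compare-cards",
    [["quote", "quote"], ["compare", "cards"]],
    ["auto", "1fr"],
)
```

A block that spans the full width simply repeats its area name in every column of its row.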

1.3 Design Tokens

Naming convention: --{category}-{property}-{variant}

The CLAUDE.md already defines a good token set. The industry standard approach (from EightShapes, Nord Design System) uses kebab-case with semantic naming:

--color-primary, --color-accent, --color-neutral
--font-title, --font-subtitle, --font-body, --font-caption
--spacing-page, --spacing-block, --spacing-inner
--radius, --border-width, --accent-border

This matches what's already in the project's CLAUDE.md. No changes needed.

For slide-specific tokens, add:

--slide-width: 1280px;
--slide-height: 720px;
--slide-aspect: 16 / 9;
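
If the tokens are kept in a Python mapping (an assumption; the project may store them directly in a CSS file), a small helper can emit the :root block and reject names that are not kebab-case. tokens_to_css is a hypothetical name used only for this sketch.

```python
import re
from typing import Dict

# kebab-case: lowercase words or numbers joined by single hyphens
TOKEN_NAME = re.compile(r"^[a-z][a-z0-9]*(-[a-z0-9]+)*$")


def tokens_to_css(tokens: Dict[str, str]) -> str:
    """Render design tokens as CSS custom properties on :root."""
    lines = []
    for name, value in tokens.items():
        if not TOKEN_NAME.match(name):
            raise ValueError(f"token name is not kebab-case: {name!r}")
        lines.append(f"    --{name}: {value};")
    return ":root {\n" + "\n".join(lines) + "\n}"


slide_tokens = tokens_to_css({
    "slide-width": "1280px",
    "slide-height": "720px",
    "slide-aspect": "16 / 9",
})
```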

1.4 Overflow Handling in Fixed Pages

Three techniques for ensuring content fits within fixed dimensions:

Single-line truncation:

.truncate {
    overflow: hidden;
    white-space: nowrap;
    text-overflow: ellipsis;
}

Multi-line truncation (line clamping):

.line-clamp-3 {
    display: -webkit-box;
    -webkit-line-clamp: 3;
    -webkit-box-orient: vertical;
    overflow: hidden;
}

Container overflow hidden (safety net):

.block { overflow: hidden; }
.slide { overflow: hidden; }

Korean-specific consideration: word-break: keep-all affects how text wraps, which impacts line count. Content fitting calculations must account for this. The Sonnet agent should be instructed with character limits per slot, not word limits.
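
To show how keep-all changes the line count, here is a rough estimator that wraps only at spaces and falls back to breaking a word only when it is wider than the whole line. The width model (1.0 em for wide CJK/Hangul glyphs, 0.5 em for everything else, 0.25 em for a space) is an assumption for illustration; real glyph metrics depend on the font.

```python
import math
import unicodedata


def char_em(ch: str) -> float:
    """Assumed advance width in em: 1.0 for wide (CJK/Hangul) glyphs, else 0.5."""
    return 1.0 if unicodedata.east_asian_width(ch) in ("W", "F") else 0.5


def estimate_lines(text: str, box_width_px: float, font_size_px: float) -> int:
    """Estimate rendered lines under word-break: keep-all (wrap at spaces only)."""
    words = text.split()
    if not words:
        return 0
    capacity = box_width_px / font_size_px  # line capacity in em
    space = 0.25  # assumed width of a space in em
    lines, used = 1, 0.0
    for word in words:
        width = sum(char_em(c) for c in word)
        if width > capacity:
            # overflow-wrap: break-word: the long word starts a new line, then breaks.
            if used > 0:
                lines += 1
            pieces = math.ceil(width / capacity)
            lines += pieces - 1
            used = width - (pieces - 1) * capacity
            continue
        needed = width if used == 0 else used + space + width
        if needed > capacity:
            lines += 1
            used = width
        else:
            used = needed
    return lines
```

An estimate like this can be compared against the line budget of a slot before rendering, with the final check still done in the browser.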

2. Content-to-Layout Classification

2.1 How to Prompt an LLM for Reliable Classification

Claude Structured Output (recommended):

Anthropic launched Structured Outputs in November 2025, supporting Claude Sonnet 4.5 and Opus 4.1. This guarantees JSON schema conformance at the token generation level -- the model literally cannot produce tokens that violate the schema.

Implementation:

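from typing import Literal

from pydantic import BaseModel, Field

# Closed set of content types. Literal puts the allowed values into the JSON schema
# itself, so the constraint is enforced by the schema rather than by a description.
ContentType = Literal[
    "comparison", "process", "relationship", "big-number", "definition",
    "list", "timeline", "emphasis", "problem",
]

class ContentBlock(BaseModel):
    content_type: ContentType = Field(description="Content type of this block")
    evidence: str = Field(description="Which text patterns led to this classification")
    suggested_block: str = Field(description="Block type from the template library")

class ContentAnalysis(BaseModel):
    # This model's JSON schema is what gets passed to the Anthropic API as the output format.
    blocks: list[ContentBlock]
    layout_direction: str = Field(description="How blocks should be arranged on the page")
    primary_message: str = Field(description="The single key takeaway for this slide")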

Reliability strategies:

Set temperature to 0.0-0.1 for classification tasks (reduces format drift)
Use the output_format parameter with JSON schema (not just prompting)
Include one perfect example in the system prompt
Add explicit validation instructions
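
Even with schema-constrained output, a cheap server-side check before the next stage is a reasonable safety net. The sketch below is a standard-library stand-in for the Pydantic validation, using the field names from the models above; parse_analysis is a hypothetical helper, not an API of any library.

```python
import json

CONTENT_TYPES = {
    "comparison", "process", "relationship", "big-number", "definition",
    "list", "timeline", "emphasis", "problem",
}


def parse_analysis(raw: str) -> dict:
    """Parse the classifier's JSON reply and reject anything off-schema."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("top-level value must be an object")
    blocks = data.get("blocks")
    if not isinstance(blocks, list) or not blocks:
        raise ValueError("blocks must be a non-empty list")
    for i, block in enumerate(blocks):
        if not isinstance(block, dict):
            raise ValueError(f"blocks[{i}] must be an object")
        if block.get("content_type") not in CONTENT_TYPES:
            raise ValueError(f"blocks[{i}].content_type is not a known type")
        for key in ("evidence", "suggested_block"):
            if not isinstance(block.get(key), str):
                raise ValueError(f"blocks[{i}].{key} must be a string")
    for key in ("layout_direction", "primary_message"):
        if not isinstance(data.get(key), str):
            raise ValueError(f"{key} must be a string")
    return data
```

A ValueError here can be fed back to the model as a retry prompt or surfaced as a failed step.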

2.2 Information Type Taxonomy for Presentations

Based on research from SlideSpeak (16 layout types), PPTAgent (EMNLP 2025), Beautiful.ai (300 templates), and Dr. Andrew Abela's Chart Chooser:

The CLAUDE.md already defines 9 excellent content types. Here is how they map to industry precedent:

CLAUDE.md Type	SlideSpeak Equivalent	PPTAgent Category	Common Slide Type
comparison	SS_ITEMS_*_A/B	Content slide (multi-column)	Comparison slide
process	SS_STEPS_3/4/5	Content slide (sequential)	Process/workflow slide
relationship	(custom)	Content slide (diagram)	Venn/tree diagram slide
big-number	SS_BIGNUMBER_1/3	Content slide (metric)	KPI/statistics slide
definition (card-grid)	SS_ITEMS_3/4/5/6	Content slide (grid)	Definition/feature slide
list	SS_CONTENT	Content slide (list)	Bullet point slide
timeline	(custom)	Content slide (sequential)	Timeline slide
emphasis (quote-block)	(custom)	Structural slide	Quote/callout slide
problem	(custom)	Structural slide	Problem statement slide

Additional types to consider (from industry):

SWOT (SlideSpeak has SS_SWOT) -- 4-quadrant grid
Matrix/Table (already covered by comparison-table)
Cover/Title -- for when the content is just a single title/subtitle

2.3 Structured Output Schema for Layout Decisions

The PPTAgent paper (EMNLP 2025) uses a two-stage approach that aligns perfectly with the Design Agent architecture:

Stage 1 (maps to Opus in the Design Agent): Analyze content, classify into functional types, extract content schemas
Stage 2 (maps to Sonnet in the Design Agent): Select reference layouts, fill content into slots, apply editing actions

PPTAgent represents all parsed outputs in JSON format for LLM compatibility. The Design Agent should do the same.

3. Slot-Based Template Systems

3.1 SlideSpeak's Named Slot System

SlideSpeak uses a comprehensive naming convention for template placeholders:

Layout names: SS_COVER, SS_CONTENT, SS_TABLE_OF_CONTENT, SS_BIGNUMBER_1_A, SS_BIGNUMBER_3_A, SS_ITEMS_3_A through SS_ITEMS_6_B, SS_STEPS_3 through SS_STEPS_5_ICONS, SS_SWOT

Universal slot names:

SS_TITLE -- slide title
SS_SUBTITLE -- subtitle
SS_LOGO -- logo placeholder
SS_IMAGE -- general image
SS_PAGE -- page number
SS_PRESENTATION_TITLE -- footer title

Multi-item slot naming pattern:

SS_ITEM_{N}_TITLE -- title for item N
SS_ITEM_{N}_CONTENT -- content for item N
SS_ITEM_{N}_NUMBER -- number for item N (big-number layouts)
SS_ICON_{N} -- icon for item N

Key insight for Design Agent: The {{SLOT_NAME}} convention in CLAUDE.md maps well. Adopt a similar systematic naming: {{BLOCK_TITLE}}, {{ITEM_1_TITLE}}, {{ITEM_1_CONTENT}}, etc.
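
A minimal sketch of that naming scheme, assuming slots are written as {{SLOT_NAME}} in plain HTML strings: one helper lists the slot names for N items, the other substitutes values and HTML-escapes them. Both function names are hypothetical.

```python
import re
from html import escape
from typing import Dict, List, Sequence

SLOT_PATTERN = re.compile(r"\{\{\s*([A-Z0-9_]+)\s*\}\}")


def item_slot_names(count: int, fields: Sequence[str] = ("TITLE", "CONTENT")) -> List[str]:
    """Slot names for items 1..count, e.g. ITEM_1_TITLE, ITEM_1_CONTENT."""
    return [f"ITEM_{n}_{field}" for n in range(1, count + 1) for field in fields]


def fill_slots(template: str, values: Dict[str, str]) -> str:
    """Replace every {{SLOT}} with its escaped value; unknown slots raise KeyError."""
    def replace(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(name)
        return escape(values[name])

    return SLOT_PATTERN.sub(replace, template)
```

Raising on a missing slot, rather than leaving the placeholder in place, keeps unfilled templates from reaching the PDF step.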

3.2 Jinja2 for Template Rendering

Jinja2 is the recommended engine. It integrates natively with FastAPI and Python.

Block inheritance for base layout:

{# base_slide.html #}
<!DOCTYPE html>
<html lang="ko">
<head>
    <link rel="stylesheet" href="design-tokens.css">
    <style>{% block extra_style %}{% endblock %}</style>
</head>
<body>
    <div class="slide">
        {% block content %}{% endblock %}
    </div>
</body>
</html>

Block-level templates:

{# blocks/comparison.html #}
<div class="block-comparison">
    <div class="col-left">
        <h3>{{ left_title }}</h3>
        <p>{{ left_content }}</p>
    </div>
    <div class="col-right">
        <h3>{{ right_title }}</h3>
        <p>{{ right_content }}</p>
    </div>
</div>

Composition via includes:

{# Generated by renderer based on Sonnet's layout decision #}
{% extends "base_slide.html" %}
{% block content %}
    {% include "blocks/quote-block.html" %}
    <div class="grid-row-2">
        {% include "blocks/comparison.html" %}
        {% include "blocks/card-grid.html" %}
    </div>
    {% include "blocks/conclusion-bar.html" %}
{% endblock %}
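
The composition template above could be produced from the layout decision with plain string building. This is a sketch under assumed conventions: the decision arrives as a list of rows of block names, multi-block rows get a grid-row-N wrapper, and composition_source is a hypothetical helper.

```python
import re
from typing import List

# Block names are restricted to kebab-case so a name can never escape blocks/.
BLOCK_NAME = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")


def composition_source(rows: List[List[str]], base: str = "base_slide.html") -> str:
    """Build Jinja2 template source that extends the base slide and includes blocks."""
    lines = [f'{{% extends "{base}" %}}', "{% block content %}"]
    for row in rows:
        if not row:
            raise ValueError("a row needs at least one block")
        for name in row:
            if not BLOCK_NAME.match(name):
                raise ValueError(f"invalid block name: {name!r}")
        if len(row) == 1:
            lines.append(f'    {{% include "blocks/{row[0]}.html" %}}')
        else:
            lines.append(f'    <div class="grid-row-{len(row)}">')
            lines.extend(f'        {{% include "blocks/{name}.html" %}}' for name in row)
            lines.append("    </div>")
    lines.append("{% endblock %}")
    return "\n".join(lines)


source = composition_source([["quote-block"], ["comparison", "card-grid"], ["conclusion-bar"]])
```

The resulting string can then be handed to the Jinja2 environment in the same way as a template loaded from disk.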

3.3 Slot Constraints

Each slot should have defined constraints that Sonnet respects:

Slot Type	Max Characters (Korean)	Required	Notes
slide_title	30	Yes	Single line
block_title	20	Yes	Single line
item_title	15	Yes	Single line
item_content	80	No	2-3 lines
quote_text	120	Yes	3-4 lines
big_number	8	Yes	Number + unit
conclusion	60	Yes	Single line
caption	40	No	Single line

Korean consideration: A Hangul syllable block is roughly twice as wide as an average Latin letter at the same font size, so a Korean string fills a line in about half as many characters. Limits should be specified in characters, not words, because rendered width tracks character count and Korean word lengths vary widely.
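
The limits in the table can be checked mechanically before rendering. The following is a minimal sketch that copies the table into a dictionary; check_slot is a hypothetical helper, and counting with len() assumes one code point per visible character, which holds for precomposed Hangul.

```python
from typing import Dict, Optional, Tuple

# slot type -> (max characters, required), copied from the table above
SLOT_LIMITS: Dict[str, Tuple[int, bool]] = {
    "slide_title": (30, True),
    "block_title": (20, True),
    "item_title": (15, True),
    "item_content": (80, False),
    "quote_text": (120, True),
    "big_number": (8, True),
    "conclusion": (60, True),
    "caption": (40, False),
}


def check_slot(slot_type: str, text: str) -> Optional[str]:
    """Return a problem description, or None when the text fits the slot."""
    max_chars, required = SLOT_LIMITS[slot_type]
    text = text.strip()
    if not text:
        return f"{slot_type}: required slot is empty" if required else None
    if len(text) > max_chars:
        return f"{slot_type}: {len(text)} chars exceeds limit {max_chars}"
    return None
```

Problems found this way can be returned to Sonnet as an edit request instead of relying on truncation.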

4. HTML to PDF Conversion

4.1 Playwright (Recommended)

Why Playwright over Puppeteer:

Native Python SDK (no Node.js dependency for a Python project)
Multiple browser support (Chromium, Firefox, WebKit), though PDF only works in Chromium
Growing community, active maintenance, better CI/CD integration
Full CSS Grid support via real Chromium rendering engine

Python implementation:

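from playwright.async_api import async_playwright

async def html_to_pdf(html_content: str, output_path: str) -> None:
    """Render a 1280x720 slide document to PDF with headless Chromium."""
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        try:
            page = await browser.new_page()
            await page.set_content(html_content, wait_until="networkidle")
            # Wait until web fonts are ready so Korean text does not render in a fallback font.
            await page.evaluate("document.fonts.ready.then(() => true)")
            await page.pdf(
                path=output_path,
                width="1280px",
                height="720px",
                print_background=True,
                prefer_css_page_size=True,
            )
        finally:
            # Close the browser even if rendering fails.
            await browser.close()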

Key options:

print_background=True -- required for background colors/images
prefer_css_page_size=True -- lets CSS @page rules control dimensions
width/height -- custom page dimensions (accepts px, in, mm, cm units)

4.2 Print CSS for Slide Format

@media print {
    @page {
        size: 1280px 720px;
        margin: 0;
    }
    body {
        margin: 0;
        -webkit-print-color-adjust: exact;
        print-color-adjust: exact;
    }
    .slide {
        width: 1280px;
        height: 720px;
        page-break-after: always;
        overflow: hidden;
    }
}

-webkit-print-color-adjust: exact is critical -- without it, background colors and images may be stripped in PDF output.

4.3 Quality Comparison

Both Puppeteer and Playwright use Chromium's print-to-PDF engine, so output quality is identical. The choice comes down to:

Factor	Playwright	Puppeteer
Language	Python, JS, C#, Java	JS/Node.js only
PDF engine	Chromium only	Chromium only
CSS Grid quality	Excellent (Chromium)	Excellent (Chromium)
Korean font rendering	Excellent	Excellent
Install size	~400MB (browser binary)	~300MB
API ergonomics	Better async patterns	More established

Recommendation: Playwright, because the Design Agent backend is Python. No need to bridge to Node.js.

4.4 Korean-Specific Considerations

Fonts must be available on the server. Self-host Pretendard/Noto Sans KR WOFF2 files or use CDN.
Set lang="ko" on the HTML element for proper line-breaking algorithms.
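Ensure @font-face fonts have finished loading before PDF generation: wait_until="networkidle" waits for the font requests to settle, and awaiting document.fonts.ready in the page confirms the fonts are actually usable.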

5. Pure CSS Diagrams

5.1 Venn Diagrams (Pure CSS)

Technique: Overlapping circles with opacity and negative margins.

.venn-container { display: flex; align-items: center; justify-content: center; }
.venn-circle {
    width: 200px; height: 200px;
    border-radius: 50%;
    opacity: 0.7;
    display: flex; align-items: center; justify-content: center;
    padding: 20px;
    text-align: center;
}
.venn-a { background: var(--color-accent); }
.venn-b { background: var(--color-neutral); margin-left: -60px; }

Advanced approach (Adrian Roselli): CSS Grid + shape-outside for text wrapping within overlapping regions. More complex but better for text-heavy Venn diagrams.

Limitation: Pure CSS Venn diagrams work well for 2-3 circles. Beyond that, SVG is more practical.

5.2 Flowcharts / Process Arrows (Pure CSS)

Technique: Flexbox/Grid layout + pseudo-elements for arrows.

.process-steps { display: flex; align-items: center; gap: 0; }
.process-step {
    background: var(--color-bg-subtle);
    padding: var(--spacing-inner);
    position: relative;
    flex: 1;
}
.process-step + .process-step::before {
    content: '';
    position: absolute;
    left: -12px; top: 50%;
    transform: translateY(-50%);
    border: 8px solid transparent;
    border-left-color: var(--color-accent);
}

CSS Anchor Positioning (2025-2026): A newer CSS feature that positions one element relative to another (its anchor), which can be used to place connector elements between nodes. Supported in Chrome 125+, Safari 26+, not yet in Firefox. Since we target Chromium (for PDF generation), this is usable but adds complexity. For the Design Agent, pseudo-element arrows are simpler and more reliable.

5.3 Tree/Hierarchy Diagrams (Pure CSS)

Technique: Nested <ul>/<li> + pseudo-elements for connector lines.

Libraries:

Treeflex (https://dumptyd.github.io/treeflex/) -- CSS-only library for hierarchy trees, no JS
Custom implementation using border-left on <li> for vertical lines, ::before for horizontal connectors, ::after for node circles

5.4 When to Use SVG Instead

Use Case	CSS	SVG	Recommendation
Venn (2-3 circles)	Good	Better	CSS for simplicity
Process arrows (linear)	Excellent	Overkill	CSS
Tree (2-3 levels)	Good	Better	CSS (Treeflex)
Complex flowchart (branches)	Difficult	Much better	SVG
Curved connectors	Impractical	Easy	SVG
Data-driven charts	Impractical	Well suited	SVG
Accessible diagrams	Poor	Excellent	SVG

Recommendation for Design Agent: Use pure CSS for simple diagrams (process arrows, 2-circle Venn, basic tree). Generate inline SVG for anything more complex. Since the renderer produces static HTML for PDF export, there's no JS concern.
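
For the inline SVG case, placing N nodes evenly on a ring comes down to cos/sin coordinates. The sketch below illustrates that calculation; radial_svg, the default sizes, and the stroke styling are assumptions for this example rather than the project's actual generator.

```python
import math
from html import escape
from typing import List


def radial_svg(labels: List[str], size: int = 400, node_r: int = 40) -> str:
    """Inline SVG with one labelled circle per item, evenly spaced on a ring."""
    cx = cy = size / 2
    ring_r = size / 2 - node_r - 10  # keep nodes inside the viewBox with a 10px margin
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 {size} {size}" '
        f'width="{size}" height="{size}">'
    ]
    count = len(labels)
    for i, label in enumerate(labels):
        # Start at 12 o'clock and move clockwise (SVG's y axis points down).
        angle = 2 * math.pi * i / count - math.pi / 2
        x = cx + ring_r * math.cos(angle)
        y = cy + ring_r * math.sin(angle)
        parts.append(
            f'<circle cx="{x:.1f}" cy="{y:.1f}" r="{node_r}" fill="none" stroke="currentColor"/>'
        )
        parts.append(
            f'<text x="{x:.1f}" y="{y:.1f}" text-anchor="middle" '
            f'dominant-baseline="middle">{escape(label)}</text>'
        )
    parts.append("</svg>")
    return "\n".join(parts)


svg = radial_svg(["A", "B", "C", "D"])
```

Because the output is static markup, it can be dropped into a block template without any script.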

Accessibility note: SVG has built-in accessibility elements (<title>, <desc>, aria-*). For the Design Agent's output (primarily visual slides for PDF), this is less critical but good practice.

6. Korean Typography in CSS

6.1 Font Selection

Primary recommendation: Pretendard

Modern system-ui replacement font designed for cross-platform use
Built on Inter (Latin) + Source Han Sans (CJK) + M PLUS 1p
9 weights + variable font support
Dynamic subset via CDN (Google Fonts-style loading for Korean)
Most popular Korean web font according to HTTP Archive 2024

CDN options:

/* Dynamic subset (recommended for web) */
@import url('https://cdn.jsdelivr.net/gh/orioncactus/pretendard@v1.3.9/dist/web/variable/pretendardvariable-dynamic-subset.min.css');

/* Or from cdnjs */
@import url('https://cdnjs.cloudflare.com/ajax/libs/pretendard/1.3.9/variable/pretendardvariable-dynamic-subset.min.css');

For PDF generation (self-hosted): Download WOFF2 files and serve locally to avoid CDN dependency during headless browser PDF generation.

Fallback stack:

font-family: 'Pretendard Variable', 'Pretendard', -apple-system, 'Noto Sans KR',
             'Malgun Gothic', sans-serif;

Alternative: Noto Sans KR

Google Fonts native, widest browser support
Available via Google Fonts CDN with automatic Korean subsetting
Good for when Pretendard is not available

6.2 Line-Height and Letter-Spacing

W3C KLREQ (Korean Layout Requirements) recommendations:

Property	Value	Rationale
`line-height`	1.6-1.8	CJK text needs ~1.7 (vs 1.2-1.5 for Latin) due to higher information density per character
`letter-spacing`	0 to -0.02em	Korean text looks best with tight or default spacing. Avoid positive letter-spacing.
`word-spacing`	normal	Korean uses spaces between words (unlike Japanese/Chinese)

For slides specifically:

body {
    line-height: 1.7;
    letter-spacing: -0.01em;
}
h1, h2, h3 {
    line-height: 1.3;
    letter-spacing: -0.02em;
}

6.3 Word-Break Rules

/* Korean text: keep words intact, break lines only at spaces */
body {
    word-break: keep-all;
    overflow-wrap: break-word;
}

word-break: keep-all prevents line breaks between the syllable blocks inside a Korean word, so lines break only at spaces and punctuation (e.g., "한글" won't break as "한" / "글" mid-word). This is essential for readable Korean typography.

overflow-wrap: break-word is the safety net for extremely long strings (URLs, technical terms) that might overflow.

Browser support: All modern browsers support keep-all since 2016. There is ongoing W3C discussion about a potential keep-all-hangul value that would apply keep-all only to Hangul characters and normal to everything else.

6.4 Mixing Korean + English

Key challenge: At the same point size, Latin letters appear smaller than Korean characters.

Solutions:

Use a font designed for mixed text (Pretendard handles this well, being built on Inter + Source Han Sans)
Set the Latin font first in font-family stack, CJK font second (most CJK fonts include Latin glyphs, but their Latin is often inferior)
No need for font-size-adjust if using Pretendard (it's designed for optical balance between scripts)

Punctuation: Korean uses proportional-width punctuation (like Latin), unlike Japanese/Chinese which use full-width. Pretendard handles this correctly.

6.5 Language Attribute

Always set lang="ko" on the HTML element:

<html lang="ko">

This enables the browser's Korean-specific line-breaking algorithms and font selection.

7. FastAPI Integration

7.1 Serving the Design Agent

The Design Agent should be a FastAPI application with these endpoints:

Endpoint	Method	Purpose
`/api/analyze`	POST	Step 1: Send content to Opus for classification
`/api/generate`	POST	Steps 2-3: Sonnet selects + renderer produces HTML
`/api/generate/stream`	GET (SSE)	Steps 1-3 with real-time progress
`/api/preview/{job_id}`	GET	Return generated HTML for iframe preview
`/api/download/{job_id}/pdf`	GET	Generate and return PDF
`/api/download/{job_id}/html`	GET	Return HTML file
`/api/templates`	GET	List available block templates

7.2 SSE Streaming

FastAPI has native SSE support (added to official docs). Two library options:

Option A: sse-starlette (recommended)

Production-ready, W3C SSE spec compliant
Already used in the HWPX project
pip install sse-starlette

Option B: fastapi-sse

Lighter weight, built specifically for FastAPI
Supports sending Pydantic models as SSE events
pip install fastapi-sse

Implementation:

import json

from fastapi import APIRouter
from sse_starlette.sse import EventSourceResponse

router = APIRouter()

@router.get("/api/generate/stream")
async def stream_generation(content: str):
    async def event_generator():
        yield {"event": "step_start", "data": "analyzing"}
        analysis = {}  # placeholder: result of the Opus classification step
        yield {"event": "step_complete", "data": json.dumps(analysis)}

        yield {"event": "step_start", "data": "selecting"}
        slots = {}  # placeholder: result of the Sonnet selection step
        yield {"event": "step_complete", "data": json.dumps(slots)}

        yield {"event": "step_start", "data": "rendering"}
        job_id = "job-id"  # placeholder: ID assigned once CSS Grid rendering finishes
        yield {"event": "complete", "data": job_id}

    return EventSourceResponse(event_generator())
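
For reference, each yielded dictionary ends up as a text frame in the SSE wire format: an event: line, one data: line per line of payload, and a blank line to end the event. sse-starlette does this framing itself; the helper below is a hypothetical stand-alone sketch, useful mainly for tests or for reading raw responses.

```python
def format_sse(event: str, data: str) -> str:
    """Serialize one event in the text/event-stream format."""
    lines = [f"event: {event}"]
    # A multi-line payload needs one data: field per line.
    lines.extend(f"data: {line}" for line in data.splitlines() or [""])
    # A blank line terminates the event.
    return "\n".join(lines) + "\n\n"
```

On the client, an EventSource listener registered for step_start or step_complete receives the data field as a string.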

7.3 File Upload

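from fastapi import APIRouter, File, UploadFile

router = APIRouter()

async def extract_text(file: UploadFile) -> str:
    # Hypothetical helper: treats the upload as UTF-8 text; other formats need their own parsers.
    return (await file.read()).decode("utf-8", errors="replace")

@router.post("/api/upload")
async def upload_content(file: UploadFile = File(...)):
    # Read and extract text from uploaded file
    text = await extract_text(file)
    return {"text": text, "filename": file.filename}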

7.4 Static File Serving for Preview

For development: FastAPI's StaticFiles mount for serving generated HTML:

from fastapi import FastAPI
from fastapi.staticfiles import StaticFiles
app = FastAPI()
app.mount("/static", StaticFiles(directory="output"), name="static")  # "output" must already exist

For production: Serve HTML via API response body (like the HWPX project does with <iframe srcDoc>). This is more secure -- no direct file path exposure.
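
When the preview markup is assembled server-side, the slide HTML has to be attribute-escaped before it goes into srcdoc (the plain-HTML spelling of React's srcDoc prop). A minimal sketch, assuming a hypothetical preview_iframe helper and a sandboxed frame for the static slide:

```python
from html import escape


def preview_iframe(slide_html: str, width: int = 1280, height: int = 720) -> str:
    """Wrap generated slide HTML in a sandboxed iframe via the srcdoc attribute."""
    # quote=True also escapes double and single quotes, so the markup cannot
    # terminate the attribute value early.
    return (
        f'<iframe srcdoc="{escape(slide_html, quote=True)}" '
        f'width="{width}" height="{height}" sandbox></iframe>'
    )
```

In a React front end the raw HTML string is passed to srcDoc directly, since React performs the attribute escaping.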

Font serving: Self-hosted Pretendard WOFF2 files should be served as static files for reliable PDF generation.

Summary: Technology Stack Recommendation

Component	Technology	Version	Rationale
LLM (Analysis)	Claude Opus via Anthropic API	Structured Outputs beta	Content classification, layout direction
LLM (Selection)	Claude Sonnet via Anthropic API	Structured Outputs beta	Slot filling, content editing
Template Engine	Jinja2	>=3.1	Native Python, FastAPI integration, block inheritance
CSS Layout	CSS Grid + grid-template-areas	Native CSS	Named regions map to block composition
Design Tokens	CSS Custom Properties	Native CSS	Already defined in CLAUDE.md
Typography	Pretendard Variable	1.3.9	Best Korean web font, dynamic subset
Fallback Font	Noto Sans KR	Latest	Google Fonts CDN backup
PDF Generation	Playwright (Python)	>=1.40	Native Python SDK, full CSS Grid support
Web Framework	FastAPI	>=0.115	SSE support, file upload, same as HWPX project
SSE	sse-starlette	>=2.0	Production-ready, W3C compliant
CSS Diagrams	Pure CSS + inline SVG fallback	N/A	Pseudo-elements for simple, SVG for complex
Slide Scaling	CSS transform: scale()	Native CSS	reveal.js-proven approach
Korean Line Breaking	word-break: keep-all	Native CSS	W3C KLREQ recommendation
