feat: Implement full conversion pipeline (PDF/HWP/HWPX/HML/HTML)

- convert.py: 통합 CLI, --json 출력, --scan 폴더 모드
- converters/pdf.py: 페이지별 분류(text/diagram/mixed) + marker-pdf + PNG 렌더링
- converters/hwp.py: COM 자동화 + pyhwp fallback
- converters/hwpx.py: ZIP+XML 직접 파싱, 이미지 추출
- converters/hml.py: XML 파싱, Base64 이미지 추출, colspan/rowspan HTML 표
- converters/html.py: html2text (body_width=0)
- requirements.txt: 최소 의존성
- .env.example: 환경변수 템플릿

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
minsung
2026-04-20 09:06:34 +09:00
parent 6f365018f5
commit 2ec2759a20
12 changed files with 1072 additions and 0 deletions

1
.claude/hooks/token-usage/.gitignore vendored Normal file
View File

@@ -0,0 +1 @@
aptabase.json

43
.claude/settings.json Normal file
View File

@@ -0,0 +1,43 @@
{
"hooks": {
"UserPromptSubmit": [
{
"hooks": [
{
"type": "command",
"command": "t=$(mktemp);cat>\"$t\";e=./.claude/hooks/token-usage/claude-hook.exe;[ -x \"$e\" ] && \"$e\" session-context \"$t\";rm -f \"$t\"",
"timeout": 5
}
]
}
],
"Stop": [
{
"hooks": [
{
"type": "command",
"command": "t=$(mktemp);cat>\"$t\";e=./.claude/hooks/token-usage/claude-hook.exe;[ -x \"$e\" ] && \"$e\" stop-record \"$t\";rm -f \"$t\"",
"timeout": 5
}
]
}
],
"PostToolUse": [
{
"matcher": "Bash",
"hooks": [
{
"type": "command",
"command": "t=$(mktemp);cat>\"$t\";e=./.claude/hooks/token-usage/claude-hook.exe;[ -x \"$e\" ] && \"$e\" aptabase-commit \"$t\";rm -f \"$t\"",
"timeout": 15
}
]
}
]
},
"permissions": {
"allow": [
"mcp__gitea__issue_write"
]
}
}