
HYUNJUNGLEE e9cc6bfcf4 Phase 0 of expert feedback (#1~#11): infrastructure + design + first-round fixes

Implementations (working now):
- #1 crash logging: harness/crash_logger.py (sys.excepthook + threading +
  faulthandler, rotating file logs/scanvas.log). Integrated into the main entry point.
- #2 smooth curves (first pass): gate_3d_builder ogee profile densified 4× with an
  arc-length-parameterized CubicSpline (8 pt → 32 pt, 36 → 132 cells, safe for 60 FPS).
- #3 TIN colormap: removed the blue range of matplotlib "terrain" and replaced it with a
  dark brown → ochre → sand → ridge LinearSegmentedColormap at 9 call sites. Regression test added.
- #5 uv: pyproject.toml + UV_GUIDE.md. base/[py313]/[dev]/[build] extras + hatchling.
- #6,#7,#8 dev cycle infra: .pre-commit-config.yaml (ruff + secrets + hygiene),
  .gitea/workflows/ci.yml (Py3.11 + 3.13 matrix), tests/test_regressions.py
  (18 regression tests pinning the iter=1~7 fixes), CONTRIBUTING.md (Red→Green algorithm).

Design docs (migration blueprints for upcoming sessions):
- #4 full UI/UX overhaul: UI_REDESIGN_PLAN.md (12 popups → 1 inspector, vtkTkRenderWidget
  embedding gate, 4 phases × 7 sessions).
- #10 Core/Plugin: ARCHITECTURE_PLAN.md (Core 14 / Plugin 7 structures + 2 render + 1 QA,
  STRUCTURE_REGISTRY extension, manifest-based discovery).
- #11 perf hotspots: PERFORMANCE_BASELINE.md (19 hotspots; P1: serial tile download 5~30 s,
  serial capture 4.5~15 s, Python loops that can be numpy-vectorized, texture read 4 times).

Behavior preservation: ruff 0 errors, pytest 17 passed / 1 skipped (bpy),
import 33/33 OK on Py3.13.13.

Item #2 P2/P3 curves, #4 UI migration, #10 Phase 1 extraction, and #11 P1 optimization are deferred to the next session.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-08 11:45:30 +09:00

19 KiB


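S-CANVAS Performance Baseline (Phase 0: Diagnosis)

This document is the result of a read-only static analysis: a list of candidate hotspots estimated from code patterns, complexity, and I/O boundaries, without any actual measurement. In the measurement stage (Phase 1), the user temporarily inserts the instrumentation patches from §2 into the code, measures with a real drawing, and then fills in the comparison tables in §3.

Reference sources:

- .claude/agents/performance-guardian.md (pitfalls 1-7: main-thread blocking / polygon explosion / per-frame regeneration / I/O serialization / excessive GI / texture memory explosion / structure accumulation)
- User feedback #11: "CPU usage rises sharply when combining the satellite map and building structures → track and optimize at ms granularity"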

1. Estimated hotspots

| # | Path / scenario | File:line | Suspected category | Rationale |
|---|---|---|---|---|
| H1 | Serial XYZ satellite tile download | `tile_downloader.py:98-124` | Network-bound, main-thread blocking (pitfalls 1+4) | Serial `requests.get` calls in a nested for loop. Zoom 17 + 1 km × 1 km bbox ≈ 16-50 tiles at 100-600 ms per tile, 5-30 s total. `btn_draping_callback` is called on the main thread → GUI freezes. |
| H2 | Serial DEM (terrarium) tile download | `dem_extender.py:138-150` | Network-bound (pitfall 4) | Same serial loop. `fetch_terrarium_grid` at z=13 with buffer=1000 m usually needs 4-16 tiles, but a cache miss freezes the GUI. |
| H3 | TIN densify Phase C (progressive 10→1 m grid) | `scanvas_maker.py:4405-4455` | CPU-bound (numpy + scipy), main thread (pitfalls 1+2) | Inside `for _step in (10..1)`: `ConvexHull` recomputed + meshgrid + `MplPath.contains_points` + cKDTree query + bilinear DEM sampling at every step. On a large drawing (2 km × 2 km), 10 steps × tens of thousands of points = several seconds. |
| H4 | TIN densify Phase B (add centers of long edges) | `scanvas_maker.py:4458-4477` | CPU-bound (Delaunay), main thread | One extra temporary Delaunay. ≈100k vertices → 0.5-2 s. |
| H5 | TIN bbox gap filling (Step 1.5-a) | `scanvas_maker.py:5089-5263` | CPU-bound + potential network (pitfalls 1+4) | Re-runs the same algorithm as Phase C. CPU only if a cached `_dem_elev_grid` exists, otherwise an additional fetch. The numpy-vectorized v6 wall cut is fast. |
| H6 | Final Delaunay (after TIN generation) | `scanvas_maker.py:4502, 5216, 3343` | CPU-bound, main thread | scipy `Delaunay` is O(n log n) but runs in native Qhull code rather than numpy; assumed here to hold the GIL (to be confirmed by measurement). 1-3 s for hundreds of thousands of vertices. |
| H7 | DEM ring mesh build: outer smooth blend / Laplacian | `dem_extender.py:600-707` | CPU-bound (cKDTree + Python loop) | Python loops `for k, nb in enumerate(nbrs)` at lines 625 and 696. 1000 grid points + neighbor averaging = 0.5-2 s. Can be numpy-vectorized. |
| H8 | Step 1.5 boundary re-interpolation cKDTree | `scanvas_maker.py:5249-5329` | CPU, lightweight | Single cKDTree + np.where. Fast (< 200 ms). Negligible. |
| H9 | `_excavate_tin_for_structures` Python loops | `scanvas_maker.py:2983-2993, 3025-3031` | CPU-bound, main thread (pitfall 1) | `for i in band_idx` / `for i, d in enumerate(grid_d)` in pure Python. 5 structures × 1000 grid points each = 0.5-2 s. Can be numpy-vectorized right away. |
| H10 | `_composite_material_textures` PIL pixel compositing | `scanvas_maker.py:3923-3996` | I/O + CPU, main thread | PIL `Draw.polygon` × number of roads, plus a noise texture. 100-500 ms at 2048×2048. |
| H11 | Final LANCZOS resize of the satellite texture to 2048 | `tile_downloader.py:147` | CPU, main thread | One resize of a large merged image. 200-600 ms. Negligible, but on some PCs nothing else can run while it holds the GIL. |
| H12 | Capture stage: PyVista off_screen plotter created 3 times | `scanvas_maker.py:5849-5867` | GPU + CPU, main thread (pitfalls 1+3) | `_capture_from_camera` / `_capture_depth_from_camera` / `_capture_lineart_from_camera` each create a new `pv.Plotter(off_screen=True)`, add the meshes, and take a screenshot. 1.5-5 s each; three in series = 4.5-15 s of GUI freeze. |
| H13 | merge + extract_surface + compute_normals in show_3d_preview | `scanvas_maker.py:5485-5500` | CPU, once per preview (bounded) | Recomputing normals for the whole mesh with `feature_angle=180.0` takes 1-3 s on a large mesh (≈500k cells). Runs on every call, but only once per preview open, so pitfall 3 does not apply. |
| H14 | `_add_template_structures_to_plotter` logging + bounds diagnostics | `scanvas_maker.py:5640-5760+` | CPU, main thread | Two `np.concatenate([m.points])` calls (raw, placed) per structure. 50 structures × 20 meshes → 1000 concats. 100-400 ms. |
| H15 | Final PIL `merged.crop().resize()` in `download_xyz_tiles` | `tile_downloader.py:146-147` | CPU, main thread | Crop to the bbox, then LANCZOS resize to 2048. Lightweight. |
| H16 | `pv.read_texture("satellite_temp.png")` | `scanvas_maker.py:5393, 5894, 6281` | I/O, main thread | Re-read from disk on every capture/preview. 200-500 ms × 4 = 0.8-2 s wasted. Can be reused/cached. |
| H17 | `enable_eye_dome_lighting()` on every plotter | `scanvas_maker.py:5563, 5914, 6339, 6359` | GPU, affects 60 FPS (pitfall 5) | EDL has moderate cost, but stacked with SSAO it can drop the frame rate to 30 FPS. Watch out on large meshes. |
| H18 | `_build_plan_overlay_meshes` rebuilds all plan lines every time | `scanvas_maker.py:3599-3700+` | CPU, on every show_3d_preview call (pitfall 3) | Rebuilt on every preview open although the meshes do not change. Cacheable. |
| H19 | ezdxf entity traversal during TIN generation | `scanvas_maker.py:4187-4226` | I/O + CPU | 6 entity types × all modelspace entities. 2-10 s on a large DXF (tens of thousands of contours). |

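2. Measurement instrumentation patches

Context managers to insert at each hotspot. They are not inserted in this round (read-only). The user inserts them temporarily for the Phase 1 measurement and removes them right after.

2.1 Common context manager (add at the top of `scanvas_maker.py`)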

```python
import time
from contextlib import contextmanager

@contextmanager
def _perf(label, log_fn=print):
    """One-shot measurement. Logs the elapsed ms on exit, both wall time and CPU time."""
    t_wall = time.perf_counter()
    t_cpu = time.process_time()  # CPU time of the whole process (all threads)
    try:
        yield
    finally:
        dt_wall = (time.perf_counter() - t_wall) * 1000
        dt_cpu = (time.process_time() - t_cpu) * 1000
        log_fn(f"  [PERF] {label}: wall={dt_wall:.1f}ms cpu={dt_cpu:.1f}ms "
               f"({'CPU' if dt_cpu/max(dt_wall,1e-3) > 0.5 else 'I/O/Net'}-bound)")
```

Classification: cpu/wall > 0.5 → CPU-bound; otherwise → I/O/network-bound (most of the wall time was spent waiting, e.g. with the GIL released).

2.2 H1: XYZ tile download (`tile_downloader.py:98`)

```python
# Immediately before the existing for loop
with _perf(f"XYZ tiles {cols}x{rows}={cols*rows}", log_fn):
    for ty in range(y_min, y_max + 1):
        ...  # existing loop
```

2.3 H2: terrarium fetch (`dem_extender.py:138`)

```python
with _perf(f"terrarium fetch {cols}x{rows} z{zoom}", log_fn):
    for ty in range(y_min, y_max + 1):
        ...
```

2.4 H3: Phase C progressive densify (`scanvas_maker.py:4414`)

```python
with _perf(f"Phase C densify (10->1m, n_pts={len(pts)})", self.log):
    for _step in (10.0, 9.0, 8.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0):
        with _perf(f"  step {_step}m", self.log):
            try:
                hull_c = _ConvexHull(pts[:, :2])
            except Exception:
                break
            ...  # existing loop body
```

2.5 H4: Phase B Delaunay (`scanvas_maker.py:4458`)

```python
with _perf(f"Phase B Delaunay (n={len(pts)})", self.log):
    tri_tmp = Delaunay(pts[:, :2])
```

2.6 H5: Step 1.5-a fill (`scanvas_maker.py:5172`)

```python
with _perf("Step 1.5-a fill (point progressive 10->1m)", self.log):
    current_abs = pts_abs.copy()
    ...  # existing progressive densify loop
```

2.7 H6: final Delaunay (`scanvas_maker.py:4502`)

```python
with _perf(f"final Delaunay (n_pts={len(pts)})", self.log):
    tri = Delaunay(pts[:, :2])
```

2.8 H7: DEM ring build (`dem_extender.py`, at function entry)

```python
# Inside build_extended_terrain_ring(...), wrapping the entire function body
with _perf(f"build_extended_terrain_ring (buffer={buffer_m}m, "
           f"step={grid_step_m or 'auto'})", log_fn):
    ...  # entire existing function body
```

Split into internal phases:

```python
with _perf("  Phase 1: ring point gen", log_fn): ...
with _perf("  Phase 2: WGS84 transform + DEM sample", log_fn): ...
with _perf("  Phase 3: outlier + spike filter", log_fn): ...
with _perf("  Phase 4: feathering + Laplacian", log_fn): ...
with _perf("  Phase 5: Delaunay + cut", log_fn): ...
```

2.9 H9: `_excavate_tin_for_structures` (inside the loop at `scanvas_maker.py:2939`)

```python
for info in self.structure_registry.values():
    with _perf(f"  excavate {info['name']}", self.log):
        ...
```

2.10 H10: `_composite_material_textures` (`scanvas_maker.py:3886`)

```python
# Inside _composite_material_textures(self, satellite_img, ...), wrapping the entire body
with _perf(f"composite_materials img={satellite_img.size}", self.log):
    ...  # existing body
```

2.11 H12: the three captures (`scanvas_maker.py:5849`)

```python
with _perf(f"capture_textured {out_w}x{out_h}", self.log):
    self.capture_image = self._capture_from_camera(out_w, out_h, textured=True)
with _perf("capture_depth", self.log):
    self.depth_map = self._capture_depth_from_camera(out_w, out_h)
with _perf("capture_lineart", self.log):
    self.lineart_map = self._capture_lineart_from_camera(out_w, out_h)
```

2.12 H13: show_3d_preview merge (`scanvas_maker.py:5485`)

```python
with _perf(f"unified merge+normals (TIN {target_mesh.n_points} + "
           f"DEM {ext_mesh.n_points})", self.log):
    merged = target_mesh.merge(ext_mesh, merge_points=True, tolerance=0.01)
    merged = merged.extract_surface() if not isinstance(merged, pv.PolyData) else merged
    ...
    # compute_normals returns a new mesh unless inplace=True; keep the existing keyword arguments
    merged = merged.compute_normals(feature_angle=180.0)
```

2.13 H19: ezdxf entity traversal (`scanvas_maker.py:4187`)

```python
with _perf("DXF entity ingest", self.log):
    for entity in msp.query('LWPOLYLINE'):
        ...
    # all six per-type loops go inside the same context
```

2.14 60 FPS gate (for the PyVista interactive viewer)

Inside `_open_interactive_viewer` (after scanvas_maker.py:5884, right before `p.show()`):

```python
# Based on the FPSLogger in performance-guardian.md; logs the frame rate about once per second
class _FPSLogger:
    def __init__(self, plotter, log_fn=print):
        self.log = log_fn  # the original referenced self.log without ever setting it
        self.last_t = time.perf_counter()
        self.frames = 0
        plotter.iren.add_observer("RenderEvent", self.on_render)

    def on_render(self, *_):
        self.frames += 1
        now = time.perf_counter()
        if now - self.last_t > 1.0:
            fps = self.frames / (now - self.last_t)
            self.log(f"  [FPS] {fps:.1f}")
            self.frames = 0
            self.last_t = now

self._fps_logger = _FPSLogger(p, self.log)  # keep a reference so it is not garbage-collected
```

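3. Post-measurement comparison table templates

Filled in by the user after the Phase 1 measurement with a real drawing (e.g. Sayeon Dam, 1 km × 1 km, ≈100k survey points).

3.1 Scenario A: Step 1 (TIN generation)

| Hotspot | Measured wall (ms) | Measured cpu (ms) | Category | 60 FPS impact | Notes |
|---|---|---|---|---|---|
| H19 ezdxf traversal | | | | | |
| H3 Phase C densify | | | | | |
| H4 Phase B Delaunay | | | | | |
| H6 final Delaunay | | | | | |
| Total (all of Step 1) | | | | | Target < 5 s |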

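3.2 Scenario B: Step 1.5 (DEM extension)

| Hotspot | Category | Notes |
|---|---|---|
| H2 terrarium fetch (cache miss) | Network | |
| H2 terrarium fetch (cache hit) | I/O | |
| H5 Step 1.5-a fill | CPU | |
| H7 build_extended_terrain_ring | CPU | |
| H8 boundary re-interpolation | CPU | |
| Total | | Target < 8 s |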

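3.3 Scenario C: Step 2 (combining the satellite map)

| Hotspot | Category | Notes |
|---|---|---|
| H1 XYZ tile fetch | Network | 16-50 tiles, serial |
| H10 material composite | CPU | |
| H11 LANCZOS resize | CPU | |
| H16 read_texture | I/O | |
| H13 unified merge | CPU | |
| Total | | Target < 10 s |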

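3.4 Scenario D: Step 3 (four captures)

| Hotspot | Category | Notes |
|---|---|---|
| H12 capture_textured | GPU + CPU | |
| H12 capture_depth | GPU + CPU | |
| H12 capture_lineart | GPU + CPU | |
| H10 / _compose_guide_image | CPU | |
| Total | | Target < 6 s |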

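3.5 Scenario E: interactive viewer

| Metric | Target | Notes |
|---|---|---|
| Average FPS (while rotating) | ≥ 60 | EDL ON |
| TIN n_cells | ≤ 100K | single mesh |
| DEM ring n_cells | ≤ 100K | single mesh |
| Unified mesh n_cells | ≤ 500K | pitfall 7 limit |
| Accumulated structure n_cells | ≤ 200K additional | total ≤ 500K |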

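4. Optimization candidates

H1 (serial XYZ tile fetch): largest gain

- Parallelize with ThreadPoolExecutor(max_workers=8) (the standard fix for pitfall 4). Requests are already spread across tile hosts via _SUBDOMAINS.
- Expected: 30 s serial → 4-6 s parallel.
- Move off the main thread: wrap btn_draping_callback in a threading.Thread(daemon=True).start() pattern to remove the GUI block (pitfall 1).
- Add a disk cache (key: BBOX hash, as with terrarium_grid).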
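A minimal sketch of the ThreadPoolExecutor idea, assuming the per-tile download can be factored out of the existing loop. `fetch_tiles_parallel` and `fetch_tile` are hypothetical names: `fetch_tile(tx, ty)` stands in for the current `requests.get` call on a `_SUBDOMAINS` URL and returns the tile bytes. Results are keyed by (tx, ty) so the existing mosaic step can place tiles regardless of completion order.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Optional, Tuple

TileKey = Tuple[int, int]


def fetch_tiles_parallel(
    x_range: range,
    y_range: range,
    fetch_tile: Callable[[int, int], Optional[bytes]],
    max_workers: int = 8,
) -> Dict[TileKey, Optional[bytes]]:
    """Download every (tx, ty) tile concurrently and return them keyed by position.

    `fetch_tile` is a hypothetical wrapper around the existing per-tile request.
    """
    keys = [(tx, ty) for ty in y_range for tx in x_range]
    results: Dict[TileKey, Optional[bytes]] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch_tile, tx, ty): (tx, ty) for tx, ty in keys}
        for fut in as_completed(futures):
            key = futures[fut]
            try:
                results[key] = fut.result()
            except Exception:
                # A failed tile becomes a gap (None) instead of aborting the whole mosaic.
                results[key] = None
    return results
```

Threads help here despite the GIL because each request spends almost all of its 100-600 ms waiting on the network. Whether 8 concurrent requests are acceptable depends on the tile provider's usage policy.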

H2 (serial terrarium fetch)

- Same ThreadPoolExecutor approach. Irrelevant on a cache hit.
- Making only the cache disk access asynchronous would already save about 200 ms.

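H3 (Phase C progressive 10→1 m densify)

- All 10 steps currently run → early exit: break once the hull covers ≥ 99% of the bbox.
- Replace the cKDTree distance check on the meshgrid result with numpy np.isin + a bbox mask for speed (np.in1d is deprecated since NumPy 2.0).
- Avoid recomputing ConvexHull (Python + Qhull) at every step: computing it once from the first hull is sufficient (the hull only expands monotonically as points are added).
- Lazy loading: skip Phase C if the user selected the "TIN extent" (TIN 이용 범위) option (the logic already exists, line 5025).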
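A dependency-free sketch of the 99% early-exit test, assuming `pts` is an (N, ≥2) array and the target bbox is given as (xmin, ymin, xmax, ymax). `convex_hull_area` and `hull_covers_bbox` are hypothetical helpers; in the real code `scipy.spatial.ConvexHull(pts[:, :2]).volume` (the hull area in 2-D) could replace the hand-written hull, which is spelled out here only so the example runs with numpy alone.

```python
import numpy as np


def convex_hull_area(xy: np.ndarray) -> float:
    """Area of the 2-D convex hull (Andrew's monotone chain + shoelace formula)."""
    pts = sorted(set(map(tuple, np.asarray(xy, dtype=float)[:, :2])))
    if len(pts) < 3:
        return 0.0

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    hull = np.array(lower[:-1] + upper[:-1])  # counter-clockwise, endpoints not repeated
    x, y = hull[:, 0], hull[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))


def hull_covers_bbox(xy: np.ndarray, bbox, threshold: float = 0.99) -> bool:
    """True when the hull of `xy` covers at least `threshold` of the bbox area."""
    xmin, ymin, xmax, ymax = bbox
    bbox_area = (xmax - xmin) * (ymax - ymin)
    return bbox_area > 0 and convex_hull_area(xy) / bbox_area >= threshold
```

Inside the Phase C loop this would become a single check at the top of each step, e.g. `if hull_covers_bbox(pts, bbox): break`, where `bbox` is whatever extent the loop is currently filling.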

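H4 (Phase B centroid insertion)

Already numpy-vectorized; little further optimization is needed. On large drawings, raising the threshold (50 m → 100 m) could halve the work (negligible visual difference).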

H5 (Step 1.5-a)

Same pattern as H3 → same remedy. Reuse of the cached _dem_elev_grid is already in place (saves network traffic).

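H6 (final Delaunay)

- If scipy's Delaunay releases the GIL while Qhull runs, it can be moved to a background thread with app.after(0, callback) delivering the result (pitfall 1); confirm this during measurement.
- Option for large drawings: qhull_options="Qbb Qc Qz" was listed as faster, but scipy already uses "Qbb Qc Qz Q12" by default for 2-D input, so little gain is expected; measure before relying on it.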
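A minimal sketch of the background-thread pattern for H6 (and H10/H12), assuming a Tk GUI. As a conservative alternative to calling `app.after(0, callback)` from the worker, the worker here never touches the GUI: it puts its outcome on a `queue.Queue`, and the GUI thread polls that queue through `schedule`, which is meant to be Tk's `widget.after(ms, func)`. `run_in_background` and its parameters are hypothetical names.

```python
import queue
import threading
from typing import Any, Callable


def run_in_background(
    work: Callable[[], Any],
    on_done: Callable[[bool, Any], None],
    schedule: Callable[[int, Callable[[], None]], Any],
    poll_ms: int = 50,
) -> threading.Thread:
    """Run `work()` on a daemon thread and hand its outcome to `on_done` on the GUI thread.

    `on_done(ok, value)` receives (True, result) or (False, exception).
    `schedule(ms, fn)` is expected to behave like Tk's `widget.after(ms, fn)`.
    """
    outbox: "queue.Queue[tuple[bool, Any]]" = queue.Queue(maxsize=1)

    def worker() -> None:
        try:
            outbox.put((True, work()))
        except Exception as exc:  # report failures instead of losing them in the thread
            outbox.put((False, exc))

    def poll() -> None:
        try:
            ok, value = outbox.get_nowait()
        except queue.Empty:
            schedule(poll_ms, poll)  # still running: check again later
            return
        on_done(ok, value)

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    schedule(poll_ms, poll)
    return thread
```

In the app this would look like `run_in_background(lambda: Delaunay(pts[:, :2]), self._on_tin_ready, self.root.after)`, with `_on_tin_ready` a hypothetical handler. The GUI stays responsive because the interpreter switches threads periodically, but pure-Python work still competes for the GIL, so the full benefit depends on the worker spending its time in GIL-releasing code.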

H7 (DEM ring outer smooth blend / Laplacian)

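- Python loop `for k, nb in enumerate(nbrs)` → numpy vectorization. The averaging can use a padded array, or scipy.ndimage.uniform_filter if the points lie on a regular grid (generic_filter calls a Python function per element, so it would keep most of the loop cost).
- Expected: 0.5-2 s → 50-200 ms.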
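A sketch of one way to vectorize the neighbor averaging, assuming `nbrs` is a ragged list of index arrays (as returned by `cKDTree.query_ball_point`) and that the loop computes the mean of `z[nb]` for each point. Both function names and the exact loop shape are hypothetical reconstructions; the loop version is included only as a reference for an equivalence check. If the neighbors come from `cKDTree.query(..., k=K)` with a fixed K instead, `z[idx].mean(axis=1)` is already enough.

```python
import numpy as np


def neighbor_mean_loop(z, nbrs):
    """Reference: assumed shape of the current per-point Python loop."""
    out = np.empty(len(nbrs))
    for k, nb in enumerate(nbrs):
        out[k] = z[nb].mean() if len(nb) else z[k]
    return out


def neighbor_mean_vectorized(z, nbrs):
    """Same result without a Python-level arithmetic loop over points.

    Flattens the ragged neighbor lists into (owner, neighbor) index pairs, then
    uses np.bincount to sum neighbor values per owner and divides by the counts.
    """
    z = np.asarray(z, dtype=float)
    n = len(nbrs)
    counts = np.fromiter((len(nb) for nb in nbrs), dtype=np.int64, count=n)
    owners = np.repeat(np.arange(n), counts)
    flat = (np.concatenate([np.asarray(nb, dtype=np.int64) for nb in nbrs])
            if n else np.empty(0, dtype=np.int64))
    sums = np.bincount(owners, weights=z[flat], minlength=n)
    with np.errstate(invalid="ignore", divide="ignore"):
        means = sums / counts
    return np.where(counts > 0, means, z)  # points without neighbors keep their own value
```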

H9 (`_excavate_tin_for_structures` loops)

Replace `for i in band_idx` / `for i, d in enumerate(grid_d)` with pure numpy:

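```python
# requires: import numpy as np
d = signed_d[band_idx]                 # signed distance of each band point to the structure edge
t = np.clip(d / transition_w, 0, 1)
blend = t*t*(3 - 2*t)                  # smoothstep weight: 0 at the edge, 1 at transition_w
inside = d <= 0                        # inside the footprint: flatten to the pad level
work_pts[band_idx, 2] = np.where(inside, pad_z,
                                 pad_z * (1-blend) + orig_pts[band_idx, 2] * blend)
```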

100-500 ms → < 50 ms. Can be done right away.
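Before swapping the loop out, a quick equivalence check on synthetic data is cheap. The loop below is a hypothetical reconstruction of the current per-point code (not copied from `scanvas_maker.py`); the vectorized function is the numpy snippet above wrapped in a function. A check of this shape could live alongside the existing regression tests.

```python
import numpy as np


def excavate_band_loop(work_pts, orig_pts, signed_d, band_idx, pad_z, transition_w):
    """Assumed shape of the current per-point loop (hypothetical reconstruction)."""
    for i in band_idx:
        d = signed_d[i]
        if d <= 0:                        # inside the footprint: flatten to pad level
            work_pts[i, 2] = pad_z
        else:                             # transition band: smoothstep back to terrain
            t = min(max(d / transition_w, 0.0), 1.0)
            blend = t * t * (3 - 2 * t)
            work_pts[i, 2] = pad_z * (1 - blend) + orig_pts[i, 2] * blend


def excavate_band_vectorized(work_pts, orig_pts, signed_d, band_idx, pad_z, transition_w):
    """The numpy version from the snippet above, wrapped in a function."""
    d = signed_d[band_idx]
    t = np.clip(d / transition_w, 0, 1)
    blend = t * t * (3 - 2 * t)
    inside = d <= 0
    work_pts[band_idx, 2] = np.where(inside, pad_z,
                                     pad_z * (1 - blend) + orig_pts[band_idx, 2] * blend)
```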

H10 (material composite)

- Keep PIL but move the work off the main thread (background thread + after(0)).
- Cache the noise layer as a large PNG (generate it only once).

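H12 (three serial captures)

- The camera/viewpoint is the same, so the plotter can be reused: create once → take 3 screenshots (color/depth/lineart) → close.
  - However, lineart uses a different mesh with show_edges=True, and depth uses add_mesh(color="white"), a different material. Whether these can be toggled in PyVista needs checking (probably via actor.GetProperty().SetEdgeVisibility(...)).
- Expected: 4.5-15 s → 1.5-5 s.
- Move off the main thread + show a progress indicator (pitfall 1).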

H13 (unified merge + compute_normals)

Caching: store unified_mesh on self and invalidate it only when tin_mesh/tin_extension_mesh change. Instead of a 1-3 s recomputation on every show_3d_preview call, compute once and reuse.

H16 (`pv.read_texture` re-reading from disk 4 times)

Initialize self._cached_texture = None. Read the texture once during draping and reuse it from the callers; capture/preview use self._cached_texture directly.

H17 (EDL stacking)

If the 60 FPS check fails, offer an EDL OFF option: a user-facing "High quality / Performance" toggle.

H18 (overlay rebuilt every time)

Initialize self._cached_overlay_meshes = None and rebuild only when layer_geometries changes.
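The H13, H16 and H18 caches share one shape: build once, reuse until an input changes. A minimal sketch, assuming a hypothetical `DerivedCache` helper and a caller-supplied key; the key would ideally include an explicit version counter bumped whenever tin_mesh, tin_extension_mesh or layer_geometries change, because `id()` values alone can be reused after an object is garbage-collected.

```python
from typing import Any, Callable, Hashable, Optional


class DerivedCache:
    """Recompute an expensive derived value only when its inputs' key changes.

    Hypothetical helper; `build` is the existing expensive function
    (e.g. the merge + compute_normals step, or the overlay mesh builder).
    """

    def __init__(self, build: Callable[[], Any]):
        self._build = build
        self._key: Optional[Hashable] = None
        self._value: Any = None
        self._valid = False
        self.builds = 0  # how many times the expensive build actually ran

    def get(self, key: Hashable) -> Any:
        if not self._valid or key != self._key:
            self._value = self._build()
            self._key = key
            self._valid = True
            self.builds += 1
        return self._value

    def invalidate(self) -> None:
        """Force a rebuild on the next get(), e.g. after editing a layer."""
        self._valid = False
        self._value = None
```

A hypothetical call site: `merged = self._unified_cache.get((id(self.tin_mesh), id(self.tin_extension_mesh), self._mesh_version))`.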

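H19 (ezdxf traversal)

The six per-type loops could become a single full pass with dispatch by type. However, ezdxf's msp.query may already be fast enough that the difference is small. Decide after measuring.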
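A sketch of the single-pass variant. ezdxf entities report their type name through `dxftype()` (e.g. "LWPOLYLINE"), and iterating the modelspace (`for e in msp`) yields every entity once. The six type names and their handlers are not listed in this document, so `ingest_single_pass` and the `handlers` mapping are hypothetical; each handler would hold the body of one of the existing per-type loops.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable


def ingest_single_pass(entities: Iterable, handlers: Dict[str, Callable]) -> Dict[str, int]:
    """Visit each entity once and dispatch on its DXF type name.

    Returns how many entities each handler processed, which is handy for the
    [PERF] log line when comparing against the six separate msp.query() loops.
    """
    counts: Dict[str, int] = defaultdict(int)
    for entity in entities:
        kind = entity.dxftype()
        handler = handlers.get(kind)
        if handler is not None:
            handler(entity)
            counts[kind] += 1
    return dict(counts)
```

Wrapped in `_perf("DXF entity ingest", self.log)`, this gives a direct before/after number for the "decide after measuring" step.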

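LOD (countering whole-scene polygon explosion: pitfalls 2/7)

- The current code has no n_cells > 100_000 check and no decimate(target_reduction=0.7), leaving it unprotected against the user's "100× larger drawings" case.
- Add a SceneBudgetTracker (if one already exists in cities_placement_widget.py, promote it to a gate shared by both workflows).

Right before adding the TIN/DEM ring meshes: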

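```python
if mesh.n_cells > 100_000:
    n_before = mesh.n_cells
    mesh = mesh.decimate(target_reduction=0.7)  # remove roughly 70% of the triangles
    # single log call: self.log is not assumed to accept print()'s end= argument
    self.log(f"  mesh simplification: {n_before} → {mesh.n_cells}")
```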
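A sketch of a cell-budget gate using the §3.5 targets (100K per mesh, 500K for the whole scene) as defaults. `SceneCellBudget` is a hypothetical stand-in written for illustration; it is not the SceneBudgetTracker that may already exist in cities_placement_widget.py, whose interface this document does not describe.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class SceneCellBudget:
    """Hypothetical cell-budget gate; defaults follow the §3.5 targets."""
    per_mesh_limit: int = 100_000
    total_limit: int = 500_000
    meshes: Dict[str, int] = field(default_factory=dict)

    @property
    def total(self) -> int:
        return sum(self.meshes.values())

    def needs_decimation(self, n_cells: int) -> bool:
        """True when a single mesh exceeds the per-mesh limit."""
        return n_cells > self.per_mesh_limit

    def try_add(self, name: str, n_cells: int) -> bool:
        """Register a mesh if the scene total stays within budget; return False otherwise."""
        previous = self.meshes.get(name, 0)  # re-adding a mesh replaces its old count
        if self.total - previous + n_cells > self.total_limit:
            return False
        self.meshes[name] = n_cells
        return True
```

Combined with the decimate snippet above, a caller would check `needs_decimation(mesh.n_cells)` first, decimate if needed, and only then call `try_add`, logging or refusing the mesh when the scene total would exceed 500K.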

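5. How to verify the 60 FPS gate

Insert the §2.14 FPSLogger into _open_interactive_viewer. User actions:

1. Click Step 3 → the interactive viewer opens.
2. Rotate and zoom with the mouse for 10 seconds.
3. A line such as [FPS] 58.3 is printed to the console roughly once per second.
4. Record the average in §3.5.

Gate criteria:

- Average ≥ 60 FPS → pass
- 30 ≤ average < 60 → turn EDL off / decimate meshes / increase the DEM ring step
- Average < 30 → suspect polygon explosion → apply the §4 LOD item immediately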
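Averaging the [FPS] lines by hand is error-prone, so a small script can apply the gate directly to a saved console log. A sketch, assuming the log lines look exactly like the `f"  [FPS] {fps:.1f}"` output of the FPSLogger; the function names and outcome strings are hypothetical.

```python
import re
from statistics import mean
from typing import Iterable, List, Optional, Tuple

FPS_LINE = re.compile(r"\[FPS\]\s+([0-9]+(?:\.[0-9]+)?)")


def fps_values(log_lines: Iterable[str]) -> List[float]:
    """Extract the numbers from '[FPS] 58.3'-style lines."""
    return [float(m.group(1)) for line in log_lines if (m := FPS_LINE.search(line))]


def fps_gate(avg_fps: float) -> str:
    """Map an average FPS onto the three gate outcomes listed above."""
    if avg_fps >= 60:
        return "pass"
    if avg_fps >= 30:
        return "tune: EDL off / decimate meshes / larger DEM ring step"
    return "suspect polygon explosion: apply LOD"


def gate_from_log(log_lines: Iterable[str]) -> Tuple[Optional[float], str]:
    values = fps_values(log_lines)
    if not values:
        return None, "no [FPS] lines found"
    avg = mean(values)
    return avg, fps_gate(avg)
```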

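6. Priorities

P1 (immediate: next round, after the user reports measurements)

- H1: parallelize XYZ tiles with ThreadPoolExecutor + move off the main thread with threading.Thread (exact match for user feedback #11; largest perceived gain)
- H12: reuse one plotter for the three captures + threading offload (removes the Step 3 GUI freeze)
- H9: numpy vectorization of the excavation loops (possible right away, zero risk)
- H16: cache the texture instead of re-reading it from disk 4 times (one-line change)

P2 (next: after measurement has checked the regression risk)

- H2: terrarium parallelization + threading
- H3/H5: early exit for Phase C / Step 1.5-a + compute the ConvexHull once and cache it
- H7: numpy vectorization of the DEM ring Laplacian / outer blend
- H13: cache the unified merge result
- H18: cache the overlay meshes

P3 (long term: for large-drawing / structure-accumulation scenarios)

- Introduce an LOD/decimate gate (against pitfalls 2/7)
- Integrate SceneBudgetTracker (shared by the Cities and DXF workflows)
- EDL/SSAO ON/OFF toggle (against pitfall 5, low-end PCs)
- Option to downsample textures from 4K to 2K (against pitfall 6)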

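7. Workflow (for the Phase 1 measurement)

1. User: temporarily insert the §2 instrumentation patches (not all at once; start with the P1 hotspots).
2. User: run the Step 1-3 scenarios on a real drawing (Sayeon Dam recommended, 1-2 km bbox).
3. Save the console log to outputs/perf_baseline_run_YYYYMMDD.log.
4. Fill in the §3 tables to confirm quantitatively where the time actually goes.
5. Run patch rounds starting with the P1 items.
6. After patching, re-measure the same scenarios to check for regressions.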
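Steps 3 and 4 can be partly automated by turning the saved log into table rows. A sketch, assuming the lines follow the `_perf` format from §2.1 (`[PERF] label: wall=…ms cpu=…ms (…-bound)`); `parse_perf_log` and `to_markdown_rows` are hypothetical helpers, and mapping labels onto hotspot IDs (H1, H12, …) is left to the reader of the output.

```python
import re
from typing import Iterable, List, Tuple

PERF_LINE = re.compile(
    r"\[PERF\]\s+(?P<label>.+?):\s+wall=(?P<wall>[0-9.]+)ms\s+"
    r"cpu=(?P<cpu>[0-9.]+)ms\s+\((?P<kind>[^)]+)\)"
)

Row = Tuple[str, float, float, str]


def parse_perf_log(lines: Iterable[str]) -> List[Row]:
    """Return (label, wall_ms, cpu_ms, category) for every [PERF] line, in order."""
    rows: List[Row] = []
    for line in lines:
        m = PERF_LINE.search(line)
        if m:
            rows.append((m["label"], float(m["wall"]), float(m["cpu"]), m["kind"]))
    return rows


def to_markdown_rows(rows: Iterable[Row]) -> str:
    """Format parsed rows as 'label | wall | cpu | category' table lines."""
    return "\n".join(
        f"| {label} | {wall:.1f} | {cpu:.1f} | {kind} |" for label, wall, cpu, kind in rows
    )
```

Re-running the same parser on the post-patch log (step 6) gives directly comparable rows.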
