Setup RailPose3D harness (Planner/Generator/Evaluator)

Name the project RailPose3D and stand up a multi-agent harness following the
Anthropic harness-design blog principles (decomposition, separation of
concerns, file-based handoff, sprint contracts, context-reset over
compaction).

- CLAUDE.md / PLAN.md / PROGRESS.md as the file-based handoff surface;
  every agent must read PLAN+PROGRESS before acting.
- 7 sub-agents under .claude/agents/: plan-architect (Planner),
  pole-detector-builder, rail-detector-builder, triangulation-builder,
  data-pipeline-builder (Generators), module-evaluator (Evaluator),
  dataset-explorer (read-only helper).
- 6 skills under .claude/skills/: /start /sprint /eval /progress /handoff
  /contract.
- SessionStart and Stop hooks to inject the PLAN/PROGRESS briefing and
  remind about PROGRESS.md updates.
- docs/plan.md captures the user-approved detailed plan; docs/research.md
  is the prior tech survey.
- .gitignore excludes data/, .usage/, model checkpoints, and local Claude
  overrides.

Tracking: closes #1

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 08:32:05 +09:00
parent 39df31f3e4
commit 417f880a87
23 changed files with 1202 additions and 0 deletions
--- a/docs/plan.md
+++ b/docs/plan.md
@@ -0,0 +1,243 @@
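+# Pipeline Plan: Drone-Imagery Catenary Pole and Rail Detection → 3D Coordinate Extraction
+
+## Context (Background and Problem Definition)
+
+User goal: detect long, thin structures such as **catenary poles** and **rails** in a drone orthomosaic and in drone video/photos, and extract their 3D coordinates.
+
+Already in place:
+- An **SfM/photogrammetry pipeline** (COLMAP/Metashape) that can produce camera poses and sparse/dense clouds
+- 30 labeled images (to be used for validation and as seeds)
+
+The four core problems currently blocking progress:
+1. **The pole base (ground contact point) cannot be located accurately.** It is the reference point for 3D coordinates, but when the camera is oblique, **the crossarm (the horizontal arm of the T) becomes the lowest visible pixel in the image**, so a bbox-based detector mistakes the crossarm for the base.
+2. **Labeling cost.** Only 30 images are available, which is not enough for ordinary supervised training.
+3. **SfM mesh quality.** Thin vertical objects and metal rails lack texture, so dense reconstruction comes out ragged and broken. Objects cannot be extracted directly from the mesh.
+4. **Rails cannot be represented by a bbox.** They are long curves crossing the frame, so segmentation/polylines are the natural representation.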
+
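+## Core Strategy
+
+**"Do not extract objects from the mesh; detect them accurately in 2D images, then triangulate with the SfM camera poses to obtain 3D coordinates."** This uses only SfM's strength (camera poses) and bypasses its weakness (dense reconstruction of thin objects).
+
+The build is split into three modules:
+- **(A) Pole base detection**: 4-keypoint pose estimation that *structurally* separates the base from the crossarm
+- **(B) Rail polyline detection**: segmentation + skeletonization (transfer from pretrained rail models)
+- **(C) 2D → 3D integration**: multi-view triangulation + DBSCAN clustering + bundle adjustment
+
+The whole pipeline **only slots detection modules into the output of the SfM pipeline already in operation**, so it is minimally invasive.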
+
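+## Module A: Pole Base Detection
+
+### Key insight
+**Mistaking the crossarm for the base is a geometric problem, not a statistical one.** The "lowest pixel" of a bounding box can belong to the crossarm in an oblique view, so a bbox model cannot solve this no matter how much training data it sees. → A **pose/keypoint model must semantically distinguish "this point is the base, that point is a crossarm tip"** for the problem to be solvable.
+
+### Adopted model: RTMPose-m (or YOLO11-pose) + 4-keypoint skeleton
+- **Skeleton**: `{base, top, left_crossarm_tip, right_crossarm_tip}` (4 points per pole)
+- **Rationale**: exactly the same trick that keeps human pose estimation from confusing "left foot/right foot". The base channel is supervised only at the ground contact point, so it is structurally separated from the crossarm.
+- **Repo**: https://github.com/open-mmlab/mmpose (RTMPose), https://github.com/ultralytics/ultralytics (YOLO11-pose)
+- **Fallback**: ViTPose-Base (frozen DINOv2 backbone), which is stronger on extremely small datasets such as 30 images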
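To make the geometric argument concrete, here is a minimal sketch contrasting the bottom of a bounding box with a named `base` keypoint. The category layout follows the COCO-keypoints convention (1-indexed skeleton edges); the choice of skeleton edges, the function names, and the pixel coordinates are illustrative assumptions, not part of the plan.

```python
# Hypothetical helper names; coordinates are made up for illustration.
KEYPOINT_NAMES = ["base", "top", "left_crossarm_tip", "right_crossarm_tip"]


def coco_category() -> dict:
    """COCO-keypoints style category entry for the 4-keypoint pole schema."""
    idx = {name: i + 1 for i, name in enumerate(KEYPOINT_NAMES)}  # 1-indexed
    return {
        "id": 1,
        "name": "pole",
        "keypoints": KEYPOINT_NAMES,
        # Assumed edges: the mast (base-top) and the crossarm (tip-tip).
        "skeleton": [
            [idx["base"], idx["top"]],
            [idx["left_crossarm_tip"], idx["right_crossarm_tip"]],
        ],
    }


def bbox_bottom_center(keypoints: dict) -> tuple:
    """What a bbox-style heuristic would report as the 'base'."""
    xs = [x for x, _ in keypoints.values()]
    ys = [y for _, y in keypoints.values()]
    return ((min(xs) + max(xs)) / 2, max(ys))  # image y grows downward


def base_from_keypoints(keypoints: dict) -> tuple:
    """The base is read from its own channel, never inferred from extent."""
    return keypoints["base"]


# Oblique view: the left crossarm tip projects lower in the image than the base.
oblique_pole = {
    "base": (400.0, 300.0),
    "top": (420.0, 120.0),
    "left_crossarm_tip": (330.0, 340.0),
    "right_crossarm_tip": (500.0, 150.0),
}
bbox_guess = bbox_bottom_center(oblique_pole)
kpt_base = base_from_keypoints(oblique_pole)
```

In this example the bbox heuristic lands 40 px below the true base, on the crossarm tip's row, while the keypoint channel is unaffected by which part happens to be lowest in the image.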
+
+### Handling the label shortage (30 images): 4-stage bootstrap
+1. **Day 1–3 (start with zero labels)**: Grounding-DINO 1.5 (prompts `"utility pole"` and `"전봇대"`, the Korean term for utility pole) → SAM 2.1 box prompt → lower end of the mask's PCA principal axis → base candidate. Measure pixel error on the 30 images = baseline.
+   - Repos: https://github.com/IDEA-Research/Grounding-DINO-1.5-API , https://github.com/facebookresearch/sam2
+2. **Day 4–5**: use the clean MVP outputs as bootstrap labels for RTMPose training.
+3. **Week 2+**: Albumentations + **copy-paste augmentation** (cut poles out with SAM2 masks and paste them onto varied backgrounds, transforming the keypoints along with them). 5–10× data augmentation.
+4. **Week 3+ (biggest leverage)**: **SfM-consistent self-training loop**
+   - Reproject a base detection from one view into other views using the SfM poses → pseudo-label
+   - Reject as outliers when reprojection error > 5px (a crossarm mistaken for a base is filtered automatically, because it is not 3D-consistent with the correct detections in other views)
+   - References: EpipolarPose (CVPR'19), Pixel-Perfect SfM (ICCV'21) https://github.com/cvg/pixel-perfect-sfm
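The "lower end of the mask's PCA principal axis" step in stage 1 can be sketched with NumPy alone. This is an illustrative version under stated assumptions: the mask is a single clean binary pole mask (as a SAM 2.1 box prompt might return), image y grows downward, and the function name is hypothetical.

```python
import numpy as np


def pca_lower_endpoint(mask: np.ndarray) -> tuple[float, float]:
    """Return (x, y) of the lower end of the mask's principal axis."""
    ys, xs = np.nonzero(mask)
    if xs.size < 2:
        raise ValueError("mask needs at least two foreground pixels")
    pts = np.column_stack([xs, ys]).astype(float)
    center = pts.mean(axis=0)
    centered = pts - center
    # Eigenvector of the largest eigenvalue = direction of greatest extent.
    eigvals, eigvecs = np.linalg.eigh(np.cov(centered, rowvar=False))
    axis = eigvecs[:, -1]  # eigh returns eigenvalues in ascending order
    proj = centered @ axis
    ends = np.array([center + proj.min() * axis, center + proj.max() * axis])
    lower = ends[np.argmax(ends[:, 1])]  # larger y = lower in the image
    return float(lower[0]), float(lower[1])


# Synthetic vertical "pole": 3 px wide, rows 10..89.
demo_mask = np.zeros((100, 50), dtype=bool)
demo_mask[10:90, 20:23] = True
base_xy = pca_lower_endpoint(demo_mask)
```

Note that this baseline inherits the failure mode described above: if the mask includes a crossarm that dominates the extent, the principal axis no longer follows the mast, which is exactly what the pixel-error baseline is meant to quantify.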
+
+### What not to do
+- Estimate the base with bbox-based YOLO/Faster-RCNN (fails because of the geometric problem above)
+- Rely solely on BlenderProc synthetic data (sim-to-real gap)
+- Fully retrain a large model from scratch on 30 images
+
+## Module B: Rail Polyline Detection
+
+### Adopted model: SegFormer-B2 + 3-stage transfer learning + skeletonize
+- **Rationale**: a bbox is a poor fit for rails. Segmentation handles both nadir (parallel lines) and oblique (curves converging toward a vanishing point) views in a view-invariant way. SegFormer-B2's hierarchical attention captures the long-range continuity of long, thin structures well, and it is label-efficient.
+- **3-stage fine-tune**: RailSem19 (8,500 sequences; about 65% IoU attainable in pretraining) → **UAV-RSOD** (drone-view rail dataset, the most directly relevant domain) → the user's 30 images
+- **Repos / Datasets**:
+  - SegFormer: https://github.com/NVlabs/SegFormer (HF: https://huggingface.co/blog/fine-tune-segformer)
+  - RailSem19: https://wilddash.cc/railsem19
+  - UAV-RSOD: https://www.nature.com/articles/s41597-024-03952-3
+  - Alternative architecture: NL-LinkNet-SSR (Drones, MDPI 2024; drone-rail-specific, F1=0.749) https://www.mdpi.com/2504-446X/8/11/611
+
+### Polyline extraction post-processing
+1. Binary mask (rail / not-rail)
+2. **`skimage.morphology.skeletonize`** → 1px centerline
+3. Connected-component split → separate each rail (instance ID)
+4. **Ramer-Douglas-Peucker** simplification (eps ~1–2px) → ordered polyline
+5. **Optional sub-pixel refinement**: snap the skeleton to sub-pixel accuracy with the DeepLSD attraction field (https://github.com/cvg/DeepLSD)
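Step 4 can be illustrated with a small NumPy implementation of Ramer-Douglas-Peucker. It assumes the skeleton pixels of one rail have already been ordered along the rail (steps 2 and 3 do not guarantee an ordering by themselves); the function name and the sample polyline are illustrative.

```python
import numpy as np


def rdp(points, eps: float) -> np.ndarray:
    """Ramer-Douglas-Peucker simplification of an ordered (N, 2) polyline."""
    points = np.asarray(points, dtype=float)
    if len(points) < 3:
        return points
    start, end = points[0], points[-1]
    chord = end - start
    rel = points - start
    norm = np.hypot(chord[0], chord[1])
    if norm == 0.0:
        # Closed or degenerate span: fall back to distance from the start point.
        dist = np.hypot(rel[:, 0], rel[:, 1])
    else:
        # Perpendicular distance of every point to the start-end chord.
        dist = np.abs(chord[0] * rel[:, 1] - chord[1] * rel[:, 0]) / norm
    i = int(np.argmax(dist))
    if dist[i] <= eps:
        return np.array([start, end])
    left = rdp(points[: i + 1], eps)   # keeps the split point as its last vertex
    right = rdp(points[i:], eps)       # and as the first vertex here
    return np.vstack([left[:-1], right])


# Jittery horizontal run (0.3 px zig-zag) followed by a vertical run.
raw = [(x, 0.3 * (x % 2)) for x in range(11)] + [(10, y) for y in range(1, 11)]
simplified = rdp(raw, eps=1.5)
```

With `eps=1.5` the sub-pixel zig-zag is absorbed and only the corner survives. The recursion depth grows with the number of retained vertices, so a very long skeleton may warrant an iterative variant.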
+
+### MVP (week 1)
+**Grounded-SAM 2** (text prompts `"railway track"`, `"steel rail"`) → manual correction on the 30 images → SegFormer fine-tune → auto-label the rest → repeat (active learning loop).
+
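+### What not to do
+- Apply lane-detection models such as LaneATT/CLRNet directly (they assume a forward-facing vehicle view and do not suit drone nadir imagery)
+- Graph models such as Sat2Graph/RoadTracer (not built for this scale or form of supervision)
+- Wireframe detectors such as HAWP/LETR/M-LSD (designed for indoor Manhattan-world scenes)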
+
+## Module C: 2D → 3D Integration
+
+### C-1. Poles (point objects)
+
+| Step | Tool | Notes |
+|---|---|---|
+| Multi-view cross-view association | **DBSCAN in world XY** | Key trick |
+| Triangulation initialization | `pycolmap.triangulate_point` | https://github.com/colmap/pycolmap |
+| Outlier removal | RANSAC + Huber loss | |
+| Pose-fixed bundle adjustment | `pyceres` | https://github.com/cvg/pyceres |
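For intuition about the triangulation-initialization row, the following is a plain NumPy linear (DLT) triangulation. It is a stand-in for illustration only and does not use or reproduce the pycolmap API; the camera setup and point are made up, and poses follow the world-to-camera convention `x_cam = R @ X_world + t`.

```python
import numpy as np


def projection_matrix(K, R, t) -> np.ndarray:
    """3x4 projection P = K [R | t] for x_cam = R @ X_world + t."""
    return K @ np.column_stack([R, t])


def project(P, X) -> np.ndarray:
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]


def triangulate_dlt(projections, pixels) -> np.ndarray:
    """Linear triangulation of one 3D point from two or more views."""
    rows = []
    for P, (u, v) in zip(projections, pixels):
        rows.append(u * P[2] - P[0])  # each view contributes two equations
        rows.append(v * P[2] - P[1])
    _, _, vt = np.linalg.svd(np.stack(rows))
    X = vt[-1]                        # null vector = homogeneous solution
    return X[:3] / X[3]


# Illustrative rig: three cameras with identity rotation and small baselines.
K = np.array([[1000.0, 0.0, 500.0], [0.0, 1000.0, 500.0], [0.0, 0.0, 1.0]])
cams = [projection_matrix(K, np.eye(3), np.array(t, dtype=float))
        for t in ([0, 0, 0], [-1, 0, 0], [0, -1, 0])]
X_true = np.array([0.2, -0.1, 5.0])
obs = [project(P, X_true) for P in cams]
X_est = triangulate_dlt(cams, obs)
```

The linear solution minimizes an algebraic rather than a reprojection error, which is why the table follows it with robust outlier handling and pose-fixed bundle adjustment.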
+
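+**Cross-view association (the hard part)**: catenary poles stand about 50 m apart and look identical, so appearance descriptors such as DINOv2 cannot tell them apart. → **Geometry-based clustering**:
+1. Back-project each view's 2D base detection into a ray
+2. Intersect the ray with the **local ground plane** from the SfM dense cloud → candidate point in world coordinates
+3. Cluster in world XY with **DBSCAN(eps≈0.5m, min_samples=3)** → one cluster = one physical pole
+4. Run pycolmap triangulation on the detections within each cluster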
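Step 3 can be sketched as follows. In practice `sklearn.cluster.DBSCAN` would be the natural choice; the version below is a minimal NumPy stand-in so the example is self-contained, using the same convention that `min_samples` counts the point itself. The candidate coordinates are invented and are assumed to come from the ray-ground intersection in step 2.

```python
import numpy as np


def dbscan_xy(points, eps: float = 0.5, min_samples: int = 3) -> np.ndarray:
    """Minimal DBSCAN on (N, 2) world-XY points; label -1 marks noise."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    # O(N^2) pairwise distances: fine for a sketch, not for large scenes.
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
    neighbors = [np.flatnonzero(d[i] <= eps) for i in range(n)]  # includes i
    core = [len(nb) >= min_samples for nb in neighbors]
    labels = np.full(n, -1)
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or not core[i]:
            continue
        labels[i] = cluster
        stack = [i]
        while stack:
            j = stack.pop()
            if not core[j]:
                continue  # border points join a cluster but do not expand it
            for k in neighbors[j]:
                if labels[k] == -1:
                    labels[k] = cluster
                    stack.append(int(k))
        cluster += 1
    return labels


# Candidates from several views: two poles ~50 m apart plus one stray hit.
candidates = np.array([
    [0.0, 0.0], [0.1, 0.05], [-0.1, 0.1], [0.05, -0.1],
    [50.0, 0.0], [50.1, 0.1], [49.9, -0.05],
    [25.0, 3.0],
])
labels = dbscan_xy(candidates, eps=0.5, min_samples=3)
pole_centers = [candidates[labels == c].mean(axis=0) for c in range(labels.max() + 1)]
```

Because poles are roughly 50 m apart, an `eps` of about 0.5 m leaves a wide margin between clusters, and a candidate seen in fewer than three views is left as noise rather than promoted to a pole.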
+
+### C-2. Rails (curve objects)
+
+| Step | Tool |
+|---|---|
+| Ground extraction | **PDAL CSF (Cloth Simulation Filter)** https://pdal.io/en/latest/stages/filters.csf.html |
+| Ray-ground intersection | **Open3D `RaycastingScene`** |
+| Curve fitting | `scipy.interpolate.splprep` (3D B-spline) |
+| NURBS refinement | `geomdl` https://github.com/orbingol/NURBS-Python |
+| Bundle adjustment | `pyceres` (control points as variables) |
+
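+**Pipeline**:
+1. SfM dense cloud → classify ground with CSF → local surface mesh
+2. Sample each view's 2D rail polyline at 3–5px spacing
+3. Collect the world coordinates where the camera rays intersect the ground surface
+4. Pool the world coordinates from all views and order them by arc length → 3D B-spline fit
+5. (Optional) With pyceres, treat the control points as variables and minimize the reprojection error against the 2D polylines of all views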
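Steps 3 and 4 can be sketched in NumPy under a strong simplification: the ground is replaced by a horizontal plane `z = ground_z` instead of the CSF-derived mesh queried through Open3D's `RaycastingScene`. Function names and the nadir camera are illustrative assumptions; poses use the world-to-camera convention `x_cam = R @ X_world + t`.

```python
import numpy as np


def pixel_rays(K, R, t, pixels):
    """World-frame camera center and ray directions for the given pixels."""
    pix = np.asarray(pixels, dtype=float)
    homog = np.column_stack([pix, np.ones(len(pix))])
    dirs_cam = np.linalg.solve(K, homog.T)   # K^-1 [u, v, 1]^T, shape 3 x N
    dirs_world = (R.T @ dirs_cam).T          # rotate into the world frame
    center = -R.T @ t                        # camera center in world coords
    return center, dirs_world


def intersect_ground_plane(center, dirs, ground_z: float = 0.0) -> np.ndarray:
    """Hit points on the plane z = ground_z; NaN rows where no forward hit."""
    with np.errstate(divide="ignore", invalid="ignore"):
        s = (ground_z - center[2]) / dirs[:, 2]
        hits = center + s[:, None] * dirs
    hits[~(np.isfinite(s) & (s > 0))] = np.nan
    return hits


def chord_length_parameter(points) -> np.ndarray:
    """Cumulative chord length, a common stand-in for arc length."""
    seg = np.linalg.norm(np.diff(points, axis=0), axis=1)
    return np.concatenate([[0.0], np.cumsum(seg)])


# Nadir camera 50 m above the plane, looking straight down (world Z is up).
K = np.array([[1000.0, 0.0, 500.0], [0.0, 1000.0, 500.0], [0.0, 0.0, 1.0]])
R = np.diag([1.0, -1.0, -1.0])
t = -R @ np.array([0.0, 0.0, 50.0])
center, dirs = pixel_rays(K, R, t, [(500, 500), (600, 500), (700, 500)])
rail_xyz = intersect_ground_plane(center, dirs)
u_param = chord_length_parameter(rail_xyz)
```

The ordered points and their parameter values are the kind of input a smoothing B-spline fit such as `scipy.interpolate.splprep` would then consume.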
+
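+### C-3. Whether to replace the SfM dense step
+
+**Conclusion: do not replace it.** Gaussian Splatting / NeRF rely on a photometric loss, so they do not fundamentally solve the thin-object problem. For measurement accuracy, detection→triangulation is stronger. However, 2DGS (https://github.com/hbb1/2d-gaussian-splatting) can be added as an option for **visualization deliverables** only.
+
+**What is used from SfM**: camera poses + intrinsics, the sparse cloud (sanity check), and **only the ground-classified dense cloud**. The mesh is discarded.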
+
+## Full Pipeline Diagram
+
+```
+Drone imagery (oblique + nadir)
+        │
+        ▼
+[Stage 0] COLMAP / Metashape (existing)
+   → camera [R|t], K, dense cloud
+        │
+   ┌────┼────────────────────────┐
+   ▼    ▼                        ▼
+[Stage 1a]   [Stage 1b]              [Stage 1c]
+PDAL CSF    Pole 4-kpt detection    Rail seg + polyline
+ground       (RTMPose)               (SegFormer + skel)
+        │              │                  │
+        └──────┬───────┘                  │
+               ▼                          │
+[Stage 2] Pole association                │
+  - ray → ground intersection             │
+  - DBSCAN in world XY                    │
+        │                                 │
+        ▼                                 │
+[Stage 3] Pole triangulation              │
+  - pycolmap + Huber + pyceres BA         │
+  → 3D pole base point                    │
+                                          ▼
+                                 [Stage 4] Rail reconstruction
+                                  - Open3D raycasting (per sample)
+                                  - scipy splprep B-spline
+                                  - pyceres spline BA
+                                  → 3D rail centerline
+        │                                 │
+        └──────────────┬──────────────────┘
+                       ▼
+[Stage 5] Output: GeoJSON/Shapefile (EPSG:5186) + DXF (ezdxf)
+```
+
+## Recommended Directory Structure (files to be created)
+
+The only files that already exist are `docs/research.md` and `README.md`. Key new files to create:
+
+```
+detectelectronpole/
+├── docs/
+│   ├── research.md                  # existing
+│   └── plan.md                      # Korean copy of this plan
+├── data/
+│   ├── images/                      # raw drone images
+│   ├── sfm/                         # COLMAP DB / Metashape export
+│   └── labels/
+│       └── poles_4kpt.json          # COCO-keypoints format, 4-keypoint schema
+├── configs/
+│   ├── rtmpose_pole_4kpt.py         # MMPose config
+│   └── segformer_rail.yaml          # SegFormer fine-tune config
+├── src/
+│   ├── detection/
+│   │   ├── mvp_groundingdino_sam2.py    # zero-label MVP (week 1)
+│   │   ├── pole_keypoint.py             # RTMPose inference wrapper
+│   │   ├── rail_segment.py              # SegFormer inference + skeletonize
+│   │   └── augment_copy_paste.py        # keypoint-preserving copy-paste
+│   ├── triangulation/
+│   │   ├── sfm_io.py                # pycolmap / Metashape XML loader
+│   │   ├── ground.py                # PDAL CSF + Open3D RaycastingScene
+│   │   ├── poles.py                 # DBSCAN association + pycolmap + pyceres
+│   │   └── rails.py                 # raycasting + splprep + pyceres
+│   ├── self_training/
+│   │   └── sfm_self_training.py     # multi-view pseudo-label loop
+│   └── export/
+│       └── geojson_dxf.py           # deliverable export
+└── pipeline/
+    └── run_full.py                  # end-to-end orchestrator
+```
+
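+## Milestones by Phase
+
+### Phase 1 (Week 1): Zero-label MVP
+- [ ] `mvp_groundingdino_sam2.py`: extract pole masks + base candidates from the 30 images with Grounding-DINO 1.5 + SAM 2.1
+- [ ] Compare the lower endpoint of the PCA principal axis against the GT of the 30 images to measure a pixel-error baseline
+- [ ] Measure a rail-mask baseline the same way with Grounded-SAM 2
+- **Goal**: quantify how far zero labels can go. Build a catalog of failure modes.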
+
+### Phase 2 (Week 2–3): Detection training
+- [ ] Relabel the 30 images in 4-keypoint COCO-keypoints format (low cost: just four point clicks)
+- [ ] Augment to ~300 images with copy-paste augmentation
+- [ ] RTMPose-m fine-tune (frozen backbone)
+- [ ] 3-stage transfer of SegFormer-B2: RailSem19 → UAV-RSOD → 30 images
+- [ ] Evaluate both models (PCK, mIoU)
+
+### Phase 3 (Week 4): 2D → 3D integration
+- [ ] Load Metashape/COLMAP output with `sfm_io.py`
+- [ ] Extract ground with PDAL CSF and build the Open3D RaycastingScene in `ground.py`
+- [ ] `poles.py`: DBSCAN association + pycolmap triangulate + pyceres BA → 3D pole points
+- [ ] `rails.py`: ray-ground intersection + B-spline fit → 3D rail polylines
+- [ ] GeoJSON export in EPSG:5186 (Korea 2000 / Central Belt 2010, the Korean central origin)
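The export item can be sketched with the standard library only. One caveat worth stating: RFC 7946 GeoJSON assumes WGS 84 and dropped the `crs` member, so writing projected EPSG:5186 coordinates relies on the legacy named-CRS member, which GDAL-based readers such as QGIS generally still honor. The function name, property name, and coordinates below are illustrative assumptions.

```python
import json


def poles_to_geojson(poles, epsg: int = 5186) -> dict:
    """Build a FeatureCollection from (pole_id, x, y, z) tuples.

    x, y are easting/northing in the projected CRS; z is height.
    """
    features = [
        {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [x, y, z]},
            "properties": {"pole_id": pole_id},
        }
        for pole_id, x, y, z in poles
    ]
    return {
        "type": "FeatureCollection",
        # Legacy (pre-RFC 7946) named CRS member; an assumption of this sketch.
        "crs": {
            "type": "name",
            "properties": {"name": f"urn:ogc:def:crs:EPSG::{epsg}"},
        },
        "features": features,
    }


# Made-up coordinates near the EPSG:5186 false origin (200000, 600000).
doc = poles_to_geojson([("P001", 200012.5, 600034.0, 35.2)])
geojson_text = json.dumps(doc, ensure_ascii=False, indent=2)
```

If strict RFC 7946 compliance matters for a consumer, reprojecting to WGS 84 before export is the safer route.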
+
+### Phase 4 (Week 5+): Self-training loop
+- [ ] `sfm_self_training.py`: one round = detect → triangulate → reproject → pseudo-label → fine-tune
+- [ ] Automatically remove outliers with a 5px reprojection-error threshold
+- [ ] Track mAP/PCK per round
+- [ ] Repeat until convergence (typically 3–5 rounds)
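The 5px outlier rule can be sketched as a reprojection check. This is a simplified illustration: the 3D point is taken as given, whereas in the loop it would come from a robust triangulation of the same detections, and the cameras, point, and 40 px offset are invented to mimic a crossarm mistaken for the base in one view.

```python
import numpy as np


def projection_matrix(K, R, t) -> np.ndarray:
    """3x4 projection P = K [R | t] for x_cam = R @ X_world + t."""
    return K @ np.column_stack([R, t])


def reprojection_errors(projections, pixels, X) -> np.ndarray:
    """Pixel distance between each detection and the projection of X."""
    Xh = np.append(np.asarray(X, dtype=float), 1.0)
    errs = []
    for P, uv in zip(projections, pixels):
        x = P @ Xh
        errs.append(np.linalg.norm(x[:2] / x[2] - np.asarray(uv, dtype=float)))
    return np.array(errs)


K = np.array([[1000.0, 0.0, 500.0], [0.0, 1000.0, 500.0], [0.0, 0.0, 1.0]])
cams = [projection_matrix(K, np.eye(3), np.array(t, dtype=float))
        for t in ([0, 0, 0], [-1, 0, 0], [0, -1, 0])]
X_base = np.array([0.2, -0.1, 5.0])

# Perfect detections in views 0 and 1; view 2 is 40 px off (wrong keypoint).
detections = []
for P in cams:
    x = P @ np.append(X_base, 1.0)
    detections.append(x[:2] / x[2])
detections[2] = detections[2] + np.array([0.0, 40.0])

errors = reprojection_errors(cams, detections, X_base)
keep = errors <= 5.0  # detections that survive as pseudo-labels
```

Only the views flagged in `keep` would be written back as pseudo-labels for the next fine-tuning round.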
+
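+## Verification
+
+1. **Module A alone**: measure base-keypoint **PCK@5px** against the GT of the 30 images. Compare MVP baseline → RTMPose baseline → each self-training round.
+2. **Module B alone**: measure rail-mask **mIoU** and polyline **Hausdorff distance** against the GT of the 30 images.
+3. **Module C alone**: feed the raw Phase 1 detections into `triangulation/poles.py` and export the 3D pole coordinates as GeoJSON. Compare against surveyed ground control points (GCPs) or the KORAIL inventory (if neither is available, run two separate survey flights and check consistency).
+4. **End-to-end**: run the full pipeline once on a small trial section (e.g. a 100m stretch, ~3 poles + 2 rails). Visualize the deliverables in QGIS as GeoJSON overlaid on the orthomosaic.
+5. **Regression test**: must be reproducible with the single command `pipeline/run_full.py --section trial_100m`.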
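For item 1, PCK@5px with an absolute pixel threshold reduces to a few lines. This sketch assumes predictions and ground truth are already matched one-to-one; the function name and sample points are illustrative.

```python
import numpy as np


def pck(pred, gt, threshold_px: float = 5.0) -> float:
    """Fraction of keypoints whose pixel error is within threshold_px."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    dist = np.linalg.norm(pred - gt, axis=1)
    return float(np.mean(dist <= threshold_px))


gt_base = np.array([[100.0, 200.0], [300.0, 220.0], [510.0, 240.0], [720.0, 260.0]])
pred_base = gt_base + np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 0.0], [1.0, 0.0]])
score = pck(pred_base, gt_base, threshold_px=5.0)  # errors: 0, 5, 6, 1 px
```

Three of the four errors are at or below 5 px, so the score is 0.75. Missed detections are not handled here and would need to be counted as failures before averaging.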
+
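+## Risks and Mitigations
+
+| Risk | Mitigation |
+|---|---|
+| Pole base occluded by ballast/vegetation | The self-training loop checks consistency against visible bases in other views → correct by the mean offset |
+| Insufficient SfM ground-cloud density (<50 pts/m²) | Readjust flight altitude/overlap, or supplement with SuGaR/2DGS (visualization only) |
+| Variety of Korean catenary mast types (single mast vs portal) | Extend the 4-keypoint scheme with a `pole_type` class branch (single mast / portal structure) |
+| Coordinate system (local vs EPSG:5186) | 7-parameter Helmert transform from 3 or more non-collinear GCPs |
+| Label seed (30 images) biased toward one camera angle/lighting | Enforce diversity in the first self-training round (sample at least 1 image per multi-view cluster) |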
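The 7-parameter Helmert row (scale, three rotations, three translations) can be estimated in closed form from matched GCPs. The sketch below uses the Umeyama least-squares similarity solution, which is one common way to fit it and is an assumption here, not something the plan specifies; the GCP coordinates are invented.

```python
import numpy as np


def fit_helmert(src, dst):
    """Least-squares similarity transform: dst ≈ s * R @ src + t (Umeyama)."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    xs, xd = src - mu_s, dst - mu_d
    cov = xd.T @ xs / len(src)
    U, S, Vt = np.linalg.svd(cov)
    D = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        D[2, 2] = -1.0                      # guard against a reflection
    R = U @ D @ Vt
    var_s = (xs ** 2).sum() / len(src)
    s = np.trace(np.diag(S) @ D) / var_s
    t = mu_d - s * R @ mu_s
    return s, R, t


def apply_helmert(s, R, t, pts) -> np.ndarray:
    return s * (np.asarray(pts, dtype=float) @ R.T) + t


# Invented example: local SfM frame -> projected frame (30 deg yaw, scale 2).
a = np.deg2rad(30.0)
R_true = np.array([[np.cos(a), -np.sin(a), 0.0],
                   [np.sin(a),  np.cos(a), 0.0],
                   [0.0, 0.0, 1.0]])
s_true, t_true = 2.0, np.array([200000.0, 600000.0, 30.0])
gcp_local = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 1.0],
                      [0.0, 15.0, 2.0], [8.0, 9.0, 5.0]])
gcp_world = apply_helmert(s_true, R_true, t_true, gcp_local)
s_est, R_est, t_est = fit_helmert(gcp_local, gcp_world)
```

With real GCPs the residuals `apply_helmert(s_est, R_est, t_est, gcp_local) - gcp_world` give a quick check on survey and SfM consistency; using more than the minimum number of points makes that check meaningful.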
+
+## Key References (all available for use as of April 2026)
+
+- **Pole detection**: MMPose RTMPose, SAM 2.1, Grounding-DINO 1.5/1.6, Pixel-Perfect SfM
+- **Rail detection**: SegFormer, RailSem19, UAV-RSOD, NL-LinkNet-SSR, DeepLSD
+- **3D integration**: pycolmap, pyceres, PDAL, Open3D, scipy.interpolate, geomdl, ezdxf