minsung 417f880a87 Setup RailPose3D harness (Planner/Generator/Evaluator)

Name the project RailPose3D and stand up a multi-agent harness
following the Anthropic harness-design blog principles
(decomposition, separation of concerns, file-based handoff,
sprint contracts, context-reset over compaction).

- CLAUDE.md / PLAN.md / PROGRESS.md as the file-based handoff
  surface; every agent must read PLAN+PROGRESS before acting.
- 7 sub-agents under .claude/agents/: plan-architect (Planner),
  pole-detector-builder, rail-detector-builder, triangulation-
  builder, data-pipeline-builder (Generators), module-evaluator
  (Evaluator), dataset-explorer (read-only helper).
- 6 skills under .claude/skills/: /start /sprint /eval /progress
  /handoff /contract.
- SessionStart and Stop hooks to inject the PLAN/PROGRESS
  briefing and remind about PROGRESS.md updates.
- docs/plan.md captures the user-approved detailed plan;
  docs/research.md is the prior tech survey.
- .gitignore excludes data/, .usage/, model checkpoints, and
  local Claude overrides.

Tracking: closes #1

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-04-28 08:32:05 +09:00

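Plan: Drone-Imagery-Based Catenary Pole and Rail Detection → 3D Coordinate Extraction Pipeline

Context (Background and Problem Definition)

User goal: detect long, thin structures such as catenary poles and rails in drone orthomosaics and drone video/photos, and extract their 3D coordinates.

Already in place:

An SfM/photogrammetry pipeline (COLMAP/Metashape) that can produce camera poses and sparse/dense clouds
30 labeled images (used for validation and as seed data)

The four core problems currently blocking progress:

The pole base (the ground contact point) cannot be located accurately. The base is the reference point for the 3D coordinates, but when the camera is oblique, the crossarm (the horizontal arm of the T) becomes the lowest visible pixel in the image, so a bbox-based detector mistakes the crossarm for the base.
Labeling cost: only 30 images are available, which is not enough for ordinary supervised training.
SfM mesh quality: thin vertical objects and metal rails lack texture, so the dense reconstruction comes out ragged and broken. Objects cannot be extracted directly from the mesh.
Rails cannot be represented by a bbox: a rail is a long curve crossing the frame, so segmentation or a polyline is the natural representation.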

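Core Strategy

"Do not extract objects from the mesh. Detect them accurately in the 2D images, then triangulate with the SfM camera poses to obtain 3D coordinates." Use only SfM's strength (camera poses) and sidestep its weakness (dense reconstruction of thin objects).

The system is built as three modules:

(A) Pole base detection: 4-keypoint pose estimation that structurally separates the base from the crossarm
(B) Rail polyline detection: segmentation + skeletonization (transfer from a pretrained rail model)
(C) 2D → 3D integration: multi-view triangulation + DBSCAN clustering + bundle adjustment

The whole pipeline only slots detection modules into the SfM output that is already in operation, so it is minimally invasive.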

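Module A: Pole Base Detection

Key Insight

Mistaking the crossarm for the base is a geometric problem, not a statistical one. The "lowest pixel" of a bounding box can be the crossarm in an oblique view, so a bbox model cannot solve this no matter how much training data it sees. → It can only be solved with a pose/keypoint model that distinguishes semantically between "this point is the base" and "that point is a crossarm tip".

Chosen model: RTMPose-m (or YOLO11-pose) + 4-keypoint skeleton

Skeleton: {base, top, left_crossarm_tip, right_crossarm_tip} (4 points per pole)
Rationale: exactly the same trick that keeps human pose estimators from confusing "left foot" and "right foot". The base channel is supervised only at the ground contact point, so it is structurally separated from the crossarm.
Repo: https://github.com/open-mmlab/mmpose (RTMPose), https://github.com/ultralytics/ultralytics (YOLO11-pose)
Fallback: ViTPose-Base (frozen DINOv2 backbone), which is stronger on extremely small datasets such as 30 images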
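
As a rough illustration of the 4-keypoint skeleton, the sketch below builds a COCO-keypoints style category and annotation. The keypoint names come from the plan. The category id and name, the skeleton edges, and the `make_annotation` helper are hypothetical and would need to match whatever the training config expects.

```python
# Keypoint names are taken from the plan; everything else here
# (category id/name, skeleton edges, helper name) is an illustrative assumption.
POLE_KEYPOINTS = ["base", "top", "left_crossarm_tip", "right_crossarm_tip"]

pole_category = {
    "id": 1,
    "name": "catenary_pole",
    "keypoints": POLE_KEYPOINTS,
    # 1-based index pairs: base-top, top-left tip, top-right tip (simplified).
    "skeleton": [[1, 2], [2, 3], [2, 4]],
}


def make_annotation(ann_id, image_id, points):
    """Build one COCO-style keypoint annotation.

    points maps keypoint name -> (x, y, v), where v is the COCO visibility
    flag (0 = not labeled, 1 = labeled but occluded, 2 = labeled and visible).
    At least one keypoint must be labeled.
    """
    flat, xs, ys = [], [], []
    for name in POLE_KEYPOINTS:
        x, y, v = points.get(name, (0.0, 0.0, 0))
        flat.extend([float(x), float(y), int(v)])
        if v > 0:
            xs.append(float(x))
            ys.append(float(y))
    x0, y0 = min(xs), min(ys)
    w, h = max(xs) - x0, max(ys) - y0
    return {
        "id": ann_id,
        "image_id": image_id,
        "category_id": pole_category["id"],
        "keypoints": flat,             # [x1, y1, v1, x2, y2, v2, ...]
        "num_keypoints": len(xs),      # number of labeled keypoints
        "bbox": [x0, y0, w, h],        # tight box around labeled keypoints
        "area": w * h,
        "iscrowd": 0,
    }


example = make_annotation(1, 7, {"base": (100, 400, 2), "top": (110, 50, 2)})
```

Because `base` always occupies the same slot in the keypoint vector, a crossarm tip can never be written into the base channel by accident, which is the structural separation the rationale above relies on.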

Handling the label shortage (30 images): 4-stage bootstrap

Day 1–3 (starting with zero labels): Grounding-DINO 1.5 (prompts "utility pole", "전봇대") → SAM 2.1 box prompt → lower end of the mask's PCA principal axis → base candidate. Measure the pixel error on the 30 images; this is the baseline.
- Repos: https://github.com/IDEA-Research/Grounding-DINO-1.5-API , https://github.com/facebookresearch/sam2
Day 4–5: Use the clean MVP outputs as bootstrap labels for RTMPose training.
Week 2+: Albumentations + copy-paste augmentation (cut poles out with SAM2 masks and paste them onto varied backgrounds, transforming the keypoints along with them). 5–10× data augmentation.
Week 3+ (highest leverage): SfM-consistent self-training loop
- Reproject the base detection from one view into other views using the SfM poses → pseudo-labels
- Discard as an outlier anything with reprojection error > 5px (a crossarm mistaken for the base is filtered out automatically, because it is not 3D-consistent with the correct detections in other views)
- References: EpipolarPose (CVPR'19), Pixel-Perfect SfM (ICCV'21) https://github.com/cvg/pixel-perfect-sfm
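
The reprojection check in the self-training loop can be sketched as follows. This is a minimal example assuming an undistorted pinhole camera with the convention x_cam = R X + t. The function names and the sample camera values are hypothetical; only the 5px threshold comes from the plan.

```python
import numpy as np


def project(K, R, t, X):
    """Project world point X (3,) to pixel coordinates: x ~ K (R X + t)."""
    uvw = K @ (R @ X + t)
    return uvw[:2] / uvw[2]


def filter_by_reprojection(X, views, detections, max_err_px=5.0):
    """Return a boolean mask: True where a 2D detection agrees with the
    reprojection of the triangulated 3D point X to within max_err_px."""
    keep = []
    for (K, R, t), uv in zip(views, detections):
        err = np.linalg.norm(project(K, R, t, X) - np.asarray(uv, dtype=float))
        keep.append(err <= max_err_px)
    return np.array(keep)


# Hypothetical two-view setup, for illustration only.
K = np.array([[1000.0, 0.0, 500.0],
              [0.0, 1000.0, 500.0],
              [0.0, 0.0, 1.0]])
views = [(K, np.eye(3), np.zeros(3)),
         (K, np.eye(3), np.array([1.0, 0.0, 0.0]))]
X = np.array([0.0, 0.0, 10.0])

# The second detection is 30 px away from where X reprojects, as would happen
# if that view picked a crossarm tip instead of the base.
mask = filter_by_reprojection(X, views, [(500.0, 500.0), (630.0, 500.0)])
```

Detections that survive the mask would become pseudo-labels; the rejected ones are the cases the plan expects to be filtered out automatically.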

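What Not to Do

Estimating the base with a bbox-based YOLO/Faster-RCNN (fails because of the geometric problem above)
Relying solely on BlenderProc synthetic data (sim-to-real gap)
Fully retraining a large model from scratch on 30 images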

Module B: Rail Polyline Detection

Chosen model: SegFormer-B2 + 3-stage transfer learning + skeletonize

Rationale: a bbox is a poor fit for rails. Segmentation handles both nadir (parallel lines) and oblique (curves converging to a vanishing point) views in a view-invariant way. SegFormer-B2's hierarchical attention captures the long-range continuity of long, thin structures well, and it is label-efficient.
3-stage fine-tune: RailSem19 (8,500 annotated images; pretraining to about 65% IoU is achievable) → UAV-RSOD (drone-view rail dataset, the most direct domain match) → the user's 30 images
Repos / Datasets:
- SegFormer: https://github.com/NVlabs/SegFormer (HF: https://huggingface.co/blog/fine-tune-segformer)
- RailSem19: https://wilddash.cc/railsem19
- UAV-RSOD: https://www.nature.com/articles/s41597-024-03952-3
- Alternative architecture: NL-LinkNet-SSR (Drones, MDPI 2024; drone-rail specific, F1=0.749) https://www.mdpi.com/2504-446X/8/11/611

Polyline Extraction Post-processing

Binary mask (rail / not-rail)
skimage.morphology.skeletonize → 1px centerline
Connected component split → separate each rail (instance ID)
Ramer-Douglas-Peucker simplification (eps ~1–2px) → ordered polyline
Optional sub-pixel refinement: snap the skeleton to sub-pixel positions with the DeepLSD attraction field (https://github.com/cvg/DeepLSD)
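
The Ramer-Douglas-Peucker step above can be sketched in plain NumPy. This is a minimal example that assumes the skeleton pixels of one rail have already been ordered into a 2D point sequence; the function name `rdp` and the sample polyline are illustrative, and the skeletonization and connected-component steps are not shown.

```python
import numpy as np


def rdp(points, eps):
    """Ramer-Douglas-Peucker simplification of an ordered 2D polyline.

    Keeps the end points, finds the point farthest from the chord between
    them, and recurses on both halves if that distance exceeds eps.
    """
    points = np.asarray(points, dtype=float)
    if len(points) < 3:
        return points
    start, end = points[0], points[-1]
    chord = end - start
    length = np.linalg.norm(chord)
    rel = points - start
    if length == 0.0:
        # Closed loop: fall back to distance from the shared end point.
        dists = np.linalg.norm(rel, axis=1)
    else:
        # |2D cross product| / chord length = perpendicular distance.
        dists = np.abs(chord[0] * rel[:, 1] - chord[1] * rel[:, 0]) / length
    idx = int(np.argmax(dists))
    if dists[idx] > eps:
        left = rdp(points[: idx + 1], eps)
        right = rdp(points[idx:], eps)
        return np.vstack([left[:-1], right])  # drop the duplicated split point
    return np.vstack([start, end])


# Illustrative L-shaped pixel chain: 11 points along x, then 10 along y.
chain = [(x, 0) for x in range(11)] + [(10, y) for y in range(1, 11)]
simplified = rdp(chain, eps=1.0)
```

For very long skeletons the recursion depth may become a concern, in which case an iterative variant with an explicit stack would be the safer choice.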

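MVP (week 1)

Grounded-SAM 2 (text prompts "railway track", "steel rail") → manually correct the 30 images → fine-tune SegFormer → auto-label the rest → repeat (active learning loop).

What Not to Do

Applying lane detection models such as LaneATT/CLRNet directly (they assume a forward-facing vehicle view, which does not fit drone nadir imagery)
Graph models such as Sat2Graph/RoadTracer (not suited to this scale or form of supervision)
Wireframe detectors such as HAWP/LETR/M-LSD (designed for indoor Manhattan-world scenes)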

Module C: 2D → 3D Integration

C-1. Poles (point objects)

Step	Tool	Notes
Cross-view association	DBSCAN in world XY	Key trick
Triangulation initialization	`pycolmap.triangulate_point`	https://github.com/colmap/pycolmap
Outlier removal	RANSAC + Huber loss
Pose-fixed bundle adjustment	`pyceres`	https://github.com/cvg/pyceres

The crux of cross-view association (the hard part): poles repeat at roughly 50m intervals and look identical, so appearance descriptors such as DINOv2 cannot tell them apart. → Cluster on geometry instead:

Back-project each view's 2D base detection as a ray
Intersect the ray with the local ground plane of the SfM dense cloud → candidate world-coordinate point
Cluster in world XY with DBSCAN(eps≈0.5m, min_samples=3) → one cluster = one physical pole
Triangulate with pycolmap from the detections in each cluster
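
The first three association steps can be sketched as below. As a simplifying assumption, the ground is treated as a single horizontal plane z = ground_z rather than the local ground surface from the dense cloud, and the camera follows the pinhole convention x_cam = R X + t. The function names and sample numbers are hypothetical; the DBSCAN parameters are the ones given above.

```python
import numpy as np
from sklearn.cluster import DBSCAN


def pixel_to_ground(K, R, t, uv, ground_z=0.0):
    """Intersect the viewing ray through pixel uv with the plane z = ground_z.

    Returns the 3D world point, or None if the ray is parallel to the plane
    or the intersection lies behind the camera.
    """
    center = -R.T @ t                                   # camera centre in world
    direction = R.T @ np.linalg.inv(K) @ np.array([uv[0], uv[1], 1.0])
    if abs(direction[2]) < 1e-12:
        return None
    s = (ground_z - center[2]) / direction[2]
    if s <= 0:
        return None
    return center + s * direction


def associate(points_xyz, eps_m=0.5, min_samples=3):
    """Cluster ground candidates in world XY; label -1 marks unassigned noise."""
    xy = np.asarray(points_xyz, dtype=float)[:, :2]
    return DBSCAN(eps=eps_m, min_samples=min_samples).fit_predict(xy)


# Hypothetical nadir camera 100 m above the ground, looking straight down.
K = np.array([[1000.0, 0.0, 500.0],
              [0.0, 1000.0, 500.0],
              [0.0, 0.0, 1.0]])
R = np.diag([1.0, -1.0, -1.0])
t = -R @ np.array([0.0, 0.0, 100.0])
hit = pixel_to_ground(K, R, t, (600.0, 500.0))

# Candidates from several views: two poles 50 m apart plus one stray point.
candidates = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [0.0, 0.1, 0.0],
                       [50.0, 0.0, 0.0], [50.1, 0.0, 0.0], [50.0, 0.1, 0.0],
                       [25.0, 25.0, 0.0]])
labels = associate(candidates)
```

With poles spaced about 50m apart, an eps of 0.5m leaves a wide margin between clusters, so the clustering depends mainly on how well the ray-ground intersection localizes each candidate.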

C-2. Rails (curve objects)

Step	Tool
Ground extraction	PDAL CSF (Cloth Simulation Filter) https://pdal.io/en/latest/stages/filters.csf.html
Ray-ground intersection	Open3D `RaycastingScene`
Curve fitting	`scipy.interpolate.splprep` (3D B-spline)
NURBS refinement	`geomdl` https://github.com/orbingol/NURBS-Python
Bundle adjustment	`pyceres` (control points as variables)

Pipeline:

SfM dense cloud → classify ground with CSF → local surface mesh
Sample each view's 2D rail polyline at 3–5px intervals
Collect the world coordinates where each camera ray intersects the ground surface
Pool the world coordinates from all views and order them by arc length → 3D B-spline fit
(Optional) With pyceres, treat the control points as variables and minimize the reprojection error against the 2D polylines in all views
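
The ordering and B-spline fitting step might look like the sketch below. One assumption deserves emphasis: the pooled points are ordered by their projection onto the first principal axis, which stands in for true arc-length ordering and only holds for gently curving track. The function name, smoothing value, and synthetic rail are illustrative.

```python
import numpy as np
from scipy.interpolate import splev, splprep


def fit_rail_spline(points, smoothing=0.0, n_out=200):
    """Fit a cubic 3D B-spline through unordered rail samples.

    Returns (tck, curve) where curve is an (n_out, 3) array of points
    evaluated uniformly in the spline parameter.
    """
    pts = np.asarray(points, dtype=float)

    # Order along the dominant direction (assumption: no hairpin curves).
    centered = pts - pts.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    pts = pts[np.argsort(centered @ vt[0])]

    # Drop consecutive duplicates, then use cumulative chord length as the
    # spline parameter (a discrete approximation of arc length).
    seg = np.linalg.norm(np.diff(pts, axis=0), axis=1)
    pts = pts[np.concatenate([[True], seg > 1e-9])]
    seg = np.linalg.norm(np.diff(pts, axis=0), axis=1)
    u = np.concatenate([[0.0], np.cumsum(seg)])
    u /= u[-1]

    tck, _ = splprep(pts.T, u=u, s=smoothing, k=3)
    curve = np.array(splev(np.linspace(0.0, 1.0, n_out), tck)).T
    return tck, curve


# Synthetic, shuffled samples of a gently curving rail (illustration only).
x = np.linspace(0.0, 100.0, 50)
samples = np.column_stack([x, 0.001 * x**2, 5.0 + 0.01 * x])
rng = np.random.default_rng(0)
tck, curve = fit_rail_spline(samples[rng.permutation(len(samples))])
```

With noisy ray-ground intersections from real views, a positive `smoothing` value would usually be preferable to exact interpolation; the control points in `tck` are what the optional pyceres refinement would then adjust.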

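C-3. Whether to Replace the SfM Dense Step

Conclusion: do not replace it. Gaussian Splatting / NeRF are driven by a photometric loss, so they do not fundamentally solve the thin-object problem. For measurement accuracy, detection → triangulation is stronger. 2DGS (https://github.com/hbb1/2d-gaussian-splatting) can be added as an option, but only for visualization deliverables.

Used from SfM: camera poses + intrinsics, the sparse cloud (sanity check), and the ground-classified dense cloud only. The mesh is discarded.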

Full Pipeline Diagram

Drone imagery (oblique + nadir)
        │
        ▼
[Stage 0] COLMAP / Metashape (existing)
   → camera [R|t], K, dense cloud
        │
   ┌────┼────────────────────────┐
   ▼    ▼                        ▼
[Stage 1a]   [Stage 1b]              [Stage 1c]
PDAL CSF    Pole 4-kpt detection    Rail seg + polyline
ground       (RTMPose)               (SegFormer + skel)
        │              │                  │
        └──────┬───────┘                  │
               ▼                          │
[Stage 2] Pole association                │
  - ray → ground intersection             │
  - DBSCAN in world XY                    │
        │                                 │
        ▼                                 │
[Stage 3] Pole triangulation              │
  - pycolmap + Huber + pyceres BA         │
  → 3D pole base point                    │
                                          ▼
                                 [Stage 4] Rail reconstruction
                                  - Open3D raycasting (per sample)
                                  - scipy splprep B-spline
                                  - pyceres spline BA
                                  → 3D rail centerline
        │                                 │
        └──────────────┬──────────────────┘
                       ▼
[Stage 5] Output: GeoJSON/Shapefile (EPSG:5186) + DXF (ezdxf)

Recommended Directory Structure (files to be created)

The only files that already exist are docs/research.md and README.md. Key new files to create:

detectelectronpole/
├── docs/
│   ├── research.md                  # existing
│   └── plan.md                      # Korean copy of this plan
├── data/
│   ├── images/                      # raw drone images
│   ├── sfm/                         # COLMAP DB / Metashape export
│   └── labels/
│       └── poles_4kpt.json          # COCO-keypoints format, 4-keypoint schema
├── configs/
│   ├── rtmpose_pole_4kpt.py         # MMPose config
│   └── segformer_rail.yaml          # SegFormer fine-tune config
├── src/
│   ├── detection/
│   │   ├── mvp_groundingdino_sam2.py    # zero-label MVP (week 1)
│   │   ├── pole_keypoint.py             # RTMPose inference wrapper
│   │   ├── rail_segment.py              # SegFormer inference + skeletonize
│   │   └── augment_copy_paste.py        # keypoint-preserving copy-paste
│   ├── triangulation/
│   │   ├── sfm_io.py                # pycolmap / Metashape XML loader
│   │   ├── ground.py                # PDAL CSF + Open3D RaycastingScene
│   │   ├── poles.py                 # DBSCAN association + pycolmap + pyceres
│   │   └── rails.py                 # raycasting + splprep + pyceres
│   ├── self_training/
│   │   └── sfm_self_training.py     # multi-view pseudo-label loop
│   └── export/
│       └── geojson_dxf.py           # deliverable export
└── pipeline/
    └── run_full.py                  # end-to-end orchestrator

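Milestones by Phase

Phase 1 (Week 1): Zero-label MVP

mvp_groundingdino_sam2.py: extract pole masks + base candidates from the 30 images with Grounding-DINO 1.5 + SAM 2.1
Compare the lower endpoint of the PCA principal axis against the GT of the 30 images to measure a pixel-error baseline
Measure a rail mask baseline the same way with Grounded-SAM 2
Goal: quantify how far zero labels can go. Compile a catalog of failure modes.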
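
The "lower endpoint of the PCA principal axis" baseline can be sketched as follows, assuming the pole mask is a binary NumPy array in image coordinates (y grows downward). The function names and the synthetic mask are illustrative, not part of the planned mvp_groundingdino_sam2.py.

```python
import numpy as np


def base_candidate_from_mask(mask):
    """Return (x, y) of the lower end of the mask's PCA principal axis."""
    ys, xs = np.nonzero(mask)
    pts = np.column_stack([xs, ys]).astype(float)
    mean = pts.mean(axis=0)
    centered = pts - mean

    # Principal axis = eigenvector of the covariance with the largest eigenvalue.
    eigvals, eigvecs = np.linalg.eigh(np.cov(centered.T))
    axis = eigvecs[:, np.argmax(eigvals)]
    if axis[1] < 0:
        axis = -axis                   # orient the axis toward the image bottom

    # Farthest extent of the mask along the downward-pointing axis.
    return mean + (centered @ axis).max() * axis


def pixel_error(pred, gt):
    """Euclidean distance in pixels between a prediction and its ground truth."""
    return float(np.linalg.norm(np.asarray(pred, float) - np.asarray(gt, float)))


# Synthetic vertical pole: 5 px wide, spanning rows 10..89 (illustration only).
mask = np.zeros((100, 100), dtype=bool)
mask[10:90, 48:53] = True
base = base_candidate_from_mask(mask)
err = pixel_error(base, (50.0, 89.0))
```

If the mask includes a wide crossarm, the principal axis tilts toward it and the candidate drifts off the true base; that drift is one of the failure modes this baseline is meant to expose and catalog.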

Phase 2 (Week 2–3): Detection Training

Relabel the 30 images in 4-keypoint COCO-keypoints format (low cost, since it is just four point clicks per pole)
Augment to ~300 images with copy-paste augmentation
Fine-tune RTMPose-m (frozen backbone)
3-stage transfer of SegFormer-B2: RailSem19 → UAV-RSOD → the 30 images
Evaluate both models (PCK, mIoU)

Phase 3 (Week 4): 2D → 3D Integration

Load Metashape/COLMAP output with sfm_io.py
Extract ground with PDAL CSF and build the Open3D RaycastingScene in ground.py
poles.py: DBSCAN association + pycolmap triangulate + pyceres BA → 3D pole points
rails.py: ray-ground intersection + B-spline fit → 3D rail polylines
GeoJSON export in EPSG:5186 (Korea 2000 / Central Belt 2010)
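
A minimal sketch of the GeoJSON export step, using only the standard library. Note that RFC 7946 defines GeoJSON coordinates as WGS 84 and dropped the `crs` member; the example writes the legacy `crs` member on the assumption that the consuming GIS tool still honors it. The function name, the `pole_id` property, and the coordinates are hypothetical.

```python
import json


def poles_to_geojson(poles, epsg=5186):
    """Build a FeatureCollection from (pole_id, easting, northing, height) tuples.

    Coordinates are written in GeoJSON order: [easting, northing, height].
    """
    features = []
    for pole_id, easting, northing, height in poles:
        features.append({
            "type": "Feature",
            "geometry": {"type": "Point",
                         "coordinates": [easting, northing, height]},
            "properties": {"pole_id": pole_id},
        })
    return {
        "type": "FeatureCollection",
        # Legacy (pre-RFC 7946) CRS member; support by the reader is assumed.
        "crs": {"type": "name",
                "properties": {"name": f"urn:ogc:def:crs:EPSG::{epsg}"}},
        "features": features,
    }


# Hypothetical coordinates, for illustration only.
collection = poles_to_geojson([("P001", 200123.4, 551234.5, 35.2)])
text = json.dumps(collection, ensure_ascii=False, indent=2)
```

If strict RFC 7946 compliance matters for a downstream consumer, an alternative would be to reproject to WGS 84 before writing and keep EPSG:5186 for the Shapefile and DXF outputs.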

Phase 4 (Week 5+): Self-training Loop

sfm_self_training.py: one round = detection → triangulation → reprojection → pseudo-label → fine-tune
Automatically remove outliers with a 5px reprojection error threshold
Track mAP/PCK per round
Repeat until convergence (typically 3–5 rounds)

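Verification

Module A alone: measure base keypoint PCK@5px against the GT of the 30 images. Compare MVP baseline → RTMPose baseline → each self-training round.
Module B alone: measure rail mask mIoU and polyline Hausdorff distance against the GT of the 30 images.
Module C alone: feed the raw Phase 1 detections into triangulation/poles.py and export the 3D pole coordinates as GeoJSON. Compare against surveyed ground control points (GCPs) or the KORAIL inventory (if neither is available, fly two separate survey flights and check consistency between them).
End-to-end: run the full pipeline once on a small trial section (e.g. a 100m stretch, ~3 poles + 2 rails). Visualize the output by overlaying the GeoJSON on the orthomosaic in QGIS.
Regression test: must be reproducible with the single command pipeline/run_full.py --section trial_100m.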
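
The two detection metrics can be sketched in a few lines of NumPy. PCK is read here as the fraction of keypoints within an absolute 5px of the ground truth, as the "PCK@5px" notation suggests (classic PCK normalizes the threshold by object size instead), and mIoU is taken as the mean over the rail and background classes. The function names are illustrative.

```python
import numpy as np


def pck(pred, gt, thresh_px=5.0):
    """Fraction of predicted keypoints within thresh_px of the ground truth."""
    dist = np.linalg.norm(np.asarray(pred, float) - np.asarray(gt, float), axis=-1)
    return float(np.mean(dist <= thresh_px))


def iou(pred_mask, gt_mask):
    """Intersection over union of two boolean masks (1.0 if both are empty)."""
    pred_mask = np.asarray(pred_mask, bool)
    gt_mask = np.asarray(gt_mask, bool)
    union = np.logical_or(pred_mask, gt_mask).sum()
    if union == 0:
        return 1.0
    return float(np.logical_and(pred_mask, gt_mask).sum() / union)


def binary_miou(pred_mask, gt_mask):
    """Mean IoU over the two classes: rail (True) and background (False)."""
    pred_mask = np.asarray(pred_mask, bool)
    gt_mask = np.asarray(gt_mask, bool)
    return 0.5 * (iou(pred_mask, gt_mask) + iou(~pred_mask, ~gt_mask))


# Toy values: keypoint errors of 0, 5 and 10 px; a 2x2 mask pair.
score = pck([[0, 0], [3, 4], [10, 0]], [[0, 0], [0, 0], [0, 0]])
miou = binary_miou([[1, 1], [0, 0]], [[1, 0], [0, 0]])
```

Because rail pixels are a small fraction of each image, the rail-class IoU alone is usually the more informative number; the background term keeps mIoU comparable with published segmentation results.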

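Risks and Mitigations

Risk	Mitigation
Pole base occluded by ballast/vegetation	The self-training loop checks consistency against the visible base in other views → correct by the mean offset
SfM ground cloud too sparse (<50 pts/m²)	Readjust flight altitude/overlap, or supplement with SuGaR/2DGS (visualization only)
Many kinds of Korean catenary masts (single mast vs portal)	Extend the 4-keypoint scheme with a `pole_type` class branch (single mast / portal structure)
Coordinate system (local vs EPSG:5186)	7-parameter Helmert transform from 3 or more GCPs
Label seed (30 images) biased toward one camera angle/lighting	Force diversity in the first self-training round (sample at least 1 image per multi-view cluster)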
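
For the coordinate-system row, a 7-parameter transform (3 rotations, 3 translations, 1 scale) can be estimated from GCP pairs in closed form. The sketch below uses the Umeyama least-squares similarity fit rather than the linearized small-angle Helmert form common in geodesy, which is a substitution on my part; the function name and the GCP values are hypothetical.

```python
import numpy as np


def fit_similarity(src, dst):
    """Least-squares scale s, rotation R, translation t with dst ≈ s * R @ src + t.

    src, dst: (N, 3) arrays of corresponding points, N >= 3 and not collinear.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    xs, xd = src - mu_s, dst - mu_d

    cov = xd.T @ xs / len(src)              # cross-covariance (dst vs src)
    U, S, Vt = np.linalg.svd(cov)
    D = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        D[2, 2] = -1.0                      # guard against a reflection
    R = U @ D @ Vt
    s = np.trace(np.diag(S) @ D) / ((xs ** 2).sum() / len(src))
    t = mu_d - s * R @ mu_s
    return s, R, t


# Hypothetical GCPs: local SfM coordinates and their EPSG:5186 counterparts,
# generated from a known transform so the fit can be checked.
angle = np.deg2rad(30.0)
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle), np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
s_true, t_true = 1.0002, np.array([200000.0, 550000.0, 30.0])
local = np.array([[0.0, 0.0, 0.0], [100.0, 0.0, 2.0],
                  [0.0, 80.0, 1.0], [60.0, 40.0, 5.0]])
world = s_true * (R_true @ local.T).T + t_true

s, R, t = fit_similarity(local, world)
residual = np.abs(s * (R @ local.T).T + t - world).max()
```

With real GCPs the residuals will not be zero; their size on held-out check points is a direct measure of how well the local SfM frame has been tied to EPSG:5186.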

Key References (all available for use as of April 2026)

Pole detection: MMPose RTMPose, SAM 2.1, Grounding-DINO 1.5/1.6, Pixel-Perfect SfM
Rail detection: SegFormer, RailSem19, UAV-RSOD, NL-LinkNet-SSR, DeepLSD
3D integration: pycolmap, pyceres, PDAL, Open3D, scipy.interpolate, geomdl, ezdxf
