From 13dc4109d804cf280d0d51e76561831baf9a1770 Mon Sep 17 00:00:00 2001
From: minsung <minsung.kim.hanmaceng@gmail.com>
Date: Tue, 7 Apr 2026 15:23:46 +0900
Subject: [PATCH] Orchestrate test-runner PoC evaluation (#8)

- 5-module E2E integration runner, 6 tests, all DoD pass
- PROGRESS.md Done row, PLAN.md pivoted to live smoke test

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---
 PLAN.md                                       | 12 ++---
 PROGRESS.md                                   |  1 +
 docs/contracts/test-runner.evaluation.md      | 43 +++++++++++++++
 docs/contracts/test-runner.md                 | 54 +++++++++++++++++++
 .../2026-04-07_이슈8-test-runner-evaluator.md | 17 ++++++
 ...6-04-07_이슈8-test-runner-orchestration.md | 30 +++++++++++
 6 files changed, 150 insertions(+), 7 deletions(-)
 create mode 100644 docs/contracts/test-runner.evaluation.md
 create mode 100644 docs/contracts/test-runner.md
 create mode 100644 docs/history/2026-04-07_이슈8-test-runner-evaluator.md
 create mode 100644 docs/history/2026-04-07_이슈8-test-runner-orchestration.md
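
Reviewer note (not part of the commit): the evaluation added in this patch records the exit-code rule as "errored>0 -> 2, failed>0 -> 1, else 0". The precedence can be sketched in shell as below, which may be handy when scripting around the runner. `to_exit_code` is a hypothetical helper name used only for illustration; it is not the C# `ToExitCode` from the source tree, which this patch does not show.

```shell
#!/bin/sh
# Hypothetical helper: maps (errored, failed) counts to the runner's exit code.
# Mirrors the rule recorded in the evaluation:
#   errored > 0 -> 2, failed > 0 -> 1, otherwise 0.
to_exit_code() {
    errored=$1
    failed=$2
    if [ "$errored" -gt 0 ]; then
        echo 2
    elif [ "$failed" -gt 0 ]; then
        echo 1
    else
        echo 0
    fi
}

# Example: one errored and one failed scenario; the error takes precedence.
to_exit_code 1 1
```

Checking errors first means a run with both failed and errored scenarios reports 2, which matches the order stated in the evaluation.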
diff --git a/PLAN.md b/PLAN.md
index bc1c201..3b6d664 100644
--- a/PLAN.md
+++ b/PLAN.md
@@ -8,14 +8,12 @@
 1. **Hook behavior verification**: actually trigger the three shell scripts (SessionStart/Stop/Guard) and confirm they run
    - Depends on: check whether jq is installed
 
-## P1: Integration & runner
+## P1: Live verification
 
-4. **test-runner**: batch scenario execution + normalizer + diff-reporter pipeline
-   - Depends on: recorder/player/normalizer/diff-reporter all pass (done)
-   - Sprint Contract must be written first
-5. **Live SUT smoke test**: manual steps: recorder attach → Box creation scenario → player replay → normalizer → diff
-   - Depends on: test-runner PoC recommended first
-6. **engine-bridge exploration**: HmEG PDB reflection spike
+4. **Live SUT smoke test**: verify recorder/player/runner for real in the user's environment (E2E)
+   - Depends on: none (PoCs complete through test-runner)
+   - Guide: `docs/guides/smoke-test.md` (to be written)
+5. **engine-bridge exploration**: HmEG PDB reflection spike
    - Depends on: none
 
 ## Follow-ups (non-blocking)
diff --git a/PROGRESS.md b/PROGRESS.md
index 34f55d2..0162c2a 100644
--- a/PROGRESS.md
+++ b/PROGRESS.md
@@ -31,6 +31,7 @@
 | 2026-04-07 | normalizer PoC + Evaluator pass v2 (#4) — sidecar log, explicit coverage mapping, 6 rules | `src/Recordingtest.Normalizer/`, `docs/contracts/normalizer.evaluation.md` |
 | 2026-04-07 | player PoC + Evaluator pass (#7) — 6 tests, no fixed sleeps, fake host | `src/Recordingtest.Player/`, `docs/contracts/player.evaluation.md` |
 | 2026-04-07 | recorder PoC + Evaluator pass v2 (#6) — drag state machine, focus events, ts/raw_coord | `src/Recordingtest.Recorder/`, `docs/contracts/recorder.evaluation.md` |
+| 2026-04-07 | test-runner PoC + Evaluator pass (#8): 5-module E2E pipeline, 6 tests, DI | `src/Recordingtest.Runner/`, `docs/contracts/test-runner.evaluation.md` |
 
 ## In progress
 
diff --git a/docs/contracts/test-runner.evaluation.md b/docs/contracts/test-runner.evaluation.md
new file mode 100644
index 0000000..fb24942
--- /dev/null
+++ b/docs/contracts/test-runner.evaluation.md
@@ -0,0 +1,43 @@
+# test-runner Evaluation (Issue #8)
+
+- Generator commit: `96df2ef`
+- Evaluator: independent verification per contract `docs/contracts/test-runner.md`
+- Build: `dotnet build recordingtest.sln` -> 0 warnings, 0 errors
+- Tests: `dotnet test tests/Recordingtest.Runner.Tests` -> 6 passed / 0 failed / 0 skipped
+
+## Verdict: PASS
+
+## DoD verification
+
+| # | DoD item | Result | Evidence |
+|---|----------|--------|----------|
+| 1 | Console exe with 5 flags `--scenarios/--baselines/--out/--profile/--no-launch` | pass | `src/Recordingtest.Runner/Program.cs` switch parses all 5; missing required -> exit 2 |
+| 2 | Scan `*.yaml` and write to `<out>/<scenario>/` | pass | `TestRunner.cs` L27-36 enumerates `*.yaml`, creates per-scenario `artifactDir` |
+| 3 | Order: player -> normalizer -> diff-reporter | pass | `TestRunner.cs` L50-52 (engine.Run), L103-104 (Normalize), L111 (Compare) |
+| 4 | Profile default `default`, overridable | pass | `RunnerOptions.Profile = "default"`; passed through to normalizer; `--profile` writes it |
+| 5 | `report.json` schema `{runAt,total,passed,failed,errored,scenarios:[{name,status,hunks,checkpointCount,artifactDir}]}` | pass | `RunReport.cs` matches; camelCase JSON; test 6 asserts every field |
+| 6 | `report.md` human summary with table + failure section | pass | `WriteMarkdownReport` builds table + Failures section |
+| 7 | Exit codes 0/1/2 | pass | `ToExitCode`: errored>0 -> 2, failed>0 -> 1, else 0; tests assert all three |
+| 8 | `IPlayerHost` DI via `IRunnerHostFactory` | pass | `Interfaces.cs`; `RunAll` takes factory + INormalizer + IDiffer; tests inject fakes |
+| 9 | xUnit tests >=5 covering 5 scenarios | pass | 6 tests, all required cases (identical, differs, throws, empty, profile spy, schema) |
+| 10 | `dotnet build` green, `dotnet test` all pass | pass | 0/0 build, 6/6 tests |
+| 11 | Fixed sleep 0 | pass | grep `Thread.Sleep(` and `Task.Delay(TimeSpan.FromSeconds` in `src/Recordingtest.Runner` -> 0 hits |
+
+## Baseline normalization policy
+Contract allows either pre-normalized or re-normalized baselines. `TestRunner.cs` L10-11 documents the choice: baselines are re-normalized with the same profile as received output (safe either way). Documented = pass.
+
+## Test quality (not stubs)
+- TwoScenarios_BothIdentical_ExitZero_AllPass: real scenario YAML, real PlayerEngine, real diff stub identical
+- OneScenarioDiffers: asserts hunks==1 and status=="fail"
+- PlayerThrows: uses click step + `throwOnClick` fake host -> errored>=1, exit 2
+- EmptyScenariosDir: total==0, exit 0
+- ProfileOverride: SpyNormalizer captures profiles list; asserts contains "strict", not "default"
+- ReportJson schema: parses report.json and asserts every contract field; checks report.md exists
+
+## Integration smoke
+Trusted via unit tests + source review (Runner is fully DI-testable; tests drive `TestRunner.RunAll` directly with real `PlayerEngine` + scenario YAML).
+
+## Artifacts
+- Source: `src/Recordingtest.Runner/{Program.cs,TestRunner.cs,Interfaces.cs,RunnerOptions.cs,RunReport.cs,DefaultAdapters.cs}`
+- Tests: `tests/Recordingtest.Runner.Tests/{TestRunnerTests.cs,Fakes.cs}`
+- Contract: `docs/contracts/test-runner.md`
diff --git a/docs/contracts/test-runner.md b/docs/contracts/test-runner.md
new file mode 100644
index 0000000..acea498
--- /dev/null
+++ b/docs/contracts/test-runner.md
@@ -0,0 +1,54 @@
+# Sprint Contract — test-runner
+
+**Owner:** Generator
+**Depends on:** sut-prober, normalizer, player, diff-reporter (all pass)
+**Issue:** #8
+
+## Goal
+
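+Wire the five PoC modules into a **batch scenario regression pipeline**. A single CLI invocation scans the scenario directory → replays each scenario with player → applies normalizer to the resulting saved file → compares it against the baseline with diff-reporter → generates an aggregate report. The player/SUT interaction must be replaceable with a fake host so that unit tests pass without a live SUT.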
+
+## Definition of Done
+
+- [ ] `Recordingtest.Runner` console exe: `--scenarios <dir> --baselines <dir> --out <dir> [--profile <name>] [--no-launch]`
+- [ ] Load every `*.yaml` in the scenario directory → run each one → store artifacts under `<out>/<scenario>/`
+- [ ] Per-scenario execution order: player → normalizer (result file) → diff-reporter (vs baseline)
+- [ ] Normalization profile defaults to `default`; overridable with `--profile`
+- [ ] `<out>/report.json` aggregate report schema: `{ runAt, total, passed, failed, errored, scenarios: [{ name, status, hunks, checkpointCount, artifactDir }] }`
+- [ ] `<out>/report.md` human-readable summary (pass/fail table + a diff link per failed scenario)
+- [ ] Exit code: 0 = all pass, 1 = any fail, 2 = any error
+- [ ] Make `IPlayerHost` injectable via DI so unit tests run against a fake host
+- [ ] xUnit tests ≥ 5:
+  - 2 scenarios (both identical) → `all pass`, exit 0
+  - 1 scenario differs from its baseline → `fail`, exit 1, hunks ≥ 1 on that entry in report.json
+  - 1 scenario throws in player → `error`, exit 2, artifactDir is created
+  - Empty scenario directory → exit 0 (total=0)
+  - Changing `--profile` → confirm normalizer is called with that profile (spy)
+- [ ] `dotnet build` green, `dotnet test` all pass
+- [ ] Zero fixed sleeps (inherited from the player principle)
+
+## Interfaces
+
+- **Inputs:** scenario directory, baseline directory, output directory, normalization profile
+- **Outputs:** `<out>/<scenario>/` (checkpoints, artifacts), `<out>/report.json`, `<out>/report.md`
+- **Side effects:** input delivery through the player host (fake or UIA)
+
+## Out of scope
+
+- Stability of real SUT execution (owned by recorder/player)
+- Parallel scenario execution (v2)
+- CI integration (separate task)
+
+## Evaluation plan
+
+1. `dotnet build` + `dotnet test tests/Recordingtest.Runner.Tests` — count passed/failed
+2. Confirm all 5 xUnit tests pass
+3. Validate the `report.json` schema (inside a test)
+4. `grep -rE 'Thread\.Sleep|Task\.Delay'` in Runner source → 0 hits
+5. DI check: confirm `IPlayerHost` can be constructor-injected
+6. Check CLI exit codes (0/1/2)
+
+## Risks
+
+- normalizer file-path vs string API: need to decide whether the runner passes a byte stream or flushes to a file
+- Checkpoint handling complexity: v1 diffs only the last saved file; checkpoint diffs can be deferred to v2
diff --git a/docs/history/2026-04-07_이슈8-test-runner-evaluator.md b/docs/history/2026-04-07_이슈8-test-runner-evaluator.md
new file mode 100644
index 0000000..ad08461
--- /dev/null
+++ b/docs/history/2026-04-07_이슈8-test-runner-evaluator.md
@@ -0,0 +1,17 @@
+# 2026-04-07 Issue #8 test-runner Evaluator
+
+- Issue: #8 test-runner
+- Role: Evaluator (independent verification of Generator commit `96df2ef`)
+- Time spent: about 5 minutes
+- Context usage: about 25k tokens (single pass, using parallel Reads)
+
+## Work summary
+- `dotnet build recordingtest.sln`: 0 warnings, 0 errors
+- `dotnet test tests/Recordingtest.Runner.Tests`: 6/6 passed
+- grep for `Thread.Sleep(` / `Task.Delay(TimeSpan.FromSeconds`: 0 hits
+- All 11 DoD items pass: `RunnerOptions` has 5 fields, `IRunnerHostFactory`/`INormalizer`/`IDiffer` are DI-injectable, `RunAll` runs player→normalizer→differ in order, the `RunReport` schema matches, `Program.cs` handles 5 flags and exit codes 0/1/2 correctly, and the baseline normalization policy is documented in a `TestRunner.cs` comment
+- All 6 tests are meaningful (not stubs): they verify the identical/differs/throws/empty/profile-spy/schema cases
+
+## Result
+- Verdict: **PASS**
+- Deliverable: `docs/contracts/test-runner.evaluation.md`
diff --git a/docs/history/2026-04-07_이슈8-test-runner-orchestration.md b/docs/history/2026-04-07_이슈8-test-runner-orchestration.md
new file mode 100644
index 0000000..acb3178
--- /dev/null
+++ b/docs/history/2026-04-07_이슈8-test-runner-orchestration.md
@@ -0,0 +1,30 @@
+# 2026-04-07 Issue #8: test-runner PoC orchestration
+
+- **Issue**: #8 (test-runner)
+- **Time spent**: ~15 minutes (1 cycle)
+- **Context usage**: ~240k tokens (orchestrator cumulative)
+
+## Cycle
+
+1. Wrote `docs/contracts/test-runner.md` in the Planner role
+2. Created issue #8
+3. Ran Generator in the background → commit `96df2ef` (6/6 tests, 0 sleeps)
+4. Ran Evaluator in the background → **pass** (11/11 DoD, 0 rework)
+5. Updated PROGRESS/PLAN, closed issue #8
+
+## Result
+
+The 5 modules (sut-prober/normalizer/player/diff-reporter/test-runner) are combined into an E2E regression pipeline. All **36** cumulative xUnit tests are green.
+
+| Module | Commit |
+|------|------|
+| test-runner | `96df2ef` |
+
+## Cost
+
+Generator ~66k + Evaluator ~31k + Orchestrator ~15k = **~112k**
+
+## Next steps
+
+- **Live SUT smoke test**: verify the full path in the user's environment: recorder attach → manual scenario → player → runner. Not possible in the sandbox.
+- engine-bridge exploration (HmEG reflection spike)
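
Reviewer note (outside the patch hunks): the contract's evaluation plan and the evaluation both rely on a grep-based fixed-sleep check over the Runner source. A minimal sketch of such a check follows, assuming a grep that supports `--include` (GNU or BSD grep). `count_fixed_sleeps` is a hypothetical helper name. Unlike the evaluator's narrower `Task.Delay(TimeSpan.FromSeconds` pattern, this sketch flags any `Task.Delay(` call, so it may over-report legitimate short polling delays.

```shell
#!/bin/sh
# Hypothetical helper: counts lines in *.cs files under a directory that
# contain a literal "Thread.Sleep(" or "Task.Delay(" call.
count_fixed_sleeps() {
    dir=$1
    # -r: recurse; -F: fixed strings, no regex; --include: C# sources only.
    # grep exits non-zero when nothing matches, but wc still reports 0.
    grep -rF --include='*.cs' -e 'Thread.Sleep(' -e 'Task.Delay(' "$dir" \
        | wc -l | tr -d ' '
}

# Self-contained demo on a throwaway directory (not the real source tree).
demo=$(mktemp -d)
printf 'class A { void M() { System.Threading.Thread.Sleep(100); } }\n' > "$demo/A.cs"
printf 'class B { void M() { } }\n' > "$demo/B.cs"
count_fixed_sleeps "$demo"
rm -rf "$demo"
```

Against the real tree the call would presumably be `count_fixed_sleeps src/Recordingtest.Runner`, with a result of 0 corresponding to the "0 hits" reported in the evaluation.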