Orchestrate P1 UI automation evaluations (#6, #7)

- recorder v1 (fail) → v2 (pass): drag state machine, focus events, ts/raw_coord
- player pass with caveats: reliability untestable in sandbox
- PROGRESS.md Done rows + follow-ups for live SUT smoke test
- PLAN.md P1 pivoted to test-runner + live smoke test

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
minsung
2026-04-07 14:37:14 +09:00
parent 56b7233500
commit 836afea5ee
9 changed files with 323 additions and 5 deletions


@@ -0,0 +1,47 @@
# 2026-04-07 Issues #6 and #7: P1 UI automation (recorder/player) orchestration
- **Issues**: #6 (recorder), #7 (player)
- **Elapsed time**: ~40 min (parallel sub-agents + one recorder rework)
- **Context usage**: ~210k tokens (orchestrator session)
## Cycle
1. Created issues #6 and #7 → Generator × 2 in **parallel, in the background** (FlaUI 4.0.0, YamlDotNet 16.1.3, TFM net8.0-windows)
2. Both Generators finished
3. Evaluator × 2 in **parallel, in the background**
4. **recorder fail** (drag not collapsed / focus not captured / ts·raw_coord not serialized) → Re-Generator → Re-Evaluator **pass**
5. **player pass with caveats** (reliability untestable)
6. Updated PROGRESS/PLAN, closed the issues, pushed
## Commits
- `d486cbb` recorder v1
- `f17e764` player v1
- `56b7233` recorder v2 (drag state machine + focus events + ts/raw_coord)
## Results
| Module | Tests | Result |
|--------|-------|--------|
| recorder | 9/9 (5→9) | pass v2 (2 untestable) |
| player | 6/6 | pass with caveats (1 untestable) |
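## Harness design re-validation
- Of the items the recorder v1 Generator self-flagged ("drag not collapsed, IME not implemented"), the Evaluator **failed the drag issue together with two additional gaps (focus, ts/raw_coord)**. This demonstrates that the Generator's self-flags missed real gaps.
- Converged after one re-iteration (the second consecutive time this pattern has succeeded)
- Parallel sub-agents kept the orchestrator session context at ~210k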
## Follow-ups (non-blocking)
All are of the "actually run against a live SUT" kind:
- recorder DoD #1 (attach) / #7 (60 FPS)
- player DoD #2 (strengthen wait_for) / #7 (10/10 reliability)
- player `UiaPlayerHost` full path resolver
- recorder IME support
## Next steps
**test-runner**: batch scenario execution + normalizer + diff-reporter pipeline. With this, the four PoCs form an integrated E2E path.
After that, run a **live smoke test** in the user's environment to verify the reliability DoD.


@@ -0,0 +1,46 @@
# 2026-04-07 Issue #6: Recorder Evaluator v2
- **Date:** 2026-04-07
- **Issue:** #6 (recorder evaluation, iteration 2)
- **Role:** Evaluator (independent)
- **Generator commit under review:** `56b7233`
- **Contract:** `docs/contracts/recorder.md`
- **Previous evaluation:** `docs/contracts/recorder.evaluation.v1.md` (verdict: fail)
## What I did
1. Re-built the solution: `dotnet build recordingtest.sln` → 0 warnings, 0 errors.
2. Re-ran the recorder test suite: `dotnet test tests/Recordingtest.Recorder.Tests` → **9 passed / 0 failed / 0 skipped**.
3. Read the new/changed sources independently:
   - `src/Recordingtest.Recorder/DragCollapser.cs`
   - `src/Recordingtest.Recorder/Scenario.cs`
   - `src/Recordingtest.Recorder/ScenarioWriter.cs`
   - `src/Recordingtest.Recorder/Program.cs`
   - `tests/Recordingtest.Recorder.Tests/RecorderTests.cs`
4. Cross-checked each v1 gap against the new code.
5. Archived v1 evaluation as `docs/contracts/recorder.evaluation.v1.md` and wrote a fresh v2 evaluation at `docs/contracts/recorder.evaluation.md`.
## Findings
- **Drag collapse:** real state machine. Tracks `down`, accumulates `maxDistSq` over `move`s, then on `mouse_up_l` compares `max(maxDistSq, finalDistSq)` against `DragThresholdPx²` to choose `drag` or `click`. Threshold is constructor-configurable (`DragCollapser(int dragThresholdPx = 4)`), default 4 px.
- **Focus capture:** `Program.cs` calls `automation.RegisterFocusChangedEvent(...)` (try/catch-guarded), builds a UIA path inside the callback via `ElementPathBuilder.Build`, and pushes a synthetic `focus_change` `RawEvent` carrying `FocusedElementPath` into the same `Channel`. `DragCollapser` translates `focus_change` into a `focus` `ScenarioStep`.
- **`ts` / `raw_coord` persistence:** `ScenarioStep` gained `Ts`, `RawCoord`, `EndOffset`, `EndRawCoord`. `ScenarioWriter` uses `UnderscoredNamingConvention`, so they serialize as `ts:` / `raw_coord:` / `end_offset:` / `end_raw_coord:`. The roundtrip test asserts both substrings appear in the yaml and round-trip back to identical values.
- **Tests:** 4 new tests verify drag-beyond-threshold, click-below-threshold, focus-change → focus step, and yaml roundtrip of `ts` + `raw_coord`. All meaningful (assertions match the contract event shape).
- **Right-click, wheel, key paths** also populate `Ts` and `RawCoord` consistently in `DragCollapser`.
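The drag/click decision described in the first finding can be sketched as a small state machine. This is an illustrative Python sketch, not the C# `DragCollapser` itself: the class name, the `feed` method, and the `mouse_down_l` / `mouse_move` event names are hypothetical, and the strict `>` comparison is an assumption (the evaluation only says the distance is compared against the squared threshold).

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Point = Tuple[int, int]


def dist_sq(a: Point, b: Point) -> int:
    """Squared Euclidean distance; avoids a square root per mouse move."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    return dx * dx + dy * dy


@dataclass
class DragCollapserSketch:
    """Hypothetical stand-in for the recorder's drag collapser."""
    drag_threshold_px: int = 4          # default from the evaluation notes
    _down: Optional[Point] = None       # position of the pending left-button press
    _max_dist_sq: int = 0               # farthest excursion seen since the press

    def feed(self, kind: str, pos: Point) -> Optional[str]:
        """Consume one raw event; return 'drag' or 'click' on button release."""
        if kind == "mouse_down_l":
            self._down, self._max_dist_sq = pos, 0
            return None
        if self._down is None:
            return None                 # no press pending, nothing to collapse
        if kind == "mouse_move":
            self._max_dist_sq = max(self._max_dist_sq, dist_sq(self._down, pos))
            return None
        if kind == "mouse_up_l":
            # Use the larger of the farthest excursion and the final displacement.
            travelled = max(self._max_dist_sq, dist_sq(self._down, pos))
            self._down = None
            # Assumption: strictly greater than threshold squared means drag.
            return "drag" if travelled > self.drag_threshold_px ** 2 else "click"
        return None
```

Taking the maximum over all moves, rather than only the release position, means a press that travels away and returns near its origin is still classified as a drag.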
## Verdict
**pass.**
The three v1 code gaps (drag collapse, focus events, `ts`+`raw_coord` persistence) are all closed and covered by tests. The only non-`pass` cells in the v2 verdict table are DoD #1 live-attach and DoD #7 perf, which are structurally untestable inside this sandbox and explicitly deferred to a manual workstation smoke against EG-BIM Modeler.
## Artifacts
- `docs/contracts/recorder.evaluation.md` (v2, new)
- `docs/contracts/recorder.evaluation.v1.md` (v1, archived)
- `docs/history/2026-04-07_이슈6-recorder-evaluator-v2.md` (this file)
## Notes
- Did not modify any generator code.
- Did not update `PROGRESS.md`.


@@ -0,0 +1,34 @@
# 2026-04-07 Issue #6: recorder Evaluator
- Issue: #6 (recorder evaluation)
- Role: Evaluator (independent of the Generator)
- Generator commit: `d486cbb`
- Elapsed time: about 8 min
- Context usage: about 35k / 1M tokens
- Model: claude-opus-4-6 (1M)
## What I did
1. Ran `dotnet build recordingtest.sln` → green (0 warnings / 0 errors).
2. Ran `dotnet test tests/Recordingtest.Recorder.Tests` → 5 passed / 0 failed.
3. Read the sources closely: `ElementPathBuilder.cs`, `OffsetNormalizer.cs`, `MaskPolicy.cs`, `Scenario.cs`, `ScenarioWriter.cs`, `LowLevelHook.cs`, `NativeMethods.cs`, `Program.cs`, `RecorderTests.cs`.
4. Wrote a per-DoD-item verdict table and stated the release gate explicitly.
5. Wrote `docs/contracts/recorder.evaluation.md`.
## Results
- Overall verdict: **fail (blocks release until manual SUT run + drag/focus implementation)**
- pass: OffsetNormalizer normalization and clamping, yaml schema round-trip, PasswordBox masking, exit summary output, handling of a missing CLI `--attach`.
- partial: attach + hook (untestable because no SUT exists in the sandbox), event kinds (no drag collapse, no focus event), event shape (`ts`/`raw_coord` not preserved on the step).
- untestable: impact on 60 FPS.
- Follow-up recommendations: add a drag state machine and a UIA FocusChangedEventHandler, preserve ts/raw_coord on steps, run a manual attach smoke test against EG-BIM Modeler.
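For orientation, offset normalization with clamping (the `OffsetNormalizer` item marked pass above) might look roughly like the following Python sketch. The real C# code is not shown in this evaluation, so the `Bounds` type, the function name, the [0, 1] range, and the zero-size handling are all assumptions made for illustration.

```python
from typing import NamedTuple, Tuple


class Bounds(NamedTuple):
    """Hypothetical element rectangle in screen pixels."""
    x: float
    y: float
    width: float
    height: float


def _clamp01(value: float) -> float:
    """Clamp a value into the closed interval [0, 1]."""
    return max(0.0, min(1.0, value))


def normalize_offset(bounds: Bounds, raw: Tuple[float, float]) -> Tuple[float, float]:
    """Convert a raw screen point into element-relative offsets.

    Assumption: offsets are fractions of the element's width and height,
    clamped so that points outside the element map to its nearest edge.
    """
    # Assumption: a degenerate (zero-size) element maps to offset 0.0.
    ox = (raw[0] - bounds.x) / bounds.width if bounds.width else 0.0
    oy = (raw[1] - bounds.y) / bounds.height if bounds.height else 0.0
    return (_clamp01(ox), _clamp01(oy))
```

Storing element-relative offsets alongside the preserved `raw_coord` would let a replay recompute the target point when the element has moved or resized.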
## Artifacts
- `docs/contracts/recorder.evaluation.md`
- `docs/history/2026-04-07_이슈6-recorder-evaluator.md`
## Notes
- Did not modify Generator code.
- Did not modify `PROGRESS.md`.


@@ -0,0 +1,47 @@
# 2026-04-07 — Issue #7 player evaluator
**Role:** Evaluator (independent)
**Target:** `player`
**Generator commit:** `f17e764`
**Verdict:** pass with caveats
## What I did
1. Built `recordingtest.sln` -> 0 warn / 0 err.
2. Ran `dotnet test tests/Recordingtest.Player.Tests` -> 6/6 pass.
3. Read all player sources: `PlayerEngine.cs`, `IPlayerHost.cs`, `Program.cs`, `ScenarioLoader.cs`, `Model/Scenario.cs`, `Model/Step.cs`, `UiaPlayerHost.cs` (skim).
4. Grep `Thread.Sleep(` and `Task.Delay(TimeSpan.FromSeconds` in `PlayerEngine.cs` -> 0 matches.
5. Verified `Player_NoFixedSleep` is real: uses `[CallerFilePath]` to locate `src/Recordingtest.Player/PlayerEngine.cs` and regex-asserts absence of fixed sleeps.
6. Mapped each DoD bullet to evidence and produced verdict table in `docs/contracts/player.evaluation.md`.
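The fixed-sleep check in steps 4 and 5 amounts to a pattern scan over the engine source. The sketch below shows the idea in Python; the actual `Player_NoFixedSleep` test is C# and its exact regex is not reproduced here, so the pattern and the function name are assumptions based only on the two strings grepped above.

```python
import re
from typing import List, Tuple

# Assumption: these two patterns mirror the strings grepped in step 4.
FIXED_SLEEP = re.compile(r"Thread\.Sleep\(|Task\.Delay\(TimeSpan\.FromSeconds")


def find_fixed_sleeps(source: str) -> List[Tuple[int, str]]:
    """Return (line number, stripped line) for every fixed-sleep call found."""
    return [
        (number, line.strip())
        for number, line in enumerate(source.splitlines(), start=1)
        if FIXED_SLEEP.search(line)
    ]
```

A test built on this would read the engine source file and assert that the returned list is empty, which makes the "no fixed sleep" requirement fail loudly if someone reintroduces one.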
## DoD scores
| Item | Score |
|---|---|
| CLI args | pass |
| wait_for | partial (PoC passthrough) |
| resolve + offset | pass |
| failure artifacts | pass |
| checkpoint save | pass |
| exit codes | pass |
| 10/10 reliability | untestable (sandbox) |
| no fixed sleep | pass |
## Key findings
- `PlayerEngine.ComputeScreenPoint` formula matches the expected `bounds.X + W*ox`, verified by a test: with bounds origin (100, 200), size 50×40, and offsets (0.5, 0.25), x = 100 + 50*0.5 = 125 and y = 200 + 40*0.25 = 210.
- `Program.cs` only supports `--no-launch` attach mode; launch path returns exit 5 with explicit message — generator was honest.
- `wait_for` hint is forwarded to `IPlayerHost.WaitFor` with timeout; engine throws on timeout. Real waiting strategy lives in `UiaPlayerHost` (PoC).
- `Model` classes mirror recorder yaml schema; `UnderscoredNamingConvention` handles `uia_path`, `after_step`, `save_as`, `startup_timeout_ms`.
- Reliability (10x replay) cannot be measured here — no real SUT GUI in sandbox. Deferred, not failed.
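The screen-point arithmetic from the first finding can be reproduced with a short Python sketch. The `Bounds` type and function name are hypothetical stand-ins for the C# `PlayerEngine.ComputeScreenPoint`, and rounding to whole pixels is an assumption; only the `bounds.X + W*ox` form and the (125, 210) test values come from the evaluation.

```python
from typing import NamedTuple, Tuple


class Bounds(NamedTuple):
    """Hypothetical element rectangle in screen pixels."""
    x: float
    y: float
    width: float
    height: float


def compute_screen_point(bounds: Bounds, ox: float, oy: float) -> Tuple[int, int]:
    """Map element-relative offsets back to an absolute screen point.

    x = bounds.x + width * ox, and analogously for y.
    Assumption: the result is rounded to whole pixels.
    """
    return (
        round(bounds.x + bounds.width * ox),
        round(bounds.y + bounds.height * oy),
    )


# Values from the evaluated test: origin (100, 200), size 50x40, offsets (0.5, 0.25).
point = compute_screen_point(Bounds(100, 200, 50, 40), 0.5, 0.25)
```

With those inputs `point` is `(125, 210)`, matching the figures quoted in the finding.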
## Constraints respected
- Did NOT modify generator code.
- Did NOT update PROGRESS.md.
- Only wrote `docs/contracts/player.evaluation.md` and this history file.
## Artifacts
- `d:/MYCLAUDE_PROJECT/recordingtest/docs/contracts/player.evaluation.md`
- `d:/MYCLAUDE_PROJECT/recordingtest/docs/history/2026-04-07_이슈7-player-evaluator.md`