- recorder v1 (fail) → v2 (pass): drag state machine, focus events, ts/raw_coord
- player: pass with caveats (reliability untestable in sandbox)
- PROGRESS.md: Done rows + follow-ups for live SUT smoke test
- PLAN.md: P1 pivoted to test-runner + live smoke test

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
docs/history/2026-04-07_이슈6-7-P1-UI자동화-orchestration.md
@@ -0,0 +1,47 @@
# 2026-04-07 Issues #6 & #7: P1 UI automation (recorder/player) orchestration

- **Issues**: #6 (recorder), #7 (player)
- **Time spent**: ~40 min (parallel sub-agents + one recorder rework)
- **Context usage**: ~210k tokens (orchestrator session)

## Cycle

1. Opened issues #6 and #7, then ran two Generators **in parallel in the background** (FlaUI 4.0.0, YamlDotNet 16.1.3, TFM net8.0-windows)
2. Both Generators finished
3. Ran two Evaluators **in parallel in the background**
4. **recorder fail** (drags not collapsed / focus not captured / `ts`·`raw_coord` not serialized) → Re-Generator → Re-Evaluator **pass**
5. **player pass with caveats** (reliability untestable)
6. Updated PROGRESS/PLAN, closed the issues, pushed
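The cycle above is a phased generate → evaluate loop in which only failed modules go back for another round. A minimal Python sketch of that control flow follows, for illustration only. `orchestrate`, `generate`, `evaluate` and `max_rounds` are hypothetical names. The actual harness runs these roles as background sub-agents, not Python functions.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

# Hypothetical role signatures (not from the source):
#   generate(module, findings) -> commit id
#   evaluate(module, commit)   -> (verdict, findings)
Generate = Callable[[str, List[str]], str]
Evaluate = Callable[[str, str], Tuple[str, List[str]]]


def orchestrate(modules: List[str], generate: Generate, evaluate: Evaluate,
                max_rounds: int = 2) -> Dict[str, str]:
    findings: Dict[str, List[str]] = {m: [] for m in modules}
    verdicts: Dict[str, str] = {}
    pending = list(modules)
    with ThreadPoolExecutor(max_workers=max(1, len(modules))) as pool:
        for _ in range(max_rounds):
            if not pending:
                break
            # Generators for all pending modules run in parallel; dict() waits for all of them.
            commits = dict(zip(pending, pool.map(lambda m: generate(m, findings[m]), pending)))
            # Evaluators then run in parallel, each seeing only the module and its commit.
            results = list(pool.map(lambda m: evaluate(m, commits[m]), pending))
            for m, (verdict, notes) in zip(pending, results):
                verdicts[m] = verdict
                findings[m] = notes
            # Only modules that failed go back for a Re-Generator / Re-Evaluator round.
            pending = [m for m in pending if verdicts[m] == "fail"]
    return verdicts
```

Re-running only the failed modules mirrors step 4. There, the player's "pass with caveats" verdict did not trigger another round, but the recorder's fail did.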
## Commits

- `d486cbb` recorder v1
- `f17e764` player v1
- `56b7233` recorder v2 (drag state machine + focus events + ts/raw_coord)

## Results

| Module | Tests passed | Result |
|------|--------|------|
| recorder | 9/9 (5 in v1 → 9 in v2) | pass in v2 (2 DoD items untestable) |
| player | 6/6 | pass with caveats (1 DoD item untestable) |

## Harness design re-validation

- The recorder v1 Generator flagged two gaps on its own ("drags not collapsed, IME not implemented"). The Evaluator **failed it on the drag gap plus two more gaps it found (focus, ts/raw_coord)**. This shows the Generator had missed issues beyond the ones it flagged.
- Converged with a single re-iteration (the second time in a row this pattern has worked)
- Parallel sub-agents kept the orchestrator session's context at ~210k tokens

## Follow-ups (non-blocking)

All of these are of the "actually run against the live SUT" kind:

- recorder DoD #1 (attach) / #7 (60 FPS)
- player DoD #2 (stronger wait_for) / #7 (10/10 reliability)
- player `UiaPlayerHost` full path resolver
- recorder IME support

## Next steps

**test-runner**: a pipeline of batch scenario execution + normalizer + diff-reporter. With it, the four PoCs form an integrated E2E path.

After that, run a **live smoke test** in the user's environment to verify the reliability DoD.
docs/history/2026-04-07_이슈6-recorder-evaluator-v2.md
@@ -0,0 +1,46 @@
# 2026-04-07 Issue #6: Recorder Evaluator v2

- **Date:** 2026-04-07
- **Issue:** #6 (recorder evaluation, iteration 2)
- **Role:** Evaluator (independent)
- **Generator commit under review:** `56b7233`
- **Contract:** `docs/contracts/recorder.md`
- **Previous evaluation:** `docs/contracts/recorder.evaluation.v1.md` (verdict: fail)

## What I did

1. Rebuilt the solution: `dotnet build recordingtest.sln` → 0 warnings, 0 errors.
2. Re-ran the recorder test suite: `dotnet test tests/Recordingtest.Recorder.Tests` → **9 passed / 0 failed / 0 skipped**.
3. Independently read the new/changed sources:
   - `src/Recordingtest.Recorder/DragCollapser.cs`
   - `src/Recordingtest.Recorder/Scenario.cs`
   - `src/Recordingtest.Recorder/ScenarioWriter.cs`
   - `src/Recordingtest.Recorder/Program.cs`
   - `tests/Recordingtest.Recorder.Tests/RecorderTests.cs`
4. Cross-checked each v1 gap against the new code.
5. Archived the v1 evaluation as `docs/contracts/recorder.evaluation.v1.md` and wrote a fresh v2 evaluation at `docs/contracts/recorder.evaluation.md`.

## Findings

- **Drag collapse:** a real state machine. It tracks the `down` event and accumulates `maxDistSq` across `move`s. On `mouse_up_l` it compares `max(maxDistSq, finalDistSq)` against `DragThresholdPx²` to choose between `drag` and `click`. The threshold is constructor-configurable (`DragCollapser(int dragThresholdPx = 4)`) and defaults to 4 px.
- **Focus capture:** `Program.cs` calls `automation.RegisterFocusChangedEvent(...)` (guarded by try/catch). Inside the callback it builds a UIA path via `ElementPathBuilder.Build` and pushes a synthetic `focus_change` `RawEvent` carrying `FocusedElementPath` into the same `Channel`. `DragCollapser` translates `focus_change` into a `focus` `ScenarioStep`.
- **`ts` / `raw_coord` persistence:** `ScenarioStep` gained `Ts`, `RawCoord`, `EndOffset`, and `EndRawCoord`. `ScenarioWriter` uses `UnderscoredNamingConvention`, so these serialize as `ts:` / `raw_coord:` / `end_offset:` / `end_raw_coord:`. The roundtrip test asserts that both `ts:` and `raw_coord:` appear in the YAML and round-trip back to identical values.
- **Tests:** 4 new tests cover drag beyond the threshold, click below the threshold, focus change → focus step, and YAML roundtrip of `ts` + `raw_coord`. All are meaningful (their assertions match the contract's event shape).
- **Right-click, wheel, and key paths** also populate `Ts` and `RawCoord` consistently in `DragCollapser`.
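The drag-collapse, focus, and persistence findings describe a small event-to-step state machine. Below is a minimal Python sketch of that logic, for illustration only; the real `DragCollapser` is C#. It rests on several assumptions:

- Event kind strings other than `mouse_up_l` and `focus_change` are made up.
- `ts` is taken to be in milliseconds.
- A distance exactly equal to the threshold is treated as a click.
- The `underscored` helper only approximates YamlDotNet's `UnderscoredNamingConvention` for simple PascalCase names.

Right-click, wheel, and key paths are omitted.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

Point = Tuple[int, int]


@dataclass
class RawEvent:
    kind: str                                   # "mouse_down_l", "mouse_move", "mouse_up_l", "focus_change" (partly assumed)
    ts: int                                     # timestamp; ms is an assumption
    pos: Optional[Point] = None                 # raw screen coordinate
    focused_element_path: Optional[str] = None  # only set for focus_change


@dataclass
class ScenarioStep:
    kind: str                                   # "click", "drag" or "focus"
    ts: int
    raw_coord: Optional[Point] = None
    end_raw_coord: Optional[Point] = None
    uia_path: Optional[str] = None


class DragCollapser:
    def __init__(self, drag_threshold_px: int = 4):
        self._threshold_sq = drag_threshold_px * drag_threshold_px
        self._down: Optional[RawEvent] = None
        self._max_dist_sq = 0

    def _dist_sq(self, p: Point) -> int:
        dx = p[0] - self._down.pos[0]
        dy = p[1] - self._down.pos[1]
        return dx * dx + dy * dy

    def feed(self, ev: RawEvent) -> Optional[ScenarioStep]:
        """Consume one raw event; return a step once one is complete, else None."""
        if ev.kind == "focus_change":
            return ScenarioStep("focus", ev.ts, uia_path=ev.focused_element_path)
        if ev.kind == "mouse_down_l":
            self._down, self._max_dist_sq = ev, 0
            return None
        if self._down is None:
            return None  # move/up without a preceding down: nothing to collapse
        if ev.kind == "mouse_move":
            # Track the farthest the pointer strayed, not just where it ended.
            self._max_dist_sq = max(self._max_dist_sq, self._dist_sq(ev.pos))
            return None
        if ev.kind == "mouse_up_l":
            down = self._down
            moved_sq = max(self._max_dist_sq, self._dist_sq(ev.pos))
            self._down = None
            if moved_sq > self._threshold_sq:  # equal-to-threshold counts as click (assumption)
                return ScenarioStep("drag", down.ts, raw_coord=down.pos, end_raw_coord=ev.pos)
            return ScenarioStep("click", down.ts, raw_coord=down.pos)
        return None


def underscored(name: str) -> str:
    """Approximate PascalCase -> snake_case, e.g. EndRawCoord -> end_raw_coord."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()
```

Taking the maximum over the `move`s, not just the final `mouse_up_l` position, lets an out-and-back gesture register as a drag even when the button is released near where it was pressed.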
## Verdict

**pass.**

The three v1 code gaps (drag collapse, focus events, `ts`+`raw_coord` persistence) are all closed and covered by tests. The only non-`pass` cells in the v2 verdict table are DoD #1 (live attach) and DoD #7 (perf). Both are structurally untestable inside this sandbox and are explicitly deferred to a manual smoke test on a workstation against EG-BIM Modeler.

## Artifacts

- `docs/contracts/recorder.evaluation.md` (v2, new)
- `docs/contracts/recorder.evaluation.v1.md` (v1, archived)
- `docs/history/2026-04-07_이슈6-recorder-evaluator-v2.md` (this file)

## Notes

- Did not modify any generator code.
- Did not update `PROGRESS.md`.
docs/history/2026-04-07_이슈6-recorder-evaluator.md
@@ -0,0 +1,34 @@
# 2026-04-07 Issue #6: recorder Evaluator

- Issue: #6 (recorder evaluation)
- Role: Evaluator (independent of the Generator)
- Generator commit: `d486cbb`
- Time spent: about 8 min
- Context usage: about 35k / 1M tokens
- Model: claude-opus-4-6 (1M)

## Steps

1. Ran `dotnet build recordingtest.sln` → green (0 warnings / 0 errors).
2. Ran `dotnet test tests/Recordingtest.Recorder.Tests` → 5 passed / 0 failed.
3. Read the sources closely: `ElementPathBuilder.cs`, `OffsetNormalizer.cs`, `MaskPolicy.cs`, `Scenario.cs`, `ScenarioWriter.cs`, `LowLevelHook.cs`, `NativeMethods.cs`, `Program.cs`, `RecorderTests.cs`.
4. Wrote a per-DoD-item verdict table and stated the release gate explicitly.
5. Wrote `docs/contracts/recorder.evaluation.md`.

## Results

- Overall verdict: **fail (blocks release until a manual SUT run + drag/focus implementation)**
- pass: OffsetNormalizer normalization and clamping, YAML schema round-trip, PasswordBox masking, summary output on exit, handling of a missing CLI `--attach` argument.
- partial: attach + hook (untestable because the SUT is not present in the sandbox), event kinds (no drag collapse, no focus events), event shape (`ts`/`raw_coord` not preserved on steps).
- untestable: impact on 60 FPS.
- Recommended follow-ups: add a drag state machine and a UIA FocusChangedEventHandler, preserve ts/raw_coord on steps, and run a manual attach smoke test against EG-BIM Modeler.
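The v1 evaluation lists `OffsetNormalizer` normalization and clamping as passing but does not show the code. Below is a minimal Python sketch of it. It assumes the normalizer is the inverse of the player's `bounds.X + W*ox` mapping and that it clamps each axis to [0, 1]. The `normalize_offset` name and the `(x, y, w, h)` bounds tuple are hypothetical.

```python
from typing import Tuple


def normalize_offset(raw: Tuple[float, float],
                     bounds: Tuple[float, float, float, float]) -> Tuple[float, float]:
    """Map a raw screen point to an offset relative to element bounds (x, y, w, h)."""
    x, y, w, h = bounds
    if w <= 0 or h <= 0:
        raise ValueError("element bounds must have a positive width and height")
    ox = (raw[0] - x) / w
    oy = (raw[1] - y) / h
    # Clamp each axis so a point just outside the element still maps onto its edge.
    return (min(max(ox, 0.0), 1.0), min(max(oy, 0.0), 1.0))
```

The player evaluation's test values can be fed back in as a check: `normalize_offset((125, 210), (100, 200, 50, 40))` gives `(0.5, 0.25)`. Clamping trades exactness for robustness. A click recorded a pixel outside the element still replays on the element.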
## Artifacts

- `docs/contracts/recorder.evaluation.md`
- `docs/history/2026-04-07_이슈6-recorder-evaluator.md`

## Notes

- Did not modify Generator code.
- Did not modify `PROGRESS.md`.
docs/history/2026-04-07_이슈7-player-evaluator.md
@@ -0,0 +1,47 @@
# 2026-04-07: Issue #7 player evaluator

- **Role:** Evaluator (independent)
- **Target:** `player`
- **Generator commit:** `f17e764`
- **Verdict:** pass with caveats

## What I did

1. Built `recordingtest.sln` -> 0 warnings / 0 errors.
2. Ran `dotnet test tests/Recordingtest.Player.Tests` -> 6/6 pass.
3. Read all player sources: `PlayerEngine.cs`, `IPlayerHost.cs`, `Program.cs`, `ScenarioLoader.cs`, `Model/Scenario.cs`, `Model/Step.cs`, `UiaPlayerHost.cs` (skimmed).
4. Grepped `PlayerEngine.cs` for `Thread.Sleep(` and `Task.Delay(TimeSpan.FromSeconds` -> 0 matches.
5. Verified that `Player_NoFixedSleep` is a real test: it uses `[CallerFilePath]` to locate `src/Recordingtest.Player/PlayerEngine.cs` and regex-asserts the absence of fixed sleeps.
6. Mapped each DoD bullet to evidence and produced a verdict table in `docs/contracts/player.evaluation.md`.
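Steps 4 and 5 describe a source-scanning check for fixed sleeps. Below is a rough Python equivalent, for illustration. It scans a source string instead of locating `PlayerEngine.cs` via `[CallerFilePath]`, and the `fixed_sleep_lines` name is hypothetical. The two patterns are the ones grepped for in step 4; allowing optional whitespace after `Task.Delay(` is an assumption.

```python
import re

# The two fixed-sleep patterns grepped for in PlayerEngine.cs.
FIXED_SLEEP = re.compile(r"Thread\.Sleep\(|Task\.Delay\(\s*TimeSpan\.FromSeconds")


def fixed_sleep_lines(source: str) -> list:
    """Return (line_number, stripped_line) pairs that contain a fixed sleep."""
    return [(i, line.strip())
            for i, line in enumerate(source.splitlines(), start=1)
            if FIXED_SLEEP.search(line)]
```

Like the original grep, these patterns do not flag a millisecond delay such as `Task.Delay(50)`. The check therefore targets the two fixed-sleep forms named above, not every delay.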
## DoD scores

| Item | Score |
|---|---|
| CLI args | pass |
| wait_for | partial (PoC passthrough) |
| resolve + offset | pass |
| failure artifacts | pass |
| checkpoint save | pass |
| exit codes | pass |
| 10/10 reliability | untestable (sandbox) |
| no fixed sleep | pass |

## Key findings

- `PlayerEngine.ComputeScreenPoint` matches the expected formula (`bounds.X + W*ox`, `bounds.Y + H*oy`). A test verifies it: bounds origin (100, 200), size 50×40, offset (0.5, 0.25) → (125, 210).
- `Program.cs` only supports the `--no-launch` attach mode. The launch path returns exit code 5 with an explicit message, so the generator was honest about the gap.
- The `wait_for` hint is forwarded to `IPlayerHost.WaitFor` with a timeout, and the engine throws on timeout. The real waiting strategy lives in `UiaPlayerHost` (PoC).
- The `Model` classes mirror the recorder YAML schema. `UnderscoredNamingConvention` handles `uia_path`, `after_step`, `save_as`, and `startup_timeout_ms`.
- Reliability (10× replay) cannot be measured here because the sandbox has no real SUT GUI. It is deferred, not failed.
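The screen-point and `wait_for` findings can be sketched in Python as follows. `compute_screen_point` restates the verified formula. `host_wait_for` and `engine_wait` are hypothetical stand-ins for `UiaPlayerHost`'s waiting strategy and the engine's throw-on-timeout behavior. Polling at a short interval is an assumption about the PoC host that the evaluation does not describe. Any such poll would live in the host, whereas the `Player_NoFixedSleep` check covers `PlayerEngine.cs`.

```python
import time
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Bounds:
    x: float
    y: float
    w: float
    h: float


def compute_screen_point(b: Bounds, ox: float, oy: float) -> Tuple[float, float]:
    # Relative offset (0..1 per axis) inside the element -> absolute screen point.
    return (b.x + b.w * ox, b.y + b.h * oy)


def host_wait_for(condition: Callable[[], bool], timeout_s: float,
                  poll_s: float = 0.05) -> bool:
    """Hypothetical host strategy: poll until condition() holds or the deadline passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)


def engine_wait(host_wait: Callable[[Callable[[], bool], float], bool],
                condition: Callable[[], bool], timeout_s: float) -> None:
    # The engine only forwards the hint and turns a timeout into a step failure.
    if not host_wait(condition, timeout_s):
        raise TimeoutError(f"wait_for not satisfied within {timeout_s}s")
```

`compute_screen_point(Bounds(100, 200, 50, 40), 0.5, 0.25)` returns `(125.0, 210.0)`, matching the test values quoted above. Keeping the engine's side of `wait_for` this thin is what lets the waiting strategy be hardened later (DoD #2) inside the host alone.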
## Constraints respected

- Did NOT modify generator code.
- Did NOT update PROGRESS.md.
- Only wrote `docs/contracts/player.evaluation.md` and this history file.

## Artifacts

- `d:/MYCLAUDE_PROJECT/recordingtest/docs/contracts/player.evaluation.md`
- `d:/MYCLAUDE_PROJECT/recordingtest/docs/history/2026-04-07_이슈7-player-evaluator.md`