diff --git a/PLAN.md b/PLAN.md
index 00a95d3..bc1c201 100644
--- a/PLAN.md
+++ b/PLAN.md
@@ -8,12 +8,15 @@
 1. **Hook behavior verification**: actually trigger the three SessionStart/Stop/Guard shell scripts and confirm they work
    - Depends on: checking whether jq is installed
 
-## P1: depends on UI automation
+## P1: integration & runner
 
-4. **recorder PoC (element-aware)**: Sprint Contract: [docs/contracts/recorder.md](docs/contracts/recorder.md)
-   - Depends on: FlaUI package approval (user confirmation required)
-5. **player PoC**: Sprint Contract: [docs/contracts/player.md](docs/contracts/player.md)
-   - Depends on: finalized recorder output format
+4. **test-runner**: batch scenario execution + normalizer + diff-reporter pipeline
+   - Depends on: recorder/player/normalizer/diff-reporter all pass (done)
+   - The Sprint Contract must be written first
+5. **Live SUT smoke test**: as manual steps, recorder attach → Box creation scenario → player replay → normalizer → diff
+   - Depends on: test-runner PoC first (recommended)
+6. **engine-bridge exploration**: HmEG PDB reflection spike
+   - Depends on: nothing
 
 ## Follow-ups (non-blocking)
 
diff --git a/PROGRESS.md b/PROGRESS.md
index 5663fcd..34f55d2 100644
--- a/PROGRESS.md
+++ b/PROGRESS.md
@@ -29,6 +29,8 @@
 | 2026-04-07 | sut-prober PoC Evaluator pass (#3) | `docs/contracts/sut-prober.evaluation.md` |
 | 2026-04-07 | diff-reporter PoC + Evaluator pass (#5) | `src/Recordingtest.DiffReporter*/`, `docs/contracts/diff-reporter.evaluation.md` |
 | 2026-04-07 | normalizer PoC + Evaluator pass v2 (#4): sidecar log, explicit coverage mapping, 6 rules | `src/Recordingtest.Normalizer/`, `docs/contracts/normalizer.evaluation.md` |
+| 2026-04-07 | player PoC + Evaluator pass (#7): 6 tests, no fixed sleeps, fake host | `src/Recordingtest.Player/`, `docs/contracts/player.evaluation.md` |
+| 2026-04-07 | recorder PoC + Evaluator pass v2 (#6): drag state machine, focus events, ts/raw_coord | `src/Recordingtest.Recorder/`, `docs/contracts/recorder.evaluation.md` |
 
 ## In progress
 
@@ -40,6 +42,10 @@ _(none)_
 - [ ] diff-reporter: integration test with the real `diff-triager` agent (currently replaced by schema unit tests, DoD #8 partial). non-blocking.
 - [ ] normalizer: restrict the `mask_volatile_settings` rule to JSON-path scoping (currently it matches field names globally). non-blocking risk.
 - [ ] normalizer: make the float epsilon configurable (currently hard-coded to 6 decimals). See the contract risks section.
+- [ ] recorder/player: **manual smoke test on a live SUT**. The 60 FPS and 9-out-of-10 reliability DoD items cannot be unit-tested in the sandbox and must be verified in a real environment.
+- [ ] player: strengthen the `wait_for` UIA event mapping (currently a host passthrough).
+- [ ] player: the `UiaPlayerHost` uia_path resolver uses only the last `@AutomationId`; it needs to support the full ancestor chain.
+- [ ] recorder: handle IME composition keys (contract risks).
 
 ## Blocked
 
diff --git a/docs/contracts/player.evaluation.md b/docs/contracts/player.evaluation.md
new file mode 100644
index 0000000..69702a3
--- /dev/null
+++ b/docs/contracts/player.evaluation.md
@@ -0,0 +1,46 @@
+# Player: Evaluation
+
+**Evaluator:** independent
+**Generator commit:** f17e764
+**Date:** 2026-04-07
+
+## Verification
+
+- `dotnet build recordingtest.sln` -> green (0 warnings, 0 errors)
+- `dotnet test tests/Recordingtest.Player.Tests` -> 6/6 passed
+- Grep `Thread.Sleep(` / `Task.Delay(TimeSpan.FromSeconds` in `PlayerEngine.cs` -> 0 hits
+- `Player_NoFixedSleep` test verified to actually load `src/Recordingtest.Player/PlayerEngine.cs` via `[CallerFilePath]` and assert via regex (not a dummy)
+
+## DoD verdict table
+
+| # | DoD item | Status | Evidence |
+|---|---|---|---|
+| 1 | CLI `--scenario` `--output-dir` `--no-launch` | pass | `Program.cs` lines 8-22 |
+| 2 | `wait_for` support | partial (PoC) | `PlayerEngine.cs` lines 50-57 passes hint to `IPlayerHost.WaitFor`; real impl is PoC, generator flagged |
+| 3 | element resolve + offset calc | pass | `ComputeScreenPoint` covered by `Player_ClickStep_InvokesHostClickAtExpectedScreenPoint` (125,210 expected) |
+| 4 | failure artifacts on resolve fail | pass | `Player_ResolveFailure_CapturesArtifacts` asserts `host.Failures` populated with step index + reason |
+| 5 | checkpoint save | pass | `Player_CheckpointStep_InvokesCapture` asserts AfterStep + SaveAs forwarded |
+| 6 | exit codes (0/non-zero + artifact path) | pass | `Program.cs` returns 0/1/2/3/4/5; failure path prints `artifact_dir=` |
+| 7 | 10/10 reliability (>=9 pass) | untestable / deferred | requires real SUT GUI; sandbox cannot launch; generator honestly flagged |
+| 8 | no fixed sleep | pass | grep + `Player_NoFixedSleep` test |
+
+## Schema mirror check
+
+- `Model/Scenario.cs` covers name, description, sut(exe, startup_timeout_ms), steps, checkpoints, baselines
+- `Model/Step.cs` covers kind enum (click/type/drag/hotkey/wait/checkpoint/save), target(uia_path, offset[]), value, wait_for, after_step, save_as
+- `ScenarioLoader.cs` uses YamlDotNet `UnderscoredNamingConvention` -> matches recorder yaml schema
+- `Player_ScenarioLoader_ParsesSampleYaml` exercises a realistic yaml end-to-end
+
+## IPlayerHost interface coverage
+
+`IPlayerHost.cs` exposes: `ResolveElement`, `WaitFor`, `Click`/`Type`/`Drag`/`Hotkey`, `CaptureCheckpoint`, `CaptureFailureArtifacts`. All four required surfaces (resolve, input, checkpoint, failure artifacts) present.
+
+## UiaPlayerHost note
+
+Real `UiaPlayerHost.cs` is compile-only PoC (per generator self-flag); not graded heavily. It builds clean and `Program.cs` only enters via `--no-launch` attach path.
+
+## Verdict
+
+**pass with caveats**
+
+All code-checkable DoD items pass. The 10/10 reliability item is deferred as `untestable`, explicitly blocked by sandbox constraints (cannot launch real GUI SUT), not by missing code. `wait_for` and `UiaPlayerHost` element resolution remain PoC-level and must be hardened before the reliability gate can actually be measured.
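DoD #3 in the player evaluation rests on mapping a normalized offset inside an element's bounds to a screen point. The arithmetic can be sketched in Python as below, assuming the formula is `bounds.X + W*ox` (and the same for Y) as the player evaluator's history entry states. The real `ComputeScreenPoint` is C# and is not part of this diff, so the function name, tuple layout, and rounding here are hypothetical.

```python
def compute_screen_point(bounds, offset):
    """Map a normalized offset inside an element to absolute screen pixels.

    bounds: (x, y, width, height) of the element in screen coordinates.
    offset: (ox, oy), each expected to lie in [0, 1].
    Hypothetical stand-in for the C# ComputeScreenPoint, which this diff does not show.
    """
    x, y, width, height = bounds
    ox, oy = offset
    # Rounding to whole pixels is an assumption of this sketch.
    return (round(x + width * ox), round(y + height * oy))


# Values quoted in the evaluation: element at (100, 200), size 50 x 40, offset (0.5, 0.25).
print(compute_screen_point((100, 200, 50, 40), (0.5, 0.25)))  # (125, 210)
```

Because the offset is relative to the element rather than the screen, the same recorded step should still land on the element if its window has moved between recording and replay.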
diff --git a/docs/contracts/recorder.evaluation.md b/docs/contracts/recorder.evaluation.md
new file mode 100644
index 0000000..6420652
--- /dev/null
+++ b/docs/contracts/recorder.evaluation.md
@@ -0,0 +1,47 @@
+# Recorder: Evaluation (v2)
+
+- Generator commit: `56b7233`
+- Build: `dotnet build recordingtest.sln` → green (0 warnings, 0 errors)
+- Tests: `dotnet test tests/Recordingtest.Recorder.Tests` → 9 passed / 0 failed / 0 skipped
+- Evaluator: independent re-read of source + tests after Generator iteration 2
+- Previous evaluation archived at `docs/contracts/recorder.evaluation.v1.md`
+
+## Verdict table
+
+| # | DoD item | Verdict | Evidence |
+|---|---|---|---|
+| 1 | Console attach to SUT + start of input capture | pass (source) / untestable (live) | `Program.TryAttach` attaches by pid or by window-title scan via `Application.Attach`; never `Launch()`. `LowLevelHook` installs WH_KEYBOARD_LL + WH_MOUSE_LL on a dedicated STA thread. Cannot exercise against EG-BIM Modeler in this sandbox. |
+| 2 | Captured events: key down/up, click/drag/wheel, focus change | pass | `LowLevelHook` emits `key_down/up`, `mouse_down_l/r/m`, `mouse_up_l`, `wheel`, `move`. `DragCollapser` is a real state machine: on `mouse_down_l` it stores the down event and tracks max distance through `move`s; on `mouse_up_l` it picks `drag` if `max(maxDistSq, finalDistSq) >= threshold²` else `click`. Right-click and key/wheel paths emit their own steps. `Program.cs` calls `automation.RegisterFocusChangedEvent(...)`, builds a UIA path inside the callback (try/catch-guarded) and pushes a synthetic `focus_change` RawEvent into the same channel; `DragCollapser` translates it to a `focus` ScenarioStep. |
+| 3 | Event shape `{ts, kind, uia_path, offset_norm, raw_coord, value}` | pass | `RawEvent` carries `TimestampMs, Kind, X, Y, Code, WheelDelta, FocusedElementPath`. `ScenarioStep` now exposes `Ts`, `RawCoord`, `EndOffset`, `EndRawCoord` plus existing `Kind/Target{UiaPath,Offset}/Value/WaitFor`. `DragCollapser` populates `Ts` and `RawCoord` (and end variants for drags) on every emitted step. |
+| 4 | 3D viewport `offset_norm ∈ [0..1]` | pass | `OffsetNormalizer.Normalize` clamps each axis to `[0,1]`; covered by `OffsetNormalizer_ClicksInsideElement_ReturnsZeroToOne`. |
+| 5 | Yaml schema compliance | pass | `ScenarioWriter` uses `UnderscoredNamingConvention`; `ts` and `raw_coord` therefore serialize as snake_case. `ScenarioStep_YamlRoundtrip_PreservesTsAndRawCoord` asserts both `ts:` and `raw_coord` appear in the yaml and round-trip back to identical values. `YamlSerializer_RoundtripsScenario` covers click + masked-type. |
+| 6 | Password/token masking | pass | `MaskPolicy.Apply` returns `` for `IsPassword` or `ClassName == "PasswordBox"`. `DragCollapser` calls `MaskPolicy.IsMasked` on the resolved snapshot for both click and key paths and overrides `step.Value = MaskPolicy.MaskedValue`. Unit covered by `FocusedElementIsPassword_ReturnsMasked`. |
+| 7 | No impact on 60 FPS | untestable | Requires running SUT + perf measurement; not possible in sandbox. Architecture (separate STA hook thread + unbounded `Channel`, UIA resolution moved out of the hook callback) is consistent with the requirement. Explicitly deferred. |
+| 8 | Summary on exit (event count, elapsed time, unresolved count) | pass | `Program.Run` writes `[recorder] done. events={count} elapsed={sw.Elapsed} unresolved_paths={unresolved}` on Ctrl+C exit. |
+
+## Tests (9)
+
+1. `ElementPathBuilder_WithNestedElements_ReturnsFullPath`
+2. `OffsetNormalizer_ClicksInsideElement_ReturnsZeroToOne`
+3. `FocusedElementIsPassword_ReturnsMasked`
+4. `YamlSerializer_RoundtripsScenario`
+5. `Cli_MissingAttach_ExitTwo`
+6. `DragCollapser_DownMoveUp_BeyondThreshold_EmitsDrag` *(new: drag emit beyond threshold)*
+7. `DragCollapser_DownUp_BelowThreshold_EmitsClick` *(new: click emit below threshold)*
+8. `DragCollapser_FocusChangeEvent_EmitsFocusStep` *(new: focus_change → focus step)*
+9. `ScenarioStep_YamlRoundtrip_PreservesTsAndRawCoord` *(new: yaml ts + raw_coord)*
+
+All four iteration-2 tests are present, meaningful, and assert the previously-missing behavior (state machine threshold, focus translation, snake_case persistence).
+
+## Configurable threshold
+
+`DragCollapser` constructor: `public DragCollapser(int dragThresholdPx = 4)` and stored on `DragThresholdPx`. Default 4 px as required.
+
+## Remaining items
+
+- DoD #1 live attach + DoD #7 perf: structurally untestable in this sandbox; deferred to manual smoke on a workstation with EG-BIM Modeler. Source-side wiring is correct. These are no longer "missing code"; they are environment-bound.
+- IME (Hangul composition) handling: still not implemented; this is a contract Risk, not a DoD item.
+
+## Overall verdict
+
+**pass**: all DoD items with code obligations are satisfied; the only non-`pass` cells (1 live, 7) are explicitly deferred as untestable in the sandbox, not missing code. v1 release gates (drag collapse, focus capture, ts+raw_coord persistence, drag-state-machine tests) are all closed.
diff --git a/docs/contracts/recorder.evaluation.v1.md b/docs/contracts/recorder.evaluation.v1.md
new file mode 100644
index 0000000..aabae94
--- /dev/null
+++ b/docs/contracts/recorder.evaluation.v1.md
@@ -0,0 +1,42 @@
+# Recorder: Evaluation
+
+- Generator commit: `d486cbb`
+- Build: `dotnet build recordingtest.sln` → green (0 warnings, 0 errors)
+- Tests: `dotnet test tests/Recordingtest.Recorder.Tests` → 5 passed / 0 failed / 0 skipped
+- Evaluator: independent reading of source + test artifacts
+
+## Verdict table
+
+| # | DoD item | Verdict | Evidence |
+|---|---|---|---|
+| 1 | Console attach to SUT + start of input capture | partial | `Program.TryAttach` uses `Application.Attach(pid)` or window-title scan; never `Launch()`. `LowLevelHook` installs WH_KEYBOARD_LL + WH_MOUSE_LL on dedicated STA thread. Wired but cannot be exercised in this sandbox (no SUT). |
+| 2 | Captured events: key down/up, click/drag/wheel, focus change | partial | `LowLevelHook.KeyboardProc` emits `key_down`/`key_up`; `MouseProc` emits L/R/M down+up, `wheel`, `move`. Drag is NOT collapsed into a single drag step (only down/up are recorded; `Program.IsInterestingForStep` only keeps `mouse_down_l/r` and `key_down`). Focus-change events are NOT captured (no UIA focus listener). |
+| 3 | Event shape `{ts, kind, uia_path, offset_norm, raw_coord, value}` | partial | `RawEvent` carries `TimestampMs, Kind, X, Y, Code, WheelDelta`; `ScenarioStep`/`ScenarioTarget` carry `kind, uia_path, offset, value`. There is no persistent per-event log with all six fields: `raw_coord` is consumed for resolution but not stored on the emitted step. |
+| 4 | 3D viewport `offset_norm ∈ [0..1]` | pass | `OffsetNormalizer.Normalize` divides by width/height, clamps each axis to `[0,1]`, returns `(0,0)` for zero-sized rects. Unit test `OffsetNormalizer_ClicksInsideElement_ReturnsZeroToOne` covers center, top-left, and out-of-bounds clamp. |
+| 5 | Yaml schema compliance (`name, description, sut{exe, startup_timeout_ms}, steps[{kind, target{uia_path, offset}, value, wait_for}]`) | pass | `Scenario.cs` matches the schema; `ScenarioWriter` uses `UnderscoredNamingConvention` so casing matches contract (`startup_timeout_ms`, `uia_path`, `wait_for`). Test `YamlSerializer_RoundtripsScenario` round-trips both a click and a masked-type step. |
+| 6 | Password/token masking (PasswordBox → ``) | pass | `MaskPolicy.Apply` returns `` when `IsPassword` or `ClassName == "PasswordBox"`. `Program.ConsumeAsync` sets `step.Value = MaskPolicy.MaskedValue` on masked targets. Test `FocusedElementIsPassword_ReturnsMasked` covers masked + plain paths. |
+| 7 | No impact on 60 FPS | untestable | Requires running SUT + perf measurement; not possible in sandbox. Architecture (separate STA hook thread + Channel) is consistent with the requirement. |
+| 8 | Summary on exit (event count, elapsed time, unresolved count) | pass (source-only) | `Program.Run` writes `[recorder] done. events={count} elapsed={sw.Elapsed} unresolved_paths={unresolved}` on Ctrl+C exit. |
+
+Additional checks:
+- `Program.ParseArgs` returns null when `--attach` is missing → `Main` prints usage to stderr and returns `2`. Verified by `Cli_MissingAttach_ExitTwo`.
+- `ElementPathBuilder.Build` produces `ClassName[@AutomationId='...']/...` walking from topmost ancestor down, falling back to `@Name` and then bare `ClassName`. Verified by `ElementPathBuilder_WithNestedElements_ReturnsFullPath`.
+- IME (Hangul composition) handling: not implemented (acknowledged in generator notes; listed as a Risk in the contract, not a DoD item).
+
+## Gaps / required follow-ups
+
+1. **Drag collapse**: `mouse_down_l` + movement + `mouse_up_l` should produce a single `kind: drag` step with start/end offsets. Today the recorder records only the down event as `click`. Blocks the contract evaluation step "Box creation drag".
+2. **Focus-change events**: No UIA `FocusChangedEventHandler` registration. Required by DoD #2.
+3. **Per-event log shape**: Steps drop `ts` and `raw_coord`; the contract requires every event to be recorded in the `{ts, kind, uia_path, offset_norm, raw_coord, value}` shape. Either keep a sidecar event log or extend `ScenarioStep` with these fields.
+4. **Manual SUT verification**: DoD #1 and the perf check (#7) require attaching to EG-BIM Modeler on a real workstation. This evaluator cannot perform that step.
+
+## Overall verdict
+
+**fail: blocks release until manual SUT run + drag/focus implementation.**
+
+Rationale (per CLAUDE.md): overall `pass` requires every DoD item `pass`. Items 1 and 2 are concretely incomplete (drag collapse + focus events missing; not merely untestable). Item 7 is structurally untestable in the sandbox and is treated as partial. Item 3 is partial because `ts`/`raw_coord` are not persisted in the output.
+The honest call is `fail` with the following release gates:
+
+- Implement drag collapse and focus-change capture, add unit tests for the drag state machine.
+- Persist `ts` and `raw_coord` on each emitted step (or sidecar log).
+- Manual smoke on EG-BIM Modeler: attach by pid, click Box command, drag a box, type into a PasswordBox, Ctrl+C, verify yaml + summary.
+- Re-run evaluator after the above.
diff --git a/docs/history/2026-04-07_이슈6-7-P1-UI자동화-orchestration.md b/docs/history/2026-04-07_이슈6-7-P1-UI자동화-orchestration.md
new file mode 100644
index 0000000..d00f99a
--- /dev/null
+++ b/docs/history/2026-04-07_이슈6-7-P1-UI자동화-orchestration.md
@@ -0,0 +1,47 @@
+# 2026-04-07 Issues #6 and #7: P1 UI automation (recorder/player) orchestration
+
+- **Issues**: #6 (recorder), #7 (player)
+- **Elapsed time**: ~40 min (parallel subagents + one recorder rework)
+- **Context usage**: ~210k tokens (orchestrator session)
+
+## Cycle
+
+1. Created issues #6 and #7 → Generator × 2 **in parallel, in the background** (FlaUI 4.0.0, YamlDotNet 16.1.3, TFM net8.0-windows)
+2. Both Generators finished
+3. Evaluator × 2 **in parallel, in the background**
+4. **recorder fail** (drag not collapsed / focus not captured / ts and raw_coord not serialized) → Re-Generator → Re-Evaluator **pass**
+5. **player pass with caveats** (reliability untestable)
+6. Updated PROGRESS/PLAN, closed the issues, pushed
+
+## Commits
+
+- `d486cbb` recorder v1
+- `f17e764` player v1
+- `56b7233` recorder v2 (drag state machine + focus events + ts/raw_coord)
+
+## Results
+
+| Module | Tests | Result |
+|------|--------|------|
+| recorder | 9/9 (5→9) | pass v2 (2 untestable) |
+| player | 6/6 | pass with caveats (1 untestable) |
+
+## Harness design re-validation
+
+- Of the items the recorder v1 Generator flagged on its own ("drag not collapsed, IME not implemented"), **the Evaluator failed the drag issue together with two additional findings (focus, ts/raw_coord)**. This demonstrates that there were gaps beyond what the Generator self-flagged.
+- Converged after one re-iteration (the second consecutive time this pattern has succeeded)
+- Parallel subagents kept the orchestrator session context at ~210k
+
+## Follow-ups (non-blocking)
+
+All of these are of the "actually run against a live SUT" kind:
+- recorder DoD #1 (attach) / #7 (60 FPS)
+- player DoD #2 (strengthen wait_for) / #7 (10/10 reliability)
+- player `UiaPlayerHost` full path resolver
+- recorder IME support
+
+## Next steps
+
+**test-runner**: batch scenario execution + normalizer + diff-reporter pipeline. With this, the four PoCs form an integrated E2E path.
+
+After that, run a **live smoke test** in the user's environment to verify the reliability DoD.
diff --git a/docs/history/2026-04-07_이슈6-recorder-evaluator-v2.md b/docs/history/2026-04-07_이슈6-recorder-evaluator-v2.md
new file mode 100644
index 0000000..95859ee
--- /dev/null
+++ b/docs/history/2026-04-07_이슈6-recorder-evaluator-v2.md
@@ -0,0 +1,46 @@
+# 2026-04-07 Issue #6: Recorder Evaluator v2
+
+- **Date:** 2026-04-07
+- **Issue:** #6 (recorder evaluation, iteration 2)
+- **Role:** Evaluator (independent)
+- **Generator commit under review:** `56b7233`
+- **Contract:** `docs/contracts/recorder.md`
+- **Previous evaluation:** `docs/contracts/recorder.evaluation.v1.md` (verdict: fail)
+
+## What I did
+
+1. Re-built the solution: `dotnet build recordingtest.sln` → 0 warnings, 0 errors.
+2. Re-ran the recorder test suite: `dotnet test tests/Recordingtest.Recorder.Tests` → **9 passed / 0 failed / 0 skipped**.
+3. Read the new/changed sources independently:
+   - `src/Recordingtest.Recorder/DragCollapser.cs`
+   - `src/Recordingtest.Recorder/Scenario.cs`
+   - `src/Recordingtest.Recorder/ScenarioWriter.cs`
+   - `src/Recordingtest.Recorder/Program.cs`
+   - `tests/Recordingtest.Recorder.Tests/RecorderTests.cs`
+4. Cross-checked each v1 gap against the new code.
+5. Archived v1 evaluation as `docs/contracts/recorder.evaluation.v1.md` and wrote a fresh v2 evaluation at `docs/contracts/recorder.evaluation.md`.
+
+## Findings
+
+- **Drag collapse:** real state machine. Tracks `down`, accumulates `maxDistSq` over `move`s, then on `mouse_up_l` compares `max(maxDistSq, finalDistSq)` against `DragThresholdPx²` to choose `drag` or `click`. Threshold is constructor-configurable (`DragCollapser(int dragThresholdPx = 4)`), default 4 px.
+- **Focus capture:** `Program.cs` calls `automation.RegisterFocusChangedEvent(...)` (try/catch-guarded), builds a UIA path inside the callback via `ElementPathBuilder.Build`, and pushes a synthetic `focus_change` `RawEvent` carrying `FocusedElementPath` into the same `Channel`. `DragCollapser` translates `focus_change` into a `focus` `ScenarioStep`.
+- **`ts` / `raw_coord` persistence:** `ScenarioStep` gained `Ts`, `RawCoord`, `EndOffset`, `EndRawCoord`. `ScenarioWriter` uses `UnderscoredNamingConvention`, so they serialize as `ts:` / `raw_coord:` / `end_offset:` / `end_raw_coord:`. The roundtrip test asserts both substrings appear in the yaml and round-trip back to identical values.
+- **Tests:** 4 new tests verify drag-beyond-threshold, click-below-threshold, focus-change → focus step, and yaml roundtrip of `ts` + `raw_coord`. All meaningful (assertions match the contract event shape).
+- **Right-click, wheel, key paths** also populate `Ts` and `RawCoord` consistently in `DragCollapser`.
+
+## Verdict
+
+**pass.**
+
+The three v1 code gaps (drag collapse, focus events, `ts`+`raw_coord` persistence) are all closed and covered by tests.
+The only non-`pass` cells in the v2 verdict table are DoD #1 live-attach and DoD #7 perf, which are structurally untestable inside this sandbox and explicitly deferred to a manual workstation smoke against EG-BIM Modeler.
+
+## Artifacts
+
+- `docs/contracts/recorder.evaluation.md` (v2, new)
+- `docs/contracts/recorder.evaluation.v1.md` (v1, archived)
+- `docs/history/2026-04-07_이슈6-recorder-evaluator-v2.md` (this file)
+
+## Notes
+
+- Did not modify any generator code.
+- Did not update `PROGRESS.md`.
diff --git a/docs/history/2026-04-07_이슈6-recorder-evaluator.md b/docs/history/2026-04-07_이슈6-recorder-evaluator.md
new file mode 100644
index 0000000..7820783
--- /dev/null
+++ b/docs/history/2026-04-07_이슈6-recorder-evaluator.md
@@ -0,0 +1,34 @@
+# 2026-04-07 Issue #6: recorder Evaluator
+
+- Issue: #6 (recorder evaluation)
+- Role: Evaluator (independent of the Generator)
+- Generator commit: `d486cbb`
+- Elapsed time: about 8 min
+- Context usage: about 35k / 1M tokens
+- Model: claude-opus-4-6 (1M)
+
+## What was done
+
+1. Ran `dotnet build recordingtest.sln` → green (0 warnings / 0 errors).
+2. Ran `dotnet test tests/Recordingtest.Recorder.Tests` → 5 passed / 0 failed.
+3. Read the sources closely: `ElementPathBuilder.cs`, `OffsetNormalizer.cs`, `MaskPolicy.cs`, `Scenario.cs`, `ScenarioWriter.cs`, `LowLevelHook.cs`, `NativeMethods.cs`, `Program.cs`, `RecorderTests.cs`.
+4. Wrote the per-DoD-item verdict table and stated the release gates.
+5. Wrote `docs/contracts/recorder.evaluation.md`.
+
+## Results
+
+- Overall verdict: **fail (blocks release until manual SUT run + drag/focus implementation)**
+- pass: OffsetNormalizer normalization and clamping, yaml schema round-trip, PasswordBox masking, exit summary output, handling of a missing CLI `--attach`.
+- partial: attach + hook (untestable because no SUT exists in the sandbox), event kinds (no drag collapse + no focus events), event shape (`ts`/`raw_coord` not preserved on the step).
+- untestable: impact on 60 FPS.
+- Follow-up recommendations: add a drag state machine + UIA FocusChangedEventHandler, preserve ts/raw_coord on the step, run a manual attach smoke test against EG-BIM Modeler.
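The drag state machine named in the follow-up recommendation can be sketched as below. This is a minimal Python illustration of the rule the v2 evaluation describes: remember the `mouse_down_l` event, track the maximum squared distance over `move` events, and on `mouse_up_l` emit `drag` if `max(maxDistSq, finalDistSq) >= threshold²`, otherwise `click`. The class name, method name, and step dictionary are hypothetical; the real `DragCollapser` is C# code that this diff does not include.

```python
class DragCollapserSketch:
    """Collapse down/move/up mouse events into one 'click' or 'drag' step."""

    def __init__(self, drag_threshold_px=4):
        self.threshold_sq = drag_threshold_px ** 2
        self._down = None       # (x, y) of the pending mouse_down_l, if any
        self._max_dist_sq = 0   # largest squared distance seen since the down event

    def _dist_sq(self, x, y):
        dx, dy = x - self._down[0], y - self._down[1]
        return dx * dx + dy * dy

    def feed(self, kind, x, y):
        """Consume one raw event; return a step dict when one completes, else None."""
        if kind == "mouse_down_l":
            self._down, self._max_dist_sq = (x, y), 0
        elif kind == "move" and self._down is not None:
            self._max_dist_sq = max(self._max_dist_sq, self._dist_sq(x, y))
        elif kind == "mouse_up_l" and self._down is not None:
            moved = max(self._max_dist_sq, self._dist_sq(x, y)) >= self.threshold_sq
            step = {"kind": "drag" if moved else "click", "start": self._down}
            if moved:
                step["end"] = (x, y)
            self._down = None
            return step
        return None


collapser = DragCollapserSketch()
collapser.feed("mouse_down_l", 10, 10)
collapser.feed("move", 30, 10)
print(collapser.feed("mouse_up_l", 30, 10)["kind"])  # drag
```

Tracking the maximum distance, not only the final one, means a gesture that wanders away and returns to its starting pixel is still classified as a drag.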
+
+## Artifacts
+
+- `docs/contracts/recorder.evaluation.md`
+- `docs/history/2026-04-07_이슈6-recorder-evaluator.md`
+
+## Notes
+
+- Did not modify Generator code.
+- Did not modify `PROGRESS.md`.
diff --git a/docs/history/2026-04-07_이슈7-player-evaluator.md b/docs/history/2026-04-07_이슈7-player-evaluator.md
new file mode 100644
index 0000000..f28675f
--- /dev/null
+++ b/docs/history/2026-04-07_이슈7-player-evaluator.md
@@ -0,0 +1,47 @@
+# 2026-04-07: Issue #7 player evaluator
+
+**Role:** Evaluator (independent)
+**Target:** `player`
+**Generator commit:** f17e764
+**Verdict:** pass with caveats
+
+## What I did
+
+1. Built `recordingtest.sln` -> 0 warn / 0 err.
+2. Ran `dotnet test tests/Recordingtest.Player.Tests` -> 6/6 pass.
+3. Read all player sources: `PlayerEngine.cs`, `IPlayerHost.cs`, `Program.cs`, `ScenarioLoader.cs`, `Model/Scenario.cs`, `Model/Step.cs`, `UiaPlayerHost.cs` (skim).
+4. Grep `Thread.Sleep(` and `Task.Delay(TimeSpan.FromSeconds` in `PlayerEngine.cs` -> 0 matches.
+5. Verified `Player_NoFixedSleep` is real: uses `[CallerFilePath]` to locate `src/Recordingtest.Player/PlayerEngine.cs` and regex-asserts absence of fixed sleeps.
+6. Mapped each DoD bullet to evidence and produced verdict table in `docs/contracts/player.evaluation.md`.
+
+## DoD scores
+
+| Item | Score |
+|---|---|
+| CLI args | pass |
+| wait_for | partial (PoC passthrough) |
+| resolve + offset | pass |
+| failure artifacts | pass |
+| checkpoint save | pass |
+| exit codes | pass |
+| 10/10 reliability | untestable (sandbox) |
+| no fixed sleep | pass |
+
+## Key findings
+
+- `PlayerEngine.ComputeScreenPoint` formula matches expected `bounds.X + W*ox`, verified by test (125,210 from 100/200 + 50*0.5 / 40*0.25).
+- `Program.cs` only supports `--no-launch` attach mode; launch path returns exit 5 with explicit message; generator was honest.
+- `wait_for` hint is forwarded to `IPlayerHost.WaitFor` with timeout; engine throws on timeout. Real waiting strategy lives in `UiaPlayerHost` (PoC).
+- `Model` classes mirror recorder yaml schema; `UnderscoredNamingConvention` handles `uia_path`, `after_step`, `save_as`, `startup_timeout_ms`.
+- Reliability (10x replay) cannot be measured here: no real SUT GUI in sandbox. Deferred, not failed.
+
+## Constraints respected
+
+- Did NOT modify generator code.
+- Did NOT update PROGRESS.md.
+- Only wrote `docs/contracts/player.evaluation.md` and this history file.
+
+## Artifacts
+
+- `d:/MYCLAUDE_PROJECT/recordingtest/docs/contracts/player.evaluation.md`
+- `d:/MYCLAUDE_PROJECT/recordingtest/docs/history/2026-04-07_이슈7-player-evaluator.md`
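The fixed-sleep check described in this history entry can be approximated with a short script. The sketch below is Python and assumes that the two patterns named in the entry (`Thread.Sleep(` and `Task.Delay(TimeSpan.FromSeconds`) are the only ones of interest. The actual `Player_NoFixedSleep` test is C# and locates the file via `[CallerFilePath]`, which this sketch does not reproduce; the function name and sample inputs are hypothetical.

```python
import re

# Patterns taken from the evaluation's grep step; other delay APIs are out of scope here.
FIXED_SLEEP = re.compile(r"Thread\.Sleep\(|Task\.Delay\(TimeSpan\.FromSeconds")


def find_fixed_sleeps(source_text):
    """Return (line_number, line) pairs for every line containing a fixed sleep."""
    return [
        (number, line.strip())
        for number, line in enumerate(source_text.splitlines(), start=1)
        if FIXED_SLEEP.search(line)
    ]


# Hypothetical snippets standing in for the contents of a C# source file.
clean = "await host.WaitFor(hint, timeout);\n"
dirty = "Thread.Sleep(500);\nawait Task.Delay(TimeSpan.FromSeconds(2));\n"
print(find_fixed_sleeps(clean))       # []
print(len(find_fixed_sleeps(dirty)))  # 2
```

In practice the text would be read from `src/Recordingtest.Player/PlayerEngine.cs`, and an empty result would correspond to the "0 matches" recorded in step 4.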