recordingtest/docs/contracts/player.evaluation.md
minsung 836afea5ee Orchestrate P1 UI automation evaluations (#6, #7)
- recorder v1 (fail) → v2 (pass): drag state machine, focus events, ts/raw_coord
- player pass with caveats: reliability untestable in sandbox
- PROGRESS.md Done rows + follow-ups for live SUT smoke test
- PLAN.md P1 pivoted to test-runner + live smoke test

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 14:37:14 +09:00

# Player — Evaluation
**Evaluator:** independent
**Generator commit:** f17e764
**Date:** 2026-04-07
## Verification
- `dotnet build recordingtest.sln` -> green (0 warnings, 0 errors)
- `dotnet test tests/Recordingtest.Player.Tests` -> 6/6 passed
- Grep `Thread.Sleep(` / `Task.Delay(TimeSpan.FromSeconds` in `PlayerEngine.cs` -> 0 hits
- `Player_NoFixedSleep` test confirmed to read the real `src/Recordingtest.Player/PlayerEngine.cs` (located via `[CallerFilePath]`) and assert against its contents with a regex, so it is not a dummy test
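
The grep check above can be reproduced with a small helper. This is a minimal sketch, not the command the evaluator actually ran: the function name `check_no_fixed_sleep` is hypothetical, and the two literal patterns are taken from the list above.

```shell
#!/usr/bin/env bash
# Sketch: fail if a source file contains either fixed-sleep pattern.
# check_no_fixed_sleep is a hypothetical helper name, not part of the repo.
check_no_fixed_sleep() {
  local file="$1"
  # A missing file would otherwise look like "0 hits", so reject it first.
  if [ ! -r "$file" ]; then
    echo "cannot read $file" >&2
    return 2
  fi
  # -F matches the patterns as literal strings; -n prints line numbers of hits.
  if grep -nF -e 'Thread.Sleep(' -e 'Task.Delay(TimeSpan.FromSeconds' "$file"; then
    echo "fixed sleep found in $file" >&2
    return 1
  fi
  echo "0 hits in $file"
}
```

Running `check_no_fixed_sleep src/Recordingtest.Player/PlayerEngine.cs` from the repository root should print `0 hits in ...` if the file is clean. Note that a literal match also flags occurrences inside comments or strings, which is stricter than strictly necessary.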
## DoD verdict table
| # | DoD item | Status | Evidence |
|---|---|---|---|
| 1 | CLI `--scenario` `--output-dir` `--no-launch` | pass | `Program.cs` lines 8-22 |
| 2 | `wait_for` support | partial (PoC) | `PlayerEngine.cs` lines 50-57 pass the hint to `IPlayerHost.WaitFor`; the real implementation is PoC-level, as the generator flagged |
| 3 | element resolve + offset calc | pass | `ComputeScreenPoint` is covered by `Player_ClickStep_InvokesHostClickAtExpectedScreenPoint`, which expects the click at (125, 210) |
| 4 | failure artifacts on resolve fail | pass | `Player_ResolveFailure_CapturesArtifacts` asserts `host.Failures` populated with step index + reason |
| 5 | checkpoint save | pass | `Player_CheckpointStep_InvokesCapture` asserts AfterStep + SaveAs forwarded |
| 6 | exit codes (0/non-zero + artifact path) | pass | `Program.cs` returns 0/1/2/3/4/5; failure path prints `artifact_dir=` |
| 7 | 10/10 reliability (>=9 pass) | untestable / deferred | requires a real SUT GUI, which the sandbox cannot launch; the generator flagged this honestly |
| 8 | no fixed sleep | pass | grep + `Player_NoFixedSleep` test |
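
Item 6 implies that a caller can branch on the exit code and pick up the artifact path from the output. The wrapper below is a hedged sketch of that consumption pattern. It assumes the failure path prints a line of the form `artifact_dir=<path>`; the exact output format and the meaning of each non-zero code are not stated in this evaluation. `extract_artifact_dir` and `run_and_report` are hypothetical names.

```shell
#!/usr/bin/env bash
# Assumption: a failing run prints a line "artifact_dir=<path>".

# Print the value of the last artifact_dir= line read from stdin, if any.
extract_artifact_dir() {
  sed -n 's/^artifact_dir=//p' | tail -n 1
}

# Run the given command, pass its output through, and on a non-zero
# exit report the exit code together with the artifact directory.
run_and_report() {
  local output status dir
  if output="$("$@" 2>&1)"; then
    status=0
  else
    status=$?
  fi
  printf '%s\n' "$output"
  if [ "$status" -ne 0 ]; then
    dir="$(printf '%s\n' "$output" | extract_artifact_dir)"
    echo "player failed: exit=$status artifacts=${dir:-<none reported>}" >&2
  fi
  return "$status"
}
```

A possible invocation, with a hypothetical scenario file and output directory, would be `run_and_report dotnet run --project src/Recordingtest.Player -- --scenario scenario.yaml --output-dir out --no-launch`. The wrapper returns the player's own exit code, so a CI step can still distinguish the individual failure codes.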
## Schema mirror check
- `Model/Scenario.cs` covers name, description, sut(exe, startup_timeout_ms), steps, checkpoints, baselines
- `Model/Step.cs` covers kind enum (click/type/drag/hotkey/wait/checkpoint/save), target(uia_path, offset[]), value, wait_for, after_step, save_as
- `ScenarioLoader.cs` uses YamlDotNet `UnderscoredNamingConvention` -> matches recorder yaml schema
- `Player_ScenarioLoader_ParsesSampleYaml` exercises a realistic yaml end-to-end
## IPlayerHost interface coverage
`IPlayerHost.cs` exposes: `ResolveElement`, `WaitFor`, `Click`/`Type`/`Drag`/`Hotkey`, `CaptureCheckpoint`, `CaptureFailureArtifacts`. All four required surfaces (resolve, input, checkpoint, failure artifacts) are present.
## UiaPlayerHost note
The real `UiaPlayerHost.cs` is a compile-only PoC (self-flagged by the generator) and was not graded heavily. It builds cleanly, and `Program.cs` reaches it only through the `--no-launch` attach path.
## Verdict
**pass with caveats**
All code-checkable DoD items pass. The 10/10 reliability item is deferred as `untestable`: it is blocked by sandbox constraints (the real GUI SUT cannot be launched), not by missing code. `wait_for` and `UiaPlayerHost` element resolution remain PoC-level and must be hardened before the reliability gate can actually be measured.