recordingtest/docs/contracts/player.evaluation.md
minsung 836afea5ee Orchestrate P1 UI automation evaluations (#6, #7)
- recorder v1 (fail) → v2 (pass): drag state machine, focus events, ts/raw_coord
- player pass with caveats: reliability untestable in sandbox
- PROGRESS.md Done rows + follow-ups for live SUT smoke test
- PLAN.md P1 pivoted to test-runner + live smoke test

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 14:37:14 +09:00


Player — Evaluation

Evaluator: independent
Generator commit: f17e764
Date: 2026-04-07

Verification

  • dotnet build recordingtest.sln -> green (0 warnings, 0 errors)
  • dotnet test tests/Recordingtest.Player.Tests -> 6/6 passed
  • grep for Thread.Sleep( and Task.Delay(TimeSpan.FromSeconds in PlayerEngine.cs -> 0 hits
  • Player_NoFixedSleep test verified to locate and read the real src/Recordingtest.Player/PlayerEngine.cs via [CallerFilePath] and to assert on its contents with a regex (not a dummy test)
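For readers unfamiliar with this kind of source-scanning test, the following is a minimal sketch of how such a check could be written. It is not the project's actual Player_NoFixedSleep test: the class name NoFixedSleepCheck and both method names are hypothetical, and the regex simply mirrors the two patterns grepped for above.

```csharp
using System;
using System.IO;
using System.Runtime.CompilerServices;
using System.Text.RegularExpressions;

// Demo on in-memory snippets (hypothetical source text, not the real PlayerEngine.cs).
Console.WriteLine(NoFixedSleepCheck.ContainsFixedSleep("Thread.Sleep(500);"));        // True
Console.WriteLine(NoFixedSleepCheck.ContainsFixedSleep("await host.WaitFor(hint);")); // False

// Hypothetical helper; names are illustrative only.
public static class NoFixedSleepCheck
{
    // Mirrors the two patterns the evaluation grepped for.
    private static readonly Regex FixedSleep = new Regex(
        @"Thread\.Sleep\s*\(|Task\.Delay\s*\(\s*TimeSpan\.FromSeconds",
        RegexOptions.Compiled);

    // True when the given source text contains a fixed sleep.
    public static bool ContainsFixedSleep(string source) => FixedSleep.IsMatch(source);

    // The compiler fills callerFile with the path of the calling source file,
    // so a test can locate engine source relative to itself at run time.
    public static string ResolveFromCaller(
        string relativePath, [CallerFilePath] string callerFile = "")
    {
        string callerDir = Path.GetDirectoryName(callerFile) ?? ".";
        return Path.GetFullPath(Path.Combine(callerDir, relativePath));
    }
}
```

In a real test, the text returned by File.ReadAllText(ResolveFromCaller(...)) would be passed to ContainsFixedSleep and asserted to be false. The relative path from the test file to PlayerEngine.cs depends on the repository layout and is left out here.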

DoD verdict table

| # | DoD item | Status | Evidence |
|---|----------|--------|----------|
| 1 | CLI --scenario --output-dir --no-launch | pass | Program.cs lines 8-22 |
| 2 | wait_for support | partial (PoC) | PlayerEngine.cs lines 50-57 passes hint to IPlayerHost.WaitFor; real impl is PoC, generator flagged |
| 3 | element resolve + offset calc | pass | ComputeScreenPoint covered by Player_ClickStep_InvokesHostClickAtExpectedScreenPoint (125,210 expected) |
| 4 | failure artifacts on resolve fail | pass | Player_ResolveFailure_CapturesArtifacts asserts host.Failures populated with step index + reason |
| 5 | checkpoint save | pass | Player_CheckpointStep_InvokesCapture asserts AfterStep + SaveAs forwarded |
| 6 | exit codes (0/non-zero + artifact path) | pass | Program.cs returns 0/1/2/3/4/5; failure path prints artifact_dir= |
| 7 | 10/10 reliability (>=9 pass) | untestable / deferred | requires real SUT GUI; sandbox cannot launch; generator honestly flagged |
| 8 | no fixed sleep | pass | grep + Player_NoFixedSleep test |
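To make item 3 concrete, here is a small sketch of one plausible offset calculation. It assumes the screen point is the element's top-left corner plus a pixel offset [dx, dy]. The real ComputeScreenPoint may use a different rule, and the types ElementBounds and ScreenPoint, as well as the input values, are hypothetical, chosen only so the result matches the (125,210) the test expects.

```csharp
using System;

// Hypothetical inputs: element at (100,200), offset [25,10].
var bounds = new ElementBounds(Left: 100, Top: 200, Width: 80, Height: 40);
var point = ScreenPointMath.ComputeScreenPoint(bounds, new[] { 25, 10 });
Console.WriteLine($"{point.X},{point.Y}"); // 125,210

// Illustrative types; not taken from the project.
public readonly record struct ElementBounds(int Left, int Top, int Width, int Height);
public readonly record struct ScreenPoint(int X, int Y);

public static class ScreenPointMath
{
    // Assumed rule: screen point = element top-left + [dx, dy].
    public static ScreenPoint ComputeScreenPoint(ElementBounds bounds, int[] offset)
    {
        if (offset is null || offset.Length != 2)
            throw new ArgumentException("offset must be [dx, dy]", nameof(offset));
        return new ScreenPoint(bounds.Left + offset[0], bounds.Top + offset[1]);
    }
}
```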

Schema mirror check

  • Model/Scenario.cs covers name, description, sut(exe, startup_timeout_ms), steps, checkpoints, baselines
  • Model/Step.cs covers kind enum (click/type/drag/hotkey/wait/checkpoint/save), target(uia_path, offset[]), value, wait_for, after_step, save_as
  • ScenarioLoader.cs uses YamlDotNet UnderscoredNamingConvention -> matches recorder yaml schema
  • Player_ScenarioLoader_ParsesSampleYaml exercises a realistic yaml end-to-end
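The schema match in the third bullet comes down to a naming rule: PascalCase C# property names are mapped to snake_case YAML keys. The sketch below is a hand-rolled approximation of that mapping for simple names, not YamlDotNet's own code. The property names are assumptions inferred from the YAML keys listed above, and YamlDotNet's handling of acronyms or digits may differ.

```csharp
using System;
using System.Text;

// Assumed C# property names; only the YAML keys appear in this evaluation.
foreach (var name in new[] { "Name", "StartupTimeoutMs", "UiaPath", "WaitFor", "AfterStep", "SaveAs" })
    Console.WriteLine($"{name} -> {SnakeCase.FromPascal(name)}");

// Hypothetical helper approximating an underscored naming convention.
public static class SnakeCase
{
    // Inserts '_' before each upper-case letter (except the first) and lower-cases it.
    public static string FromPascal(string name)
    {
        var sb = new StringBuilder(name.Length + 4);
        for (int i = 0; i < name.Length; i++)
        {
            char c = name[i];
            if (char.IsUpper(c))
            {
                if (i > 0) sb.Append('_');
                sb.Append(char.ToLowerInvariant(c));
            }
            else
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }
}
```

Under this rule, a property such as StartupTimeoutMs lines up with the startup_timeout_ms key, which is why the model classes do not need per-property aliases.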

IPlayerHost interface coverage

IPlayerHost.cs exposes: ResolveElement, WaitFor, Click/Type/Drag/Hotkey, CaptureCheckpoint, CaptureFailureArtifacts. All four required surfaces (resolve, input, checkpoint, failure artifacts) are present.
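As an illustration of why this interface makes the engine testable without a real GUI, the sketch below pairs a trimmed host interface with a recording fake. Everything here is hypothetical: IMiniPlayerHost, FakeHost, MiniEngine, and all signatures are invented for the example and are not the actual IPlayerHost members. Only the idea of asserting on a Failures list holding a step index and reason comes from the evaluation above.

```csharp
using System;
using System.Collections.Generic;

var host = new FakeHost();
bool ok = MiniEngine.RunClick(host, stepIndex: 3, uiaPath: "Window/Missing", dx: 5, dy: 5);
Console.WriteLine(ok);                  // False
Console.WriteLine(host.Failures.Count); // 1

// Trimmed, hypothetical stand-in for the host interface.
public interface IMiniPlayerHost
{
    (int Left, int Top)? ResolveElement(string uiaPath);
    void Click(int x, int y);
    void CaptureFailureArtifacts(int stepIndex, string reason);
}

// Recording fake: stores calls instead of touching a real UI.
public sealed class FakeHost : IMiniPlayerHost
{
    public Dictionary<string, (int Left, int Top)> Elements { get; } = new();
    public List<(int X, int Y)> Clicks { get; } = new();
    public List<(int StepIndex, string Reason)> Failures { get; } = new();

    public (int Left, int Top)? ResolveElement(string uiaPath)
    {
        if (Elements.TryGetValue(uiaPath, out var origin)) return origin;
        return null;
    }

    public void Click(int x, int y) => Clicks.Add((x, y));

    public void CaptureFailureArtifacts(int stepIndex, string reason) =>
        Failures.Add((stepIndex, reason));
}

public static class MiniEngine
{
    // Resolves the target, then clicks; on resolve failure records artifacts instead.
    public static bool RunClick(IMiniPlayerHost host, int stepIndex, string uiaPath, int dx, int dy)
    {
        var origin = host.ResolveElement(uiaPath);
        if (origin is null)
        {
            host.CaptureFailureArtifacts(stepIndex, $"element not found: {uiaPath}");
            return false;
        }
        host.Click(origin.Value.Left + dx, origin.Value.Top + dy);
        return true;
    }
}
```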

UiaPlayerHost note

The real UiaPlayerHost.cs is a compile-only PoC (self-flagged by the generator), so it was not graded heavily. It builds cleanly, and Program.cs only reaches it through the --no-launch attach path.

Verdict

pass with caveats

All code-checkable DoD items pass. The 10/10 reliability item is deferred as untestable: it is blocked by sandbox constraints (the real GUI SUT cannot be launched), not by missing code. wait_for and UiaPlayerHost element resolution remain PoC-level and must be hardened before the reliability gate can actually be measured.