recordingtest/.claude/commands/evaluate.md

---
name: evaluate
description: Grade a completed module against its Sprint Contract via the evaluator agent. Usage: /evaluate <contract-slug>
allowed-tools: Read, Glob, Grep, Bash, Agent
---

Evaluate module: $ARGUMENTS

Delegate to the evaluator subagent. It must:

  1. Read docs/contracts/$ARGUMENTS.md. Refuse if missing.
  2. For each Definition-of-Done item, run the verification named in the contract's Evaluation plan.
  3. Collect evidence (command output, diffs, file paths).
  4. Write docs/contracts/$ARGUMENTS.evaluation.md with the verdict table (see the sketch after this list).
  5. Return the verdict to the caller.
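
A minimal sketch of what the evaluation file for step 4 could look like. The column layout, the placeholder slug, and the row contents are illustrative assumptions, not a prescribed format; follow whatever shape the Sprint Contract or existing evaluation files already use.

```markdown
<!-- docs/contracts/<contract-slug>.evaluation.md -- hypothetical shape, adapt to the contract -->
# Evaluation: <contract-slug>

| Definition-of-Done item      | Verification run                       | Evidence                            | Verdict |
|------------------------------|----------------------------------------|-------------------------------------|---------|
| DoD item 1 (from contract)   | command named in the Evaluation plan   | command output / diff / file path   | pass    |
| DoD item 2 (from contract)   | command named in the Evaluation plan   | command output / diff / file path   | fail    |

Overall verdict: fail (any failing item fails the module)
```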

If the verdict is fail, do NOT mark PROGRESS.md as done; report back so the generator can iterate. If the verdict is pass, the caller (not the evaluator) may update PROGRESS.md.
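
Purely as an illustration of that handoff (PROGRESS.md's actual layout is whatever the repo already uses), a passing module might be recorded by the caller like this:

```markdown
<!-- hypothetical PROGRESS.md entry, written by the caller only after a pass verdict -->
- [x] <contract-slug>: pass (evidence: docs/contracts/<contract-slug>.evaluation.md)
```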

Never let the generator and evaluator be the same agent in a single session.