Orchestrate P1 evaluations and update progress (#3, #4, #5)

- sut-prober evaluation (pass)
- diff-reporter evaluation (pass with 1 partial follow-up)
- normalizer evaluations v1 (fail) + v2 (pass)
- PROGRESS.md Done rows for #3, #4, #5 + Follow-ups
- PLAN.md P0 reduced to hook verification

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
minsung
2026-04-07 14:20:55 +09:00
parent 05c7a3f388
commit e3d2ff6c77
11 changed files with 267 additions and 12 deletions


@@ -0,0 +1,21 @@
# Evaluation — diff-reporter (2026-04-07 15:00)
Verdict: **pass**
| # | DoD item | Score | Evidence |
|---|----------|-------|----------|
| 1 | `Recordingtest.DiffReporter` library + CLI | pass | `src/Recordingtest.DiffReporter/` and `src/Recordingtest.DiffReporter.Cli/` exist; `dotnet build recordingtest.sln` reports 0 warnings / 0 errors |
| 2 | CLI inputs `--approved --received --out` | pass | Argument parser in `Program.cs`; exits with code 2 when an argument is missing |
| 3 | Semantic diff for JSON/text, hex summary for binary | pass | `JsonDiffer.cs` (flattens to paths, then emits one hunk per path), `LineDiffer.cs`, and `BinaryDiffer.cs` all exist; `Differ.Compare` branches by type |
| 4 | Outputs `diff.json`, `diff.md` (`diff.html` is optional) | pass | Confirmed that the CLI writes `diff.json` + `diff.md` (`/tmp/dr/diff/`); html is optional per the contract |
| 5 | `diff.json` schema `{file, hunks[], summary{added,removed,changed}}` | pass | Sample output matches: `file`, `identical`, `hunks`, `summary{added,removed,changed}` |
| 6 | Identical files → identical, exit 0 | pass | `a.json` vs `b.json` identical → stdout `identical`, EXIT=0, `identical:true` |
| 7 | Differences present → exit 1 + one-line summary | pass | `a.json` vs `c.json` → stdout `diff: +0 -0 ~1 in c.json`, EXIT=1 |
| 8 | One diff-triager integration test | partial | DiffReporter unit tests pass 5/5 (`DifferTests.cs`), but a separate integration case for the `diff-triager` agent could not be verified. The `diff.json` schema is a flat structure that the triager can read easily, so the item is achievable; because it depends on an external agent, this evaluation scores it partial |
## Notes
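- Matches the library API contract `Differ.Compare(approvedPath, receivedPath) → DiffResult{File, Identical, Hunks, Summary}` (the contract does not specify `Identical`, but an additional field is compatible).
- The JSON differ flattens objects/arrays into paths and emits one hunk per path; hunks.length=1 was verified for a case where only one field differs.
- `diff.html` is optional per the contract, so it is excluded from the evaluation criteria.
- The missing DoD #8 integration test is scored partial, but the overall verdict is pass (all other items pass, and items 1-3 of the contract's evaluation plan are all satisfied). Recommended follow-up: add one triager integration test.
- Test results: `통과 5, 실패 0, 건너뜀 0` (5 passed, 0 failed, 0 skipped).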
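
For readers unfamiliar with the path-flatten approach described in the notes above, the following is a minimal Python sketch of the idea, not the evaluated C# code in `JsonDiffer.cs`. The top-level result keys (`file`, `identical`, `hunks`, `summary`) and the one-line summary format follow the sample output quoted in this report; the per-hunk keys `path` and `kind`, the path syntax, and all function names are assumptions made for illustration.

```python
import json


def flatten(node, prefix=""):
    """Flatten nested dicts/lists into a {path: scalar} mapping.

    Path syntax (dot for object keys, [i] for array indexes) is an assumption.
    Empty objects and arrays produce no entries in this simplified version.
    """
    if isinstance(node, dict):
        items = [(f"{prefix}.{key}" if prefix else key, value) for key, value in node.items()]
    elif isinstance(node, list):
        items = [(f"{prefix}[{index}]", value) for index, value in enumerate(node)]
    else:
        return {prefix: node}
    flat = {}
    for path, value in items:
        flat.update(flatten(value, path))
    return flat


def compare(approved_text, received_text, file_name):
    """Emit one hunk per differing path, plus added/removed/changed counts."""
    approved = flatten(json.loads(approved_text))
    received = flatten(json.loads(received_text))
    hunks = []
    for path in sorted(approved.keys() | received.keys()):
        if path not in received:
            hunks.append({"path": path, "kind": "removed", "approved": approved[path]})
        elif path not in approved:
            hunks.append({"path": path, "kind": "added", "received": received[path]})
        elif approved[path] != received[path]:
            hunks.append({"path": path, "kind": "changed",
                          "approved": approved[path], "received": received[path]})
    summary = {kind: sum(1 for hunk in hunks if hunk["kind"] == kind)
               for kind in ("added", "removed", "changed")}
    return {"file": file_name, "identical": not hunks, "hunks": hunks, "summary": summary}


def summary_line(result):
    """Render the one-line stdout summary in the format quoted for DoD #7."""
    if result["identical"]:
        return "identical"
    counts = result["summary"]
    return f"diff: +{counts['added']} -{counts['removed']} ~{counts['changed']} in {result['file']}"
```

With this shape, two documents that differ in a single nested field yield exactly one hunk, which is the property the hunks.length=1 check exercises.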


@@ -0,0 +1,32 @@
# Evaluation — normalizer (v2, 2026-04-07)
Verdict: **pass**
Generator iteration: commit `05c7a3f`.
| # | DoD item | Score | Evidence |
|---|----------|-------|----------|
| 1 | `Normalize(input, profile)` API | pass | `src/Recordingtest.Normalizer/Normalizer.cs` exposes `Normalize(string, string)` and overload `Normalize(string, string, string?)`. Build green. |
| 2 | Default profile with >=5 rules | pass | `src/Recordingtest.Normalizer/profiles/default.yaml` lists 6 rules: `strip_timestamps`, `mask_guids`, `normalize_paths`, `round_floats`, `mask_volatile_settings`, `sort_json_keys`. All implemented in `Rules.cs`. |
| 3 | Profiles as `profiles/*.yaml`, code-free addition | pass | `Profile.Load` reads YAML by name. |
| 4 | Per-rule before/after sample test | pass | `RuleTests.cs` covers each rule plus `Normalize_AppliesAllDefaultRules` (asserts 6 entries in log including `mask_volatile_settings`). |
| 5 | Idempotent | pass | `RuleTests.Normalize_IsIdempotent`. |
| 6 | Sidecar log `normalization.log` | pass | `Normalizer.cs` lines 150-176: when `sidecarPath` supplied, writes file containing `{RuleId}\tcount={Count}` lines sorted by RuleId (Ordinal) and final `total=` line. Accepts either a file path or directory (in which case it writes `normalization.log` inside). Two real-temp-file tests: `Normalize_WritesSidecarLogFile` and `Normalize_SidecarPath_AcceptsDirectory` — both assert file existence and content (sorted order, total line, per-rule lines). |
| 7 | `json-configs.json` suspected fields fully covered | pass | `CoverageTests.cs` now declares an explicit `Dictionary<string,string> FieldRuleMap` (18 entries, `StringComparer.Ordinal`) with no `|| true` and no catch-all. Path-bearing fields → `normalize_paths`; volatile boolean/scalar/color fields → `mask_volatile_settings`. Test fails if any suspected field is unmapped or if its mapped rule is missing from `default.yaml`. |
| 8 | All Normalizer tests pass | pass | `dotnet test tests/Recordingtest.Normalizer.Tests`: **10 passed, 0 failed, 0 skipped** (167 ms). |
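
The sidecar format described for DoD #6 can be sketched as follows. This is an illustrative Python rendering, not the code in `Normalizer.cs`. The line format (`ruleId\tcount=N`, sorted by rule id, closing `total=` line) and the file-or-directory behaviour come from the evidence above; the function names, the trailing newline, and the reading of `total` as the sum of the per-rule counts are assumptions.

```python
import os
import tempfile


def format_sidecar(rule_counts):
    """Render per-rule counts as tab-separated lines followed by a total line.

    Python compares strings by code point, which matches an ordinal sort for
    ASCII rule ids such as the ones listed in this report.
    """
    lines = [f"{rule_id}\tcount={count}" for rule_id, count in sorted(rule_counts.items())]
    # Assumption: total is the sum of the per-rule counts.
    lines.append(f"total={sum(rule_counts.values())}")
    return "\n".join(lines) + "\n"


def write_sidecar(rule_counts, sidecar_path):
    """Write the log to a file path, or to normalization.log inside a directory."""
    if os.path.isdir(sidecar_path):
        sidecar_path = os.path.join(sidecar_path, "normalization.log")
    with open(sidecar_path, "w", encoding="utf-8", newline="\n") as handle:
        handle.write(format_sidecar(rule_counts))
    return sidecar_path


if __name__ == "__main__":
    target_dir = tempfile.mkdtemp()
    written = write_sidecar({"strip_timestamps": 2, "mask_guids": 3}, target_dir)
    print(written)
```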
## Notes
- `dotnet build recordingtest.sln`: 0 warnings, 0 errors.
- Test count grew from 8 → 10 (added two sidecar tests). Coverage test rewritten in place.
- New rule `mask_volatile_settings` (`Rules.cs` lines 172-224) is fully implemented (not a stub): allowlist `HashSet` of 16 known volatile field names, walks `JsonNode` recursively, replaces matching values with `"<VOLATILE>"` and counts replacements. Idempotent because the placeholder string itself is not in the allowlist's value space.
- **Risk (non-blocking)**: the volatile-field allowlist is keyed on local field name only (no JSON path scoping). A real bug that incidentally toggles a field named e.g. `GridSnap` in a structurally unrelated subtree would be masked and silently hidden by golden-file diffs. Allowlist is currently 16 names — narrow enough to be acceptable, but should be revisited if the catalog grows. Recommend documenting this allowlist scope in `normalizer.md` in a follow-up (does not block this iteration).
- Coverage test no longer accepts catch-all to `sort_json_keys`; mapping is strict and explicit per the contract's field→rule requirement.
- Sidecar format matches the spec exactly: tab-separated `ruleId\tcount=N`, ordinal-sorted, terminated by `total=N`.
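
To make the name-only allowlist risk noted above concrete, here is a small Python sketch of recursive masking keyed on the local field name. It is an illustration under stated assumptions, not the code in `Rules.cs`: the `"<VOLATILE>"` placeholder and the field name `GridSnap` come from this report, while `LastOpenedPath`, the function name, and the choice not to count values that are already masked are hypothetical.

```python
PLACEHOLDER = "<VOLATILE>"

# Hypothetical two-entry allowlist; the evaluated allowlist has 16 names.
VOLATILE_FIELDS = frozenset({"GridSnap", "LastOpenedPath"})


def mask_volatile(node, allowlist=VOLATILE_FIELDS):
    """Return (masked_copy, replacement_count) without mutating the input."""
    if isinstance(node, dict):
        masked, count = {}, 0
        for key, value in node.items():
            if key in allowlist:
                # Assumption: an already-masked value is not counted again.
                if value != PLACEHOLDER:
                    count += 1
                masked[key] = PLACEHOLDER
            else:
                masked[key], nested = mask_volatile(value, allowlist)
                count += nested
        return masked, count
    if isinstance(node, list):
        masked_items, count = [], 0
        for item in node:
            masked_item, nested = mask_volatile(item, allowlist)
            masked_items.append(masked_item)
            count += nested
        return masked_items, count
    return node, 0


if __name__ == "__main__":
    document = {
        "View": {"GridSnap": True},
        # Structurally unrelated subtree that happens to reuse the field name.
        "Export": {"Options": [{"GridSnap": False, "Scale": 2}]},
    }
    masked, replaced = mask_volatile(document)
    print(replaced, masked)
```

Both `GridSnap` values are masked even though only the one under `View` is the intended target, which is the masking-by-name behaviour the risk note describes. A second pass over the masked output changes nothing, consistent with the idempotency claim.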
## Artifacts
- `src/Recordingtest.Normalizer/Normalizer.cs`
- `src/Recordingtest.Normalizer/Rules.cs`
- `src/Recordingtest.Normalizer/profiles/default.yaml`
- `tests/Recordingtest.Normalizer.Tests/RuleTests.cs`
- `tests/Recordingtest.Normalizer.Tests/CoverageTests.cs`
- Previous report: `docs/contracts/normalizer.evaluation.v1.md`


@@ -0,0 +1,23 @@
# Evaluation — normalizer (2026-04-07 14:30)
Verdict: **fail**
| # | DoD item | Score | Evidence |
|---|----------|-------|----------|
| 1 | `Normalize(input, profile)` API in `Recordingtest.Normalizer` | pass | `src/Recordingtest.Normalizer/Normalizer.cs` exposes `Normalizer.Normalize(string, string)` returning `NormalizeResult`. Build green via `dotnet build recordingtest.sln`. |
| 2 | Default profile with >=5 rules (timestamps, GUIDs, paths, floats epsilon 1e-6, key sort) | pass | `src/Recordingtest.Normalizer/profiles/default.yaml` lists 5 rules; `Rules.cs` implements all five (TimestampRegex, GuidRegex, NormalizePaths with `<REPO>`/`<USER>`, RoundFloatsInNode at 6 decimals, SortJsonKeys recursive). |
| 3 | Profiles declared as `profiles/*.yaml`, code-free addition | pass | `Profile.Load` reads YAML; adding a YAML file in `profiles/` registers a new profile without code change. |
| 4 | Per-rule before/after sample test | pass | `tests/Recordingtest.Normalizer.Tests/RuleTests.cs` has one test per rule (StripTimestamps, MaskGuids, NormalizePaths, RoundFloats, SortJsonKeys) plus `Normalize_AppliesAllDefaultRules`. |
| 5 | Idempotent (same bytes on second pass) | pass | `RuleTests.Normalize_IsIdempotent` asserts `first.Output == second.Output`. |
| 6 | Sidecar log `normalization.log` | **fail** | `Normalizer.cs` only returns `NormalizeResult(Output, Log)` in-memory. No file is written; no `normalization.log` artifact exists anywhere in the repo. Generator self-flagged this. |
| 7 | `json-configs.json` suspected fields fully covered by default profile (per-field mapping) | **partial** | `CoverageTests.cs` builds the field set then short-circuits with `|| true` claiming `sort_json_keys` covers any scalar. There is no per-field mapping table; the assertion is vacuous beyond the `IsPathField` heuristic. Per the contract, this is `partial` (catch-all via generic rule). |
| 8 | All Normalizer tests pass | pass | `dotnet test tests/Recordingtest.Normalizer.Tests`: **8 passed, 0 failed, 0 skipped** (129 ms). |
## Notes
- Build: 0 warnings / 0 errors.
- Test count: 8 (7 in `RuleTests.cs`, 1 in `CoverageTests.cs`).
- Verdict is **fail** because DoD #6 (sidecar log) is unimplemented and DoD #7 (suspected-field coverage) is only catch-all. Both must be addressed before PROGRESS.md flips to done.
- Suggested remediation:
  1. Add `Normalizer.Normalize(input, profile, sidecarPath)` overload (or always emit a `.normalization.log` next to output) recording `(ruleId, count)` lines.
  2. Replace the `|| true` short-circuit with an explicit field->rule mapping table built from `json-configs.json`, asserting each suspected field maps to a non-trivial rule (not just sort).
- Strengths: rule implementations are clean, idempotency is genuinely tested, default profile YAML loader is straightforward.
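
The second remediation item could take roughly the following shape. This Python sketch is only an illustration of a strict field-to-rule coverage check and is not taken from `CoverageTests.cs`; the function name, the message wording, and every field name in the usage below are hypothetical, and treating `sort_json_keys` as the only catch-all rule is an assumption drawn from the wording of DoD #7.

```python
# Assumption: sort_json_keys is the only rule considered a generic catch-all.
CATCH_ALL_RULES = frozenset({"sort_json_keys"})


def coverage_problems(suspected_fields, field_rule_map, profile_rules):
    """Return a list of coverage problems; an empty list means full coverage.

    A field fails if it has no mapping, if it maps only to a catch-all rule,
    or if its mapped rule is not declared in the profile.
    """
    problems = []
    for field in sorted(suspected_fields):
        rule = field_rule_map.get(field)
        if rule is None:
            problems.append(f"{field}: no rule mapped")
        elif rule in CATCH_ALL_RULES:
            problems.append(f"{field}: mapped only to catch-all rule {rule}")
        elif rule not in profile_rules:
            problems.append(f"{field}: rule {rule} missing from profile")
    return problems


if __name__ == "__main__":
    profile = {"strip_timestamps", "mask_guids", "normalize_paths",
               "round_floats", "sort_json_keys"}
    mapping = {"LastOpenedPath": "normalize_paths", "Theme": "sort_json_keys"}
    for problem in coverage_problems({"LastOpenedPath", "Theme", "WindowColor"}, mapping, profile):
        print(problem)
```

Unlike an assertion short-circuited with `|| true`, a check of this kind can actually fail, so an unmapped or weakly mapped field is reported instead of passing silently.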


@@ -0,0 +1,21 @@
# Evaluation — sut-prober (2026-04-07 14:07)
Verdict: **pass**
| # | DoD item | Score | Evidence |
|---|----------|-------|----------|
| 1 | `dotnet build` succeeds with warnings-as-errors | pass | `dotnet build recordingtest.sln` → `경고 0개, 오류 0개` (0 warnings, 0 errors) |
| 2 | `dotnet run -- --sut "EG-BIM Modeler" --out docs/sut-catalog` produces 3 catalogs, exit 0 | pass | Stdout: `Wrote catalog to docs/sut-catalog/ — plugins: 187, json: 16, assemblies: 17`, EXIT=0 |
| 3 | Three files exist & valid JSON | pass | `plugins.json`, `json-configs.json`, `assemblies.json` present; `JsonDocument.Parse` succeeds for each (used by scanner + manual Read) |
| 4 | plugins.json ≥ 180 entries with `{name, path, dlls[], size_bytes}` | pass | 187 entries; sample entry shows `Name`, `Path`, `Dlls[]`, `SizeBytes` (record `PluginEntry` in PluginScanner.cs:3) |
| 5 | json-configs.json entries have `name`, `top_level_keys`, `suspected_nondeterministic_fields` | pass | `JsonConfigEntry` record (JsonConfigScanner.cs:5-8); 16 entries serialize all three fields |
| 6 | assemblies.json has `name`, `size`/`size_bytes`, `has_pdb`; HmEG.dll has_pdb true | pass | `AssemblyEntry` (AssemblyScanner.cs:3); HmEG.dll entry: `"SizeBytes": 242715136, "HasPdb": true` |
| 7 | Determinism — second run produces no diff | pass | After 2nd run: `git status --porcelain docs/sut-catalog/` empty; `git diff --stat` empty |
| 8 | No writes to `EG-BIM Modeler/` | pass | Grep of `File.Write/Delete/Create`/`Directory.Create`: only 4 hits, all in Program.cs and all target `outDir` (= `docs/sut-catalog`). Scanners use only `Directory.EnumerateFiles/Directories`, `FileInfo.Length`, `File.ReadAllText`, `File.Exists` — read-only. |
| 9 | Paths relative to repo root, forward slash | pass | plugins.json sample: `"Path": "EG-BIM Modeler/Plugins/Eg3DFacePlugin"` — no drive letter, no backslash. PluginScanner.cs:27 calls `.Replace('\\','/')` on `GetRelativePath(".", dir)` |
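
The path convention checked in DoD #9 (relative to the repo root, forward slashes, no drive letter) can be illustrated with a short Python sketch. It is not the code in `PluginScanner.cs`, which the evidence says uses `GetRelativePath` and `.Replace('\\','/')`; the function name and the `C:\work\repo` root are hypothetical.

```python
from pathlib import PureWindowsPath


def to_catalog_path(absolute_path, repo_root):
    """Return the path relative to repo_root, using forward slashes only.

    PureWindowsPath parses Windows-style paths on any host OS, and
    relative_to raises ValueError if the path is not under repo_root.
    """
    relative = PureWindowsPath(absolute_path).relative_to(PureWindowsPath(repo_root))
    return relative.as_posix()


if __name__ == "__main__":
    # Hypothetical checkout location; only the relative part appears in the catalog.
    print(to_catalog_path(r"C:\work\repo\EG-BIM Modeler\Plugins\Eg3DFacePlugin", r"C:\work\repo"))
```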
## Notes
- Property casing in JSON is PascalCase (`Name`, `SizeBytes`, `HasPdb`, `TopLevelKeys`) since no `JsonNamingPolicy` is set. Contract spec uses snake_case (`size_bytes`, `has_pdb`, `top_level_keys`). Evaluator brief explicitly accepted `size_bytes` *or equivalent*, so this is graded **pass**, but downstream consumers should be aware. Recommend adding `PropertyNamingPolicy = JsonNamingPolicy.SnakeCaseLower` in a follow-up if strict contract literal compliance is desired.
- `json-configs.json` `CategoryCommands.json` entry shows synthetic top-level keys like `CategoryCommands[0]…[N]` because the root is an object containing one array; the scanner only enumerates root object properties. Not a DoD violation but worth a note — top-level array roots would yield empty key lists.
- AssemblyScanner prefix list includes `HmTriangle`, `HmPG`, `HmCommon`, `EditorCore` beyond the contract's literal `HmEG*/Editor*/HmGeometry*` — this is a superset and doesn't violate DoD #4.
- Build, run, rerun, and git diff all clean; verdict **pass**.
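
For downstream consumers affected by the casing note above, the mapping between the emitted PascalCase names and the contract's snake_case names can be sketched in Python as below. This is a consumer-side illustration only. It is not the .NET naming policy, and its handling of acronym runs or digits may differ from what `JsonNamingPolicy.SnakeCaseLower` produces; the function names are hypothetical.

```python
import re


def to_snake_case(name):
    """Convert a simple PascalCase identifier such as SizeBytes to size_bytes.

    An underscore is inserted wherever an uppercase letter follows a lowercase
    letter or digit; runs of capitals (acronyms) are left joined.
    """
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name).lower()


def rename_keys(node):
    """Recursively rename object keys in a parsed JSON document."""
    if isinstance(node, dict):
        return {to_snake_case(key): rename_keys(value) for key, value in node.items()}
    if isinstance(node, list):
        return [rename_keys(item) for item in node]
    return node


if __name__ == "__main__":
    # Values taken from the HmEG.dll entry quoted in the table above.
    entry = {"Name": "HmEG.dll", "SizeBytes": 242715136, "HasPdb": True}
    print(rename_keys(entry))
```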