feat(#68): IMP-39 u1~u8 ranking_sort_policy single-source + backend↔frontend label-priority mirror

u1: templates/phase_z2/catalog/ranking_sort_policy.yaml - single-source policy (label_priority asc {use_as_is:0, light_edit:1, restructure:2, reject:3} + confidence desc + v4_rank asc tie-break).
u2: src/phase_z2_pipeline.py - apply_ranking_sort helper + lookup_v4_match_with_fallback applies policy AFTER IMP-38 raw-window selection (raw default_window + usable_count preserved on RAW all_judgments).
u3: src/phase_z2_pipeline.py - _build_application_plan_unit forwards ranking_sort_policy + sorted_candidate_evidence into Step 9 payload.
u4: Front/client/src/services/designAgentApi.ts - frame_candidates builder reads unit.sorted_candidate_evidence + unit.ranking_sort_policy first; local LABEL_PRIORITY retained only on warn-fallback path.
u5: tests/test_ranking_sort_policy.py - pure permutation coverage (sample-agnostic).
u6: tests/phase_z2/test_label_priority_synthetic.py + fixtures/ranking_sort_policy/synthetic_divergence.yaml - low-conf use_as_is behind high-conf restructure.
u7: tests/phase_z2/test_imp39_mdx04_env_toggle_e2e.py - samples/mdx_batch/04.mdx with AI_FALLBACK_ENABLED=off; backend selected_v4_rank == frontend frame_candidates[0].
u8: tests/phase_z2/test_imp39_corpus_audit.py - real corpus sweep over tests/matching/v4_full32_result.yaml (10 MDX sections); section IDs loaded dynamically (RULE 0 / RULE 7 sample-agnostic).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 17:12:07 +09:00
parent 2e3747c5ab
commit 028042aaa9
8 changed files with 1536 additions and 12 deletions
--- a/tests/phase_z2/fixtures/ranking_sort_policy/synthetic_divergence.yaml
+++ b/tests/phase_z2/fixtures/ranking_sort_policy/synthetic_divergence.yaml
@@ -0,0 +1,56 @@
+fixture_id: synthetic_divergence
+purpose: |
+  Backend↔frontend "rank 1" divergence regression - IMP-39 (#68).
+  Captures the Stage 1 root-cause scenario where the legacy backend
+  (raw V4 confidence-desc order) selects a high-confidence
+  lower-priority label, while the frontend (LABEL_PRIORITY asc +
+  confidence desc) selects the lower-confidence higher-priority
+  label. The single-source ranking policy
+  (templates/phase_z2/catalog/ranking_sort_policy.yaml, u1) resolves
+  the divergence so that both sides agree on "rank 1".
+
+source: synthetic
+sample_agnostic: true
+notes:
+  - No real frame_id / template_id / MDX section is referenced.
+  - Only the three sort keys matter: label, confidence, v4_full_rank.
+  - The `tag` field is a fixture-local identifier for assertions.
+  - Field name `v4_full_rank` mirrors v4_full32_result.yaml shape so
+    fixture and corpus audit (u8) share the same key contract.
+
+raw_judgments:
+  # confidence is strictly descending so v4_full_rank == raw V4
+  # confidence-desc rank (same axis as v4_full32_result.yaml).
+  - tag: synth_restructure_high
+    label: restructure
+    confidence: 0.92
+    v4_full_rank: 1
+  - tag: synth_light_edit_mid
+    label: light_edit
+    confidence: 0.70
+    v4_full_rank: 2
+  - tag: synth_use_as_is_low
+    label: use_as_is
+    confidence: 0.41
+    v4_full_rank: 3
+  - tag: synth_reject_low
+    label: reject
+    confidence: 0.30
+    v4_full_rank: 4
+
+expected_legacy_raw_order:
+  - synth_restructure_high
+  - synth_light_edit_mid
+  - synth_use_as_is_low
+  - synth_reject_low
+
+expected_policy_sorted_order:
+  - synth_use_as_is_low
+  - synth_light_edit_mid
+  - synth_restructure_high
+  - synth_reject_low
+
+divergence_axis:
+  pre_policy_rank_1_tag: synth_restructure_high
+  post_policy_rank_1_tag: synth_use_as_is_low
+  frontend_candidate_0_tag: synth_use_as_is_low
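
For readers of this fixture, the divergence it pins down can be reproduced with a minimal sketch. The sort key follows the policy described in the commit message (label_priority asc, confidence desc, rank asc as tie-break). The function name `sort_by_policy` and the inlined `LABEL_PRIORITY` table are illustrative assumptions; the real helper is `apply_ranking_sort` in src/phase_z2_pipeline.py, whose signature is not shown in this diff, and the real priorities are loaded from ranking_sort_policy.yaml.

```python
from typing import Dict, List, Union

Judgment = Dict[str, Union[str, float, int]]

# Assumption: mirrors label_priority from ranking_sort_policy.yaml (u1).
LABEL_PRIORITY: Dict[str, int] = {
    "use_as_is": 0,
    "light_edit": 1,
    "restructure": 2,
    "reject": 3,
}

# Copied from raw_judgments in synthetic_divergence.yaml.
RAW_JUDGMENTS: List[Judgment] = [
    {"tag": "synth_restructure_high", "label": "restructure", "confidence": 0.92, "v4_full_rank": 1},
    {"tag": "synth_light_edit_mid", "label": "light_edit", "confidence": 0.70, "v4_full_rank": 2},
    {"tag": "synth_use_as_is_low", "label": "use_as_is", "confidence": 0.41, "v4_full_rank": 3},
    {"tag": "synth_reject_low", "label": "reject", "confidence": 0.30, "v4_full_rank": 4},
]


def sort_by_policy(judgments: List[Judgment]) -> List[Judgment]:
    """Hypothetical stand-in for apply_ranking_sort.

    Orders by label priority ascending, then confidence descending,
    then v4_full_rank ascending as the final tie-break.
    """
    return sorted(
        judgments,
        key=lambda j: (LABEL_PRIORITY[j["label"]], -j["confidence"], j["v4_full_rank"]),
    )


# Legacy backend order: raw V4 rank (equivalent to confidence desc in this fixture).
legacy_order = [j["tag"] for j in sorted(RAW_JUDGMENTS, key=lambda j: j["v4_full_rank"])]

# Policy order: what backend and frontend should both produce after IMP-39.
policy_order = [j["tag"] for j in sort_by_policy(RAW_JUDGMENTS)]

print("legacy rank 1:", legacy_order[0])
print("policy rank 1:", policy_order[0])
```

Under these assumptions the legacy order puts `synth_restructure_high` first, while the policy order puts `synth_use_as_is_low` first, which matches `pre_policy_rank_1_tag` and `post_policy_rank_1_tag` in the fixture.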