feat(#68): IMP-39 u1~u8 ranking_sort_policy single-source + backend↔frontend label-priority mirror

u1: templates/phase_z2/catalog/ranking_sort_policy.yaml — single-source policy
    (label_priority asc {use_as_is:0, light_edit:1, restructure:2, reject:3}
    + confidence desc + v4_rank asc tie-break).
u2: src/phase_z2_pipeline.py — apply_ranking_sort helper + lookup_v4_match_with_fallback
    applies policy AFTER IMP-38 raw-window selection (raw default_window + usable_count
    preserved on RAW all_judgments).
u3: src/phase_z2_pipeline.py — _build_application_plan_unit forwards ranking_sort_policy
    + sorted_candidate_evidence into Step 9 payload.
u4: Front/client/src/services/designAgentApi.ts — frame_candidates builder reads
    unit.sorted_candidate_evidence + unit.ranking_sort_policy first; local LABEL_PRIORITY
    retained only on warn-fallback path.
u5: tests/test_ranking_sort_policy.py — pure permutation coverage (sample-agnostic).
u6: tests/phase_z2/test_label_priority_synthetic.py + fixtures/ranking_sort_policy/
    synthetic_divergence.yaml — low-conf use_as_is behind high-conf restructure.
u7: tests/phase_z2/test_imp39_mdx04_env_toggle_e2e.py — samples/mdx_batch/04.mdx with
    AI_FALLBACK_ENABLED=off; backend selected_v4_rank == frontend frame_candidates[0].
u8: tests/phase_z2/test_imp39_corpus_audit.py — real corpus sweep over
    tests/matching/v4_full32_result.yaml (10 MDX sections); section IDs loaded
    dynamically (RULE 0 / RULE 7 sample-agnostic).
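
The three-key ordering described in u1 and u2 can be sketched in Python as below. This is a minimal illustration, not the code in src/phase_z2_pipeline.py: the function name `sort_by_policy`, the dict field names `label`, `confidence`, and `v4_rank`, and the choice to sort unknown labels last are all assumptions made for the example.

```python
from typing import Any

# Label priorities as listed in the commit message for ranking_sort_policy.yaml (u1).
LABEL_PRIORITY = {"use_as_is": 0, "light_edit": 1, "restructure": 2, "reject": 3}


def sort_by_policy(judgments: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return a new list ordered by label priority asc, confidence desc, v4_rank asc."""
    # Assumption: labels missing from the table sort after every known label.
    unknown = len(LABEL_PRIORITY)
    return sorted(
        judgments,
        key=lambda j: (
            LABEL_PRIORITY.get(j["label"], unknown),  # lower priority value first
            -j["confidence"],                         # higher confidence first
            j["v4_rank"],                             # tie-break on raw V4 rank
        ),
    )


candidates = [
    {"label": "restructure", "confidence": 0.92, "v4_rank": 1},
    {"label": "use_as_is", "confidence": 0.41, "v4_rank": 3},
    {"label": "use_as_is", "confidence": 0.41, "v4_rank": 2},
]
ordered = sort_by_policy(candidates)
print([(c["label"], c["v4_rank"]) for c in ordered])
# prints [('use_as_is', 2), ('use_as_is', 3), ('restructure', 1)]
```

The two `use_as_is` entries share a confidence, so only the `v4_rank` tie-break separates them. `sorted` returns a new list, which leaves the raw input order available for anything that still depends on it.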

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 17:12:07 +09:00
parent 2e3747c5ab
commit 028042aaa9
8 changed files with 1536 additions and 12 deletions

@@ -0,0 +1,56 @@
fixture_id: synthetic_divergence
purpose: |
  Backend - frontend "rank 1" divergence regression - IMP-39 (#68).
  Captures the Stage 1 root-cause scenario where the legacy backend
  (raw V4 confidence-desc order) selects a high-confidence
  lower-priority label, while the frontend (LABEL_PRIORITY asc +
  confidence desc) selects the lower-confidence higher-priority
  label. The single-source ranking policy
  (templates/phase_z2/catalog/ranking_sort_policy.yaml, u1) resolves
  the divergence so that both sides agree on "rank 1".
source: synthetic
sample_agnostic: true
notes:
  - No real frame_id / template_id / MDX section is referenced.
  - Only the three sort keys matter: label, confidence, v4_full_rank.
  - The `tag` field is a fixture-local identifier for assertions.
  - Field name `v4_full_rank` mirrors v4_full32_result.yaml shape so
    fixture and corpus audit (u8) share the same key contract.
raw_judgments:
  # confidence is strictly descending so v4_full_rank == raw V4
  # confidence-desc rank (same axis as v4_full32_result.yaml).
  - tag: synth_restructure_high
    label: restructure
    confidence: 0.92
    v4_full_rank: 1
  - tag: synth_light_edit_mid
    label: light_edit
    confidence: 0.70
    v4_full_rank: 2
  - tag: synth_use_as_is_low
    label: use_as_is
    confidence: 0.41
    v4_full_rank: 3
  - tag: synth_reject_low
    label: reject
    confidence: 0.30
    v4_full_rank: 4
expected_legacy_raw_order:
  - synth_restructure_high
  - synth_light_edit_mid
  - synth_use_as_is_low
  - synth_reject_low
expected_policy_sorted_order:
  - synth_use_as_is_low
  - synth_light_edit_mid
  - synth_restructure_high
  - synth_reject_low
divergence_axis:
  pre_policy_rank_1_tag: synth_restructure_high
  post_policy_rank_1_tag: synth_use_as_is_low
  frontend_candidate_0_tag: synth_use_as_is_low
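
The expectations in this fixture can be checked by hand with a short script. The sketch below inlines the four raw_judgments as tuples instead of loading the YAML, and the helper names `legacy_order` and `policy_order` are hypothetical. It is not the test in tests/phase_z2/test_label_priority_synthetic.py, which is not shown in this diff.

```python
# Priorities as stated in the commit message (u1).
LABEL_PRIORITY = {"use_as_is": 0, "light_edit": 1, "restructure": 2, "reject": 3}

# raw_judgments copied from the fixture: (tag, label, confidence, v4_full_rank).
RAW = [
    ("synth_restructure_high", "restructure", 0.92, 1),
    ("synth_light_edit_mid", "light_edit", 0.70, 2),
    ("synth_use_as_is_low", "use_as_is", 0.41, 3),
    ("synth_reject_low", "reject", 0.30, 4),
]


def legacy_order(rows):
    """Legacy backend view: raw V4 order, i.e. v4_full_rank ascending."""
    return [tag for tag, _, _, _ in sorted(rows, key=lambda r: r[3])]


def policy_order(rows):
    """Policy view: label priority asc, confidence desc, v4_full_rank asc."""
    ranked = sorted(rows, key=lambda r: (LABEL_PRIORITY[r[1]], -r[2], r[3]))
    return [tag for tag, _, _, _ in ranked]


# The two views disagree on "rank 1", which is the divergence the fixture pins.
print(legacy_order(RAW)[0])   # prints synth_restructure_high
print(policy_order(RAW)[0])   # prints synth_use_as_is_low
```

Because every label appears exactly once, the label priority alone decides the policy order here; confidence and v4_full_rank would only matter if two entries shared a label.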