C.E.L_Slide_test2/orchestrator.py at f3ef4d917c775d497fbed8109042f46635e66f1a

Kyeongmin/C.E.L_Slide_test2

Files

kyeongmin 5d23b747ff fix(orchestrator): P5b first-line agent header strict + supplement throttle

Bug discovered during #24 IMP-24 K6 Stage 2 (2026-05-20):
- Codex r1, r2, r3 started with '=== IMPLEMENTATION_UNITS ===' on first line
  (not '[Codex #N] ...'), so detect_agent (P0-1 strict, first-line only)
  returned None.
- For non-audit issues, the P5 supplement guard was audit-only gated → silent
  loop until Codex r4 happened to use correct format. 4 rounds wasted.

Verified that #21 Stage 4 had the same latent silent loop pattern
('## [Codex #1]' first line) — orchestrator looped through ~10 Claude rounds
before random recovery. P5b fix addresses this long-standing bug.

Patch (defensive parser-contract hardening; does not assume single root cause):

1. RULES global gets explicit "FIRST non-empty line MUST be [Claude #N] /
   [Codex #N]" rule that OVERRIDES any stage-specific "body MUST contain"
   constraint.

2. COMPACT_PLAN_RULE wording clarified: "body" begins AFTER the first-line
   agent header. The 'body MUST contain ONLY' set no longer accidentally
   permits '=== IMPLEMENTATION_UNITS ===' on line 1.

3. is_codex None supplement guard:
   - audit-only gate REMOVED → fires for all issues (#24 latent loop fixed)
   - Throttle: max 2 supplements per stage; on 3rd violation, orchestrator
     hard-stops the issue with explicit "user action required" message
     and exits run_stage cleanly
   - Supplement message names both Claude AND Codex (Claude's first-line
     violation also breaks downstream via Codex mimicry)
   - Body-head 80 chars logged on detection failure (debugging aid)

4. Regression tests (+5 cases in test_orchestrator_core.py):
   - TestDetectAgent: '=== IMPLEMENTATION_UNITS ===' first line → None
   - TestDetectAgent: [Codex #N] first line + units after → 'codex' OK
   - TestDetectAgent: '## ', '📌 **', '**' prefix all → None
   - TestRulesAndCompactPlanFirstLineContract: RULES wording has FIRST/OVERRIDES
   - TestRulesAndCompactPlanFirstLineContract: COMPACT_PLAN_RULE has carve-out

Cosmetic side effect (accepted): Claude's '📌 **[Claude #N] ...**' or
'## [Codex #N] ...' decoration prefixes will fail detect_agent. Agents
will drop decorations from line 1; line 2+ can still use them.

Out of scope (NOT included to keep regression risk low):
- detect_agent function logic UNCHANGED (P0-1 strict preserved)
- consensus parser UNCHANGED
- stage loop structure UNCHANGED
- git/Gitea retrieval logic UNCHANGED
- audit-only mode P4/P4a guards UNCHANGED
- pre-post comment validation (future axis, larger refactor)

Total: 131/131 pytest pass (126 prior + 5 new).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-20 17:01:24 +09:00

96 KiB

Raw Blame History

View Raw

96 KiB Raw Blame History

96 KiB

Raw Blame History