fix(orchestrator): P5b first-line agent header strict + supplement throttle

Bug discovered during #24 IMP-24 K6 Stage 2 (2026-05-20):
- Codex r1, r2, r3 started with '=== IMPLEMENTATION_UNITS ===' on first line
  (not '[Codex #N] ...'), so detect_agent (P0-1 strict, first-line only)
  returned None.
- For non-audit issues, the P5 supplement guard was audit-only gated → silent
  loop until Codex r4 happened to use correct format. 4 rounds wasted.

Verified that #21 Stage 4 had the same latent silent loop pattern
('## [Codex #1]' first line); orchestrator looped through ~10 Claude rounds
before random recovery. P5b fix addresses this long-standing bug.

Patch (defensive parser-contract hardening; does not assume single root cause):

1. RULES global gets explicit "FIRST non-empty line MUST be [Claude #N] /
   [Codex #N]" rule that OVERRIDES any stage-specific "body MUST contain"
   constraint.

2. COMPACT_PLAN_RULE wording clarified: "body" begins AFTER the first-line
   agent header. The 'body MUST contain ONLY' set no longer accidentally
   permits '=== IMPLEMENTATION_UNITS ===' on line 1.

3. is_codex None supplement guard:
   - audit-only gate REMOVED → fires for all issues (#24 latent loop fixed)
   - Throttle: max 2 supplements per stage; on 3rd violation, orchestrator
     hard-stops the issue with explicit "user action required" message and
     exits run_stage cleanly
   - Supplement message names both Claude AND Codex (Claude's first-line
     violation also breaks downstream via Codex mimicry)
   - Body-head 80 chars logged on detection failure (debugging aid)

4. Regression tests (+5 cases in test_orchestrator_core.py):
   - TestDetectAgent: '=== IMPLEMENTATION_UNITS ===' first line → None
   - TestDetectAgent: [Codex #N] first line + units after → 'codex' OK
   - TestDetectAgent: '## ', '📌 **', '**' prefix all → None
   - TestRulesAndCompactPlanFirstLineContract: RULES wording has FIRST/OVERRIDES
   - TestRulesAndCompactPlanFirstLineContract: COMPACT_PLAN_RULE has carve-out

Cosmetic side effect (accepted): Claude's '📌 **[Claude #N] ...**' or
'## [Codex #N] ...' decoration prefixes will fail detect_agent. Agents will
drop decorations from line 1; line 2+ can still use them.

Out of scope (NOT included to keep regression risk low):
- detect_agent function logic UNCHANGED (P0-1 strict preserved)
- consensus parser UNCHANGED
- stage loop structure UNCHANGED
- git/Gitea retrieval logic UNCHANGED
- audit-only mode P4/P4a guards UNCHANGED
- pre-post comment validation (future axis, larger refactor)

Total: 131/131 pytest pass (126 prior + 5 new).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
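
The first-line contract described above can be illustrated with a minimal sketch. The real `detect_agent` is not part of this diff, so the function name `detect_agent_sketch` and the regular expression below are assumptions chosen only to reproduce the behavior the commit message and regression tests describe: the header must open the first non-empty line, and any decoration before it yields None.

```python
import re
from typing import Optional

# Hypothetical pattern (assumption): the actual detect_agent logic is not shown
# in this commit. re.match anchors at the start of the string, so any prefix
# such as '## ' or '📌 **' prevents a match.
_HEADER_RE = re.compile(r"\[(Claude|Codex) #\d+\]")


def detect_agent_sketch(body: Optional[str]) -> Optional[str]:
    """Return 'claude' or 'codex' if the first non-empty line opens with the header."""
    for line in (body or "").splitlines():
        if not line.strip():
            continue  # skip leading blank lines
        # Only the first non-empty line is inspected; later lines never count.
        match = _HEADER_RE.match(line.strip())
        return match.group(1).lower() if match else None
    return None  # empty or missing body
```

Under these assumptions a body that opens with '=== IMPLEMENTATION_UNITS ===' or '## [Codex #1] ...' returns None, while the same content preceded by a '[Codex #4] ...' line returns 'codex'.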
@@ -607,6 +607,32 @@ RULE 7: No hardcoding. RULE 8: AI finds 1px first. RULE 9: LLM classifies, code
 RULE 10: Don't uncritically accept. RULE 11: Checkpoint. RULE 12: Full paths. RULE 13: Anchor sync.
 PZ-1: AI=0 normal. PZ-2: 1turn=1step. PZ-3: No speculative. PZ-4: No silent shrink.

+=== COMMENT FORMAT (P5b 2026-05-20 — STRICT, OVERRIDES ALL STAGE-SPECIFIC BODY RULES) ===
+The FIRST non-empty line of EVERY Gitea comment MUST start with one of:
+[Claude #N] <stage description>
+[Codex #N] <stage description>
+
+This rule applies to ALL stages (Stage 1 ~ Stage 6) and ALL issue types
+(regular, execution-issue, audit-only). No prefix, no decoration, no banner,
+no audit anchor before the agent header. Examples:
+
+CORRECT:
+[Codex #3] Stage 2 simulation-plan review — IMP-24
+
+📌 Verification table
+...
+
+WRONG (orchestrator detect_agent will fail; stage cannot advance):
+📌 **[Claude #3] Stage 2 ...**
+## [Codex #3] Stage 2 ...
+=== IMPLEMENTATION_UNITS === (header missing entirely)
+Audit anchor: ... (preface before header)
+
+This first-line-strict rule OVERRIDES any stage-specific "body MUST contain
+ONLY" rule (e.g., COMPACT_PLAN_RULE). Those body rules apply AFTER the
+mandatory first-line agent header. Decorations / banners / anchors go on
+line 2 or later.
+
 === CONSENSUS + REWIND (2026-05-16 lock) ===
 Final line of every Codex review comment MUST be exactly one of:
 FINAL_CONSENSUS: YES
@@ -901,10 +927,14 @@ def _check_dormant_triggers():
 COMPACT_PLAN_RULE = """

 COMPACT PLAN REQUIREMENTS (strict):
+- The FIRST non-empty line of your comment MUST be the agent header
+  ([Claude #N] ... or [Codex #N] ...). This is enforced by RULES (P5b 2026-05-20)
+  and OVERRIDES the "body" constraints below. The Stage 2 compact body begins
+  AFTER the first-line agent header — NOT on line 1.
 - Total Stage 2 plan body MUST be ≤ 5,000 chars (4,000 chars target).
 - NO code snippets in this comment. Code goes in Stage 3 (code-edit), not Stage 2 plan.
   References to file:line locations are fine. Inline code blocks are forbidden.
-- The Stage 2 plan body MUST contain ONLY:
+- After the first-line agent header, the Stage 2 plan body MUST contain ONLY:
   a) === IMPLEMENTATION_UNITS === YAML block (units with id/summary/files/tests/estimate_lines)
   b) Brief per-unit rationale (≤ 3 lines per unit, no full code)
   c) Out-of-scope notes
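
The carve-out in this hunk (the plan "body" begins after the first-line agent header) could be made concrete with a sketch like the one below. Nothing in this commit measures the body this way; `split_header_and_body`, `plan_body_within_limit`, and `PLAN_BODY_LIMIT` are hypothetical names, and counting characters after stripping surrounding whitespace is an assumption. Only the 5,000-character figure comes from COMPACT_PLAN_RULE.

```python
from typing import Tuple

# 5,000 chars is the limit stated in COMPACT_PLAN_RULE; the constant name is hypothetical.
PLAN_BODY_LIMIT = 5000


def split_header_and_body(comment: str) -> Tuple[str, str]:
    """Split a comment into (first non-empty line, everything after that line)."""
    stripped = comment.lstrip()  # drop leading blank lines and whitespace
    header, _, body = stripped.partition("\n")
    return header.rstrip(), body


def plan_body_within_limit(comment: str) -> bool:
    """Check the length of the plan body, excluding the first-line agent header."""
    _, body = split_header_and_body(comment)
    # Assumption: surrounding whitespace does not count toward the limit.
    return len(body.strip()) <= PLAN_BODY_LIMIT
```

With this split, the '=== IMPLEMENTATION_UNITS ===' block is the first thing in the body rather than the first line of the comment, which is the reading the reworded rule asks agents to adopt.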
@@ -1328,20 +1358,40 @@ def run_stage(n, title, body, sid):
         last = comments[-1]["body"]
         is_codex = detect_agent(last) == "codex"
         if not is_codex:
-            log(" Codex response not detected — continuing")
-            # P5 (2026-05-20): in audit mode, a common cause of detect_agent None is
-            # the agent putting an audit anchor / preface on the first line, so P0-1 strict cannot find the header.
-            # Request a format fix via automatic supplement → breaks the infinite loop automatically.
-            if _audit_mode(title):
+            log(f" Codex response not detected — first line: {last.lstrip().splitlines()[0][:80]!r}" if last and last.strip() else " Codex response not detected — empty body")
+            # P5b (2026-05-20): supplement guard when detect_agent returns None.
+            # Scope change: audit-only restriction lifted; fires for all issues (fixes silent loops on regular issues like #24).
+            # Throttle: once N (=2) supplements have accumulated in the current stage, stop + user-action-required.
+            # If the LLM violates again even after the previous N supplements, hard stop from the 4th round on.
+            SUPP_MAX = 2
+            SUPP_MARKER = "⚠️ **[Orchestrator]** Agent header missing"
+            stage_cmts = comments[start_cnt:]
+            supp_count = sum(1 for c in stage_cmts if (c.get("body") or "").lstrip().startswith(SUPP_MARKER))
+            if supp_count >= SUPP_MAX:
+                log(f"⛔ Agent header supplement {supp_count}/{SUPP_MAX} reached — STOP (user action required)")
                 try: gitea(f"issues/{n}/comments", "POST", {"body":
-                    "⚠️ **[Orchestrator]** Codex response not detected — `detect_agent` could not find the "
-                    "`[Codex #N]` or `[Claude #N]` pattern on the first line.\n\n"
-                    "Common cause in AUDIT-ONLY mode: a preface such as `Audit anchor:` on the first line.\n\n"
-                    "From the next round on, the **FIRST non-empty line of every comment MUST be**:\n"
-                    " `[Codex #N] <stage description>` or `[Claude #N] <stage description>`\n"
-                    "Audit anchor / banner / preface only on line 2 or later. Otherwise the orchestrator "
-                    "cannot advance the stage (P0-1 first-line strict)."})
+                    f"⛔ **[Orchestrator]** STOP — Stage `{sid}` cannot advance.\n\n"
+                    f"`detect_agent` failed {supp_count}+ times in this stage. The LLM is not honoring "
+                    f"the first-line agent header contract despite supplements.\n\n"
+                    "**Action required (human)**: review last few comments, ensure FIRST non-empty line is "
+                    f"`[Claude #N]` or `[Codex #N]`, then restart `python -u .\\orchestrator.py --issue {n}`.\n\n"
+                    "Orchestrator run is exiting this issue to prevent further token waste."})
                 except: pass
+                return False  # exit run_stage → run_issue treats as external close → moves on
+            try: gitea(f"issues/{n}/comments", "POST", {"body":
+                f"{SUPP_MARKER} — orchestrator `detect_agent` could not find "
+                "`[Claude #N]` or `[Codex #N]` on the first non-empty line.\n\n"
+                "**Comment format contract (P5b 2026-05-20, see RULES)**:\n"
+                "The FIRST non-empty line of EVERY Gitea comment (both Claude and Codex, ALL stages) MUST be:\n"
+                " `[Claude #N] <stage description>`\n"
+                " `[Codex #N] <stage description>`\n\n"
+                "No prefix. No decoration. No banner. No audit anchor before the header.\n"
+                "Decorations (`📌`, `##`, `**`, audit anchor, etc.) go on line 2 or later.\n\n"
+                "This rule OVERRIDES any stage-specific 'body MUST contain ONLY' rule (e.g., COMPACT_PLAN_RULE) — "
+                "those body rules apply AFTER the mandatory first-line agent header.\n\n"
+                f"Supplement count for this stage: {supp_count + 1}/{SUPP_MAX}. "
+                f"At {SUPP_MAX}+ violations the orchestrator will hard-stop this issue."})
+            except: pass
             continue

         status, target = parse_consensus(last)
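
The throttle decision in this hunk can be isolated as a small pure function, which may make the boundary easier to reason about: the first two violations in a stage each receive a supplement, and the third triggers the hard stop. `SUPP_MAX` and `SUPP_MARKER` are taken from the diff; `count_supplements` and `next_action` are hypothetical helpers, and the sketch omits the Gitea POST and logging.

```python
from typing import Dict, List, Optional

# Values taken from the diff above.
SUPP_MAX = 2
SUPP_MARKER = "⚠️ **[Orchestrator]** Agent header missing"

Comment = Dict[str, Optional[str]]


def count_supplements(stage_comments: List[Comment]) -> int:
    """Count orchestrator supplements already posted in the current stage."""
    return sum(
        1
        for c in stage_comments
        # A missing or None body is treated as an empty string, as in the diff.
        if (c.get("body") or "").lstrip().startswith(SUPP_MARKER)
    )


def next_action(stage_comments: List[Comment]) -> str:
    """Hypothetical helper: decide what to do after a header-detection failure."""
    if count_supplements(stage_comments) >= SUPP_MAX:
        return "stop"  # corresponds to `return False` in run_stage
    return "supplement"  # post another supplement and `continue`
```

Because the count is derived from the comments themselves, the throttle state survives an orchestrator restart within the same stage without any extra bookkeeping.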
@@ -128,6 +128,76 @@ Addressing [Codex #2] findings ...
         )
         assert detect_agent(body_header_first) == "codex"

+    # P5b (2026-05-20): Stage 2 compact-plan first-line conflict regression.
+    # #24 IMP-24 K6: Codex r1~r3 started the first line with '=== IMPLEMENTATION_UNITS ===' →
+    # detect_agent None → orchestrator silent loop. fix path = comment format strict,
+    # NOT relaxing detect_agent (P0-1 hardening kept as-is).
+
+    def test_implementation_units_first_line_breaks_detection(self):
+        """If === IMPLEMENTATION_UNITS === is the first line, detect_agent returns None (P0-1 strict working as intended)."""
+        body = (
+            "=== IMPLEMENTATION_UNITS ===\n"
+            "- id: u1\n"
+            "  summary: ...\n"
+            "  files:\n"
+            "    - docs/architecture/PHASE-Q-AUDIT.md\n"
+            "  tests:\n"
+            "    - pytest -q tests\n"
+            "  estimate_lines: 1\n"
+            "\n"
+            "FINAL_CONSENSUS: YES\n"
+        )
+        assert detect_agent(body) is None, (
+            "=== IMPLEMENTATION_UNITS === as first line MUST cause detect_agent None "
+            "(P0-1 strict). Fix path: enforce agent header first-line in prompt, not relax detect_agent."
+        )
+
+    def test_compact_plan_with_header_first_works(self):
+        """Correct Stage 2 compact format: [Codex #N] on the first line → === IMPLEMENTATION_UNITS === on line 2+."""
+        body = (
+            "[Codex #4] Stage 2 simulation-plan review - IMP-24 K6\n"
+            "\n"
+            "=== IMPLEMENTATION_UNITS ===\n"
+            "- id: u1\n"
+            "  summary: ...\n"
+            "  tests:\n"
+            "    - pytest -q tests\n"
+            "\n"
+            "FINAL_CONSENSUS: YES\n"
+        )
+        assert detect_agent(body) == "codex"
+
+    def test_markdown_prefix_breaks_detection(self):
+        """P5b: a markdown header prefix such as `## [Codex #N]` also makes detect_agent return None.
+        (Cause of the latent silent loop observed in #21 Stage 4.)"""
+        body_hash = "## [Codex #1] Stage 4 test-verify Round #1\n\nVerdict: PASS\n"
+        body_emoji = "📌 **[Claude #1] Stage 2 plan**\n\nbody\n"
+        body_bold = "**[Codex #1] Stage 4**\n\nbody\n"
+        assert detect_agent(body_hash) is None
+        assert detect_agent(body_emoji) is None
+        assert detect_agent(body_bold) is None
+
+
+class TestRulesAndCompactPlanFirstLineContract:
+    """P5b (2026-05-20): both RULES and COMPACT_PLAN_RULE must state the first-line
+    agent header rule. Wording checks."""
+
+    def test_rules_has_first_line_strict(self):
+        from orchestrator import RULES
+        # RULES must state first-line strict and that it applies to all stages.
+        assert "FIRST non-empty line" in RULES
+        assert "[Claude #N]" in RULES and "[Codex #N]" in RULES
+        # P5b OVERRIDES keyword: stresses that body rules do not take precedence over the first-line rule
+        assert "OVERRIDES" in RULES or "overrides" in RULES.lower()
+
+    def test_compact_plan_rule_carves_out_first_line(self):
+        from orchestrator import COMPACT_PLAN_RULE
+        # States that "body" starts after the first-line agent header
+        assert "FIRST non-empty line" in COMPACT_PLAN_RULE or "first-line agent header" in COMPACT_PLAN_RULE
+        # Checks carve-out wording such as "after the first-line"
+        body_lower = COMPACT_PLAN_RULE.lower()
+        assert "after the first" in body_lower or "after the agent header" in body_lower
+

 # ─────────────────────────────────────────────────────────────────
 # parse_consensus — YES/NO + rewind_target