Game Modes
Single-agent and multi-agent configurations for the Red and Blue teams.
Red Team Mode
| Mode | Description |
|---|---|
| single | One agent generates code (default) |
| pipeline | 4-stage chain: Architect → Selector → Generator → Reviewer |
Pipeline (stealthier attacks)
Blue Team Mode
| Mode | Description |
|---|---|
| single | One agent analyzes (default) |
| ensemble | Specialists (security, compliance, architecture) + consensus. Count auto-matches Red pipeline stages for balanced coordination. |
Ensemble (better detection)
Balanced coordination (v2.3)
When facing a 4-stage Red pipeline, the Blue ensemble automatically uses 4 specialists to ensure fair competition. Override with --blue-specialists N.
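The matching rule above can be sketched in a few lines. This is an illustration only, not the tool's actual code: the function name `resolve_specialist_count` and its parameters are hypothetical, and the only behavior taken from this page is that the specialist count follows the number of Red pipeline stages unless `--blue-specialists N` is given.

```python
from typing import Optional


def resolve_specialist_count(red_stages: int, override: Optional[int] = None) -> int:
    """Return how many Blue specialists to run (hypothetical helper).

    red_stages: number of stages in the Red pipeline (4 for the pipeline mode).
    override:   value passed via --blue-specialists N, or None if not given.
    """
    if override is not None:
        if override < 1:
            raise ValueError("specialist count must be at least 1")
        # An explicit override always wins over auto-matching.
        return override
    # Default: match the Red pipeline stage count.
    return red_stages


print(resolve_specialist_count(4))              # auto-matched to 4 stages
print(resolve_specialist_count(4, override=2))  # explicit override
```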
Consensus Methods
| Method | Description |
|---|---|
| vote | Majority wins |
| union | Combine all findings (higher recall, lower precision) |
| intersection | Only findings all agents agree on (higher precision, lower recall) |
| debate | Agents argue until consensus |
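The three set-based methods can be illustrated with a minimal sketch. It assumes each specialist reports its findings as a set of identifiers; the function names and the finding IDs (`F1`, `F2`, `F3`) are hypothetical and not taken from the tool. "Majority" is read here as a strict majority of agents, which is also an assumption. The debate method is omitted because it depends on agent interaction rather than a set operation.

```python
from collections import Counter
from typing import Iterable, List, Set


def vote(reports: List[Set[str]]) -> Set[str]:
    """Keep findings reported by a strict majority of agents."""
    counts = Counter(f for report in reports for f in report)
    threshold = len(reports) / 2
    return {f for f, n in counts.items() if n > threshold}


def union(reports: Iterable[Set[str]]) -> Set[str]:
    """Keep every finding any agent reported (higher recall)."""
    return set().union(*reports)


def intersection(reports: List[Set[str]]) -> Set[str]:
    """Keep only findings every agent reported (higher precision)."""
    return set.intersection(*reports) if reports else set()


# Hypothetical findings from three specialists.
reports = [{"F1", "F2"}, {"F1"}, {"F1", "F2", "F3"}]

print(sorted(vote(reports)))          # ['F1', 'F2']
print(sorted(union(reports)))         # ['F1', 'F2', 'F3']
print(sorted(intersection(reports)))  # ['F1']
```

In this example, union keeps `F3` even though only one specialist reported it, while intersection drops `F2` because one specialist missed it. That is the recall and precision trade-off described in the table.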
Two different "debate" options
--consensus-method debate controls how Blue Team ensemble agents agree on findings.
--verification-mode debate controls how the Judge verifies findings (Prosecutor vs Defender).
They are independent and can be used together.
When to Use Multi-Agent
Use pipeline + ensemble when you want to stress-test models with stealthier attacks and collaborative detection. Single agents are faster and sufficient for quick experiments.
Verification Mode
| Mode | Description |
|---|---|
| standard | Direct matching (default) |
| debate | Adversarial debate (Prosecutor vs Defender) |