Incident Commander.
OpenEnv RL environment where LLM agents learn to be on-call SREs

381 procedural scenarios + 7 hand-curated incident tasks
Curriculum controller: warmup → core → expert tiers
GRPO actor + DeepSeek-R1 critic with context-gated rewards
Real Chaos-Mesh fault injection on a live 5-service k3s cluster
A production-grade RL environment that turns SRE incident response into a curriculum-driven training ground for LLM agents. A FastAPI gym server exposes 7 hand-curated incident tasks plus 381 procedurally generated scenarios, complete with a curriculum controller (warmup → expert tiers), an adversarial LLM scenario designer, a 3-persona judge (junior / senior / principal SRE), and context-gated rewards. Training stack: TRL (GRPO), vLLM rollouts, LoRA r=16/α=32 fine-tuning of Phi-3.5-mini as the actor, DeepSeek-R1 as the critic. Trained across 3 Kaggle shards with a 3-way LoRA merge. The whole live cluster (k3s, Prometheus, Loki, Chaos Mesh, 5 fault-injectable microservices) is provisioned reproducibly with Terraform on Hetzner.

