RL / Systems · 2026 · Live

Incident Commander.

OpenEnv RL environment where LLM agents learn to be on-call SREs


highlights

381 procedural scenarios + 7 hand-curated incident tasks

Curriculum controller: warmup → core → expert tiers

GRPO actor + DeepSeek-R1 critic with context-gated rewards

Real Chaos Mesh fault injection on a live 5-service k3s cluster
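The curriculum tiers listed above can be illustrated with a minimal sketch. It assumes promotion is driven by a rolling success rate over recent episodes; the class name `CurriculumController`, the window size, and the promotion threshold are all hypothetical and are not taken from the project's code.

```python
from collections import deque

# Tier names come from the highlights; everything else here is an assumption.
TIERS = ["warmup", "core", "expert"]


class CurriculumController:
    """Hypothetical controller: promote one tier when the rolling success rate is high enough."""

    def __init__(self, window: int = 20, promote_at: float = 0.8):
        self.window = window          # number of recent episodes considered (assumed)
        self.promote_at = promote_at  # success rate needed to move up a tier (assumed)
        self.tier_index = 0
        self.results = deque(maxlen=window)

    @property
    def tier(self) -> str:
        return TIERS[self.tier_index]

    def record(self, success: bool) -> str:
        """Record one episode outcome and return the tier to sample from next."""
        self.results.append(1.0 if success else 0.0)
        window_full = len(self.results) == self.window
        if window_full and self.tier_index < len(TIERS) - 1:
            rate = sum(self.results) / self.window
            if rate >= self.promote_at:
                self.tier_index += 1
                self.results.clear()  # start a fresh window in the new tier
        return self.tier
```

In this sketch the agent only moves up, never down; a real controller might also demote on sustained failure.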

the work

A production-grade RL environment that turns SRE incident response into a curriculum-driven training ground for LLM agents. A FastAPI gym server exposes 7 hand-curated incident tasks plus 381 procedurally generated scenarios, backed by a curriculum controller (warmup → core → expert tiers), an adversarial LLM scenario designer, a 3-persona judge (junior / senior / principal SRE), and context-gated rewards. The training stack uses TRL (GRPO) with vLLM rollouts and LoRA (r=16, α=32) fine-tuning of Phi-3.5-mini as the actor, with DeepSeek-R1 as the critic. Training ran across 3 Kaggle shards, followed by a 3-way LoRA merge. The whole live cluster (k3s, Prometheus, Loki, Chaos Mesh, 5 fault-injectable microservices) is provisioned reproducibly with Terraform on Hetzner.
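For readers unfamiliar with GRPO, its core step is to score each rollout relative to the other rollouts sampled for the same prompt. The sketch below shows that group-relative normalization in plain Python; the function name `group_relative_advantages`, the `eps` value, and the use of the population standard deviation are illustrative choices, not the project's or TRL's exact implementation.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of rollouts for the same scenario.

    Each advantage is (reward - group mean) / (group std + eps), so rollouts
    that beat their siblings get a positive signal and the rest a negative one.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption made for this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Example: four rollouts on one incident scenario, two judged successful.
advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
```

When every rollout in a group receives the same reward, all advantages are zero and that group contributes no gradient signal, which is one reason scenario difficulty matters for this style of training.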

stack

PyTorch · TRL · GRPO/PPO · vLLM · LoRA · FastAPI · Kubernetes · Terraform · Hetzner · Chaos Mesh · Prometheus · Loki

