In March 2026, Grafana Labs archived Grafana OnCall.
No fanfare. A GitHub issue, a deprecation notice, and a gentle suggestion to migrate to Grafana IRM — their new, fully managed, not-free-anymore offering. Thousands of engineering teams who had chosen OnCall specifically because it was self-hosted and open-source were left with a choice: move to SaaS, or figure something else out.
We had been running OnCall internally for over a year. The morning that issue went live, I opened our instance and started thinking about what "figuring something else out" actually looked like.
There was no good answer.
The options were all bad
PagerDuty is the obvious choice — mature, battle-tested, trusted by half the Fortune 500. It's also $21/user/month at minimum, which for a 20-person engineering org is $5,040 a year before you've added a single integration. For a regulated industry where every engineer might be in the on-call rotation, the number climbs fast. And it's SaaS-only — your incident data lives on their infrastructure, not yours.
incident.io is well-designed and growing fast. Also SaaS-only. Also per-seat.
Opsgenie, now part of Atlassian. If you know, you know.
The open-source options were sparse. There were some community forks of OnCall starting to appear, but nothing production-ready. Nothing with a clear maintenance commitment. Nothing that was going to be around in two years.
The more I looked, the clearer the gap became: every serious option assumed you were willing to hand your incident data to a third party and pay per engineer. For self-hosted teams, regulated industries, and cost-conscious orgs, there was simply nothing.
The AI angle made it worse
Here's what frustrated me most about every tool I evaluated: the AI features were cosmetic.
A Copilot button that drafts a Slack message. An auto-summary that reads the timeline back to you. A chatbot you can ask questions. These are useful in the same way that spell-check is useful — they save you a few minutes. They don't change the shape of the work.
The underlying architecture of every incident management tool I looked at was designed in 2015 for a world where humans are the only actors. The API is designed for human sessions. The permission model assumes a human is taking action. The data model records events as things humans did.
AI in these tools is a layer on top. A veneer.
We wanted to build something different: a platform where an AI agent can triage an incident, call your observability stack, execute a runbook, and loop a human in only when genuinely needed. Every one of those actions should be logged, auditable, and visible in the same timeline where you'd see a human doing the same thing.
Not AI bolted on. AI as a first-class actor.
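To make "first-class actor" a bit more concrete, here's a minimal, hypothetical sketch of what it can mean at the data-model level. This is not Regen's actual schema: the class names (`Actor`, `TimelineEvent`, `Timeline`) and the action strings are illustrative assumptions. The idea it shows is that an agent and a human are recorded the same way, through the same append-only timeline, so the agent's actions are exactly as auditable as anyone else's.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class ActorKind(Enum):
    HUMAN = "human"
    AGENT = "agent"


@dataclass(frozen=True)
class Actor:
    kind: ActorKind
    id: str  # hypothetical: a user ID for humans, an agent name for agents


@dataclass(frozen=True)
class TimelineEvent:
    incident_id: str
    actor: Actor
    action: str  # hypothetical action names, e.g. "queried_metrics", "approved_runbook"
    detail: str
    at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class Timeline:
    """Append-only log: humans and agents are recorded through the same method."""

    def __init__(self) -> None:
        self._events: list[TimelineEvent] = []

    def record(self, event: TimelineEvent) -> None:
        # No separate code path for agents; every actor lands in one timeline.
        self._events.append(event)

    def for_incident(self, incident_id: str) -> list[TimelineEvent]:
        return [e for e in self._events if e.incident_id == incident_id]

    def by_kind(self, kind: ActorKind) -> list[TimelineEvent]:
        return [e for e in self._events if e.actor.kind is kind]
```

Because the actor's kind is just a field on each event, "show me everything the agent did" becomes a query (`by_kind(ActorKind.AGENT)`) over the same timeline a human reads, rather than a separate audit system bolted on later.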
What we built
Regen is our answer to the gap.
It's fully open-source (AGPLv3), fully self-hosted, and designed from day one for a world where AI agents and humans share the same operational layer. Your data stays on your infrastructure. You're not paying per seat. And when the AI triage agent eventually tells you it's confident this is a Redis eviction and shows you the matching runbook, you approve it or you don't; either way, it logs what it did, and you can read the full trace of what it found.
The first release covers everything Grafana OnCall had: schedules, rotations, escalation policies, Slack and Teams integration, webhook ingestion from Prometheus, Grafana, Datadog, and CloudWatch. One-click migration from Grafana OnCall. Helm chart for Kubernetes. SAML SSO, free, no enterprise gating.
The AI triage agent is coming in the next few releases. We wanted to get the foundation right first.
On being self-hosted and open-source
This is a deliberate choice, not a temporary go-to-market strategy.
Self-hosted means your incident data (your runbooks, your timelines, your engineer notes, your post-mortems) stays in your infrastructure and accumulates there. The goal: after a year on Regen, your triage agent should know your stack better than most of your engineers. That knowledge doesn't belong to us. It belongs to you.
Open-source means you can audit every line of code that touches your production incidents. For a tool that gets paged at 3am and has the authority to execute runbooks on your infrastructure, "trust us" is not an acceptable answer.
We believe the right model for a reliability platform is open core: the community edition is complete and production-ready, not a crippled lead-gen tool. Enterprise adds compliance and governance tooling. SSO is free forever — gating SSO is user-hostile.
The name
Regen means regeneration — systems healing themselves. It also means rain in German, which felt right for something designed to operate in the middle of the night when nobody wants to be awake.
The mascot is an axolotl. Axolotls can regenerate lost limbs and even parts of their hearts and brains. They are also, objectively, extremely good.
What's next
We're building toward a world where your on-call engineer's involvement in a known, well-understood incident is a single tap: approve the runbook the agent already staged. Everything else — triage, investigation, execution, post-mortem draft — handled.
For novel incidents, first occurrences, or anything with a blast radius the agent isn't confident about: the human is in the loop. Always. That's not a limitation of the technology. It's a design principle.
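The routing rule in the last two paragraphs can be sketched as a small decision function. This is a hypothetical illustration, not Regen's code: the `TriageResult` fields, the three-level `BlastRadius` scale, and the 0.9 confidence threshold are all assumptions chosen for the example. Note that neither outcome executes anything on its own; the best case is a runbook staged for a one-tap human approval.

```python
from dataclasses import dataclass
from enum import Enum


class BlastRadius(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


class Decision(Enum):
    STAGE_FOR_APPROVAL = "stage runbook and wait for a one-tap human approval"
    HUMAN_LED = "page a human to lead the response"


# Hypothetical threshold for illustration; not a Regen setting.
CONFIDENCE_THRESHOLD = 0.9


@dataclass(frozen=True)
class TriageResult:
    seen_before: bool          # has this incident signature occurred before?
    confidence: float          # agent's confidence in its diagnosis, 0.0 to 1.0
    blast_radius: BlastRadius  # agent's estimate of how much could be affected

    def __post_init__(self) -> None:
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError(f"confidence must be in [0, 1], got {self.confidence}")


def decide(result: TriageResult) -> Decision:
    """Route an incident; neither outcome executes a runbook without a human."""
    known_and_safe = (
        result.seen_before
        and result.confidence >= CONFIDENCE_THRESHOLD
        and result.blast_radius is BlastRadius.LOW
    )
    # Novel, uncertain, or wide-impact incidents stay human-led by default.
    return Decision.STAGE_FOR_APPROVAL if known_and_safe else Decision.HUMAN_LED
```

Failing any one condition (a first occurrence, low confidence, or a blast radius above `LOW`) falls through to `HUMAN_LED`, so the safe default is always a human in the loop rather than something the agent has to opt into.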
If you're migrating from Grafana OnCall, the guide is here. If you want to run it yourself, installation takes under five minutes. And if you want to talk about what Regen Pro will look like for your team, we're easy to reach.
We built this because the gap was real and nobody else was filling it. We're glad it exists.
— Inder & Yatharth