What I'm excited about in 2026

By Tony Meehan, Co-Founder & CTO

Infrastructure Monitoring Built For AI

We believe Prequel has assembled the world’s largest reliability intelligence library designed for autonomous AI agents. In 2026, we are turning that library into a practical advantage for infrastructure teams, and we are measuring the resulting agent performance in public.

AI is changing the pace and shape of software delivery. It is not just that teams ship faster with copilots and code generation. Teams are also shipping differently. More automation, more agents, more configuration generated on the fly, and more systems that change frequently because it is now cheaper to change them.

That acceleration is exciting, but it has a predictable side effect. The operational surface area expands. The rate of change goes up. The number of unknown unknowns grows. And reliability teams get pulled into a familiar cycle:

  1. New behavior hits production faster than your monitoring evolves
  2. Incidents happen for repeatable reasons
  3. Everyone re-learns the same lessons, often the hard way, then re-implements the same detectors, dashboards, and runbooks again

At Prequel, our mission is to break that loop by turning reliability problems into shared, reusable intelligence. The goal is simple: teams catch common failure modes earlier, with less toil, and with higher confidence.

As we head into 2026, we’re most excited about two things:

  1. Out-of-the-box detections for infrastructure monitoring that meaningfully reduce time-to-value and alert fatigue
  2. Benchmarking our AI SRE agent against competitors using ITBench, the way security used MITRE ATT&CK to turn “we’re better” into something measurable

These two threads connect. The more reliability intelligence we can generate and curate, the more effective our agent becomes. The more effective the agent becomes, the more it can help us scale the creation of reliability intelligence. That flywheel is what I’m building toward this year.

1) Out-of-the-box infra monitoring detections that do not feel like “starter alerts”

Most observability platforms can collect mountains of telemetry. The bottleneck is what comes next:

  • What should I be detecting?
  • How do I detect it precisely, without noise?
  • How do I connect it to impact and root cause quickly?
  • How do I scale that across Kubernetes plus hundreds of common services?

This is where the industry has been stuck. Traditional monitoring still leaves most teams with an uncomfortable choice:

  • Build problem detectors from scratch, which is slow
  • Monitor high-level symptoms and hope you’re fast enough during the incident, which is risky

Scaling reliability intelligence with AI agents, without compromising trust

In 2026, my biggest focus is scaling the creation and use of reliability intelligence with AI agents in a way that does not require a human in the loop for day-to-day progress.

That is important because the long tail is brutal. New versions ship. A chart changes. A dependency changes. A managed service changes behavior. A new Kubernetes controller appears. It is not realistic to keep up by hand, and it is not a good use of senior engineers’ time.

So we are building toward a loop that looks like this:

  1. Discover the environment automatically - Agents inventory services, dependencies, and telemetry surfaces, then identify what is missing or under-instrumented.
  2. Generate CRE-backed detections automatically - Agents generate and deploy detections backed by Common Reliability Enumerations (CREs), carrying the community context along with them: what the failure mode is, what the signals are, how to confirm it, and what the mitigation typically looks like.
  3. Validate automatically before anything ships - This is where trust comes from. Not from a human reading every rule, but from a repeatable validation harness:
    • Tests against historical telemetry and known incidents
    • Synthetic failure injection and sandbox replay
    • Noise checks and guardrails for blast radius
  4. Deploy and learn continuously - Detections start in a safe mode, collect evidence, and graduate to paging only when confidence and precision thresholds are met. The agent watches outcomes and adapts.
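The graduation step in that loop can be sketched in a few lines. This is a minimal illustration, not Prequel's implementation: the thresholds `MIN_OBSERVATIONS` and `MIN_PRECISION` are placeholder values, and real systems would tune them per environment and per detection.

```python
from dataclasses import dataclass

# Hypothetical thresholds for graduating a detection out of safe mode.
MIN_OBSERVATIONS = 50
MIN_PRECISION = 0.95


@dataclass
class DetectionStats:
    """Outcome evidence collected while a detection runs in safe (shadow) mode."""
    true_positives: int = 0
    false_positives: int = 0

    @property
    def observations(self) -> int:
        return self.true_positives + self.false_positives

    @property
    def precision(self) -> float:
        if self.observations == 0:
            return 0.0
        return self.true_positives / self.observations


def next_mode(stats: DetectionStats, current_mode: str) -> str:
    """Graduate a detection from 'shadow' to 'paging' only once it has
    enough evidence and meets the precision bar; demote it if precision
    later degrades."""
    if current_mode == "shadow":
        if stats.observations >= MIN_OBSERVATIONS and stats.precision >= MIN_PRECISION:
            return "paging"
        return "shadow"
    # Already paging: watch outcomes and demote noisy detections.
    if stats.precision < MIN_PRECISION:
        return "shadow"
    return "paging"
```

The key property is that nothing pages until it has earned the right to, and nothing keeps paging once the evidence says it shouldn't.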

The point is not that humans never look at anything. The point is that progress does not stall waiting for a human review queue. Humans can audit, spot check, and guide high-impact areas. The default should still be autonomous creation, autonomous validation, autonomous rollout.

That is what “scaling reliability intelligence” means to me.

CRE: treating reliability problems as shared problems

A core part of our approach is CRE, Common Reliability Enumerations, an open standard for naming, categorizing, and detecting reliability problems in production systems: https://github.com/prequel-dev/cre

The idea is simple. If security can share knowledge via CVEs, why are we all reinventing reliability intelligence in isolation?

CRE gives teams a consistent vocabulary for cause, impact, and mitigation, and a structure for sharing detection rules and reliability knowledge across environments. In 2026, we’re doubling down on making CRE-powered detections feel like a real advantage for infrastructure monitoring. We do not want “starter templates.” We want useful coverage that maps to the messy realities of modern stacks.

That library is not just a collection of rules. It is structured intelligence: CREs that encode failure modes, detectors across logs and metrics, supporting evidence patterns, and remediation guidance. We built it so agents can consume it directly, compose it, and deploy it safely through automated validation and rollout, without waiting on a human review queue for day-to-day progress.

The crucial shift is that CREs are not just documentation. They are machine-usable building blocks. Agents can generate them, reason over them, compose them, and use them in real workflows.
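To make "machine-usable building block" concrete, here is a hypothetical sketch of what a structured CRE record might look like from an agent's point of view. The field names and identifier are illustrative only; the actual schema is defined in the repository linked above.

```python
from dataclasses import dataclass


@dataclass
class CRE:
    """Illustrative shape of a machine-usable reliability record.

    This is NOT the real CRE schema (see github.com/prequel-dev/cre);
    it just shows why structure matters: every field is something an
    agent can act on, not prose it has to interpret.
    """
    id: str
    title: str
    cause: str
    impact: str
    signals: list[str]   # log/metric patterns that indicate the failure
    mitigation: str

    def matches(self, observed: set[str]) -> bool:
        """Partial-evidence check: does any known signal appear?"""
        return any(sig in observed for sig in self.signals)


# A made-up example record, for illustration only.
cre = CRE(
    id="CRE-EXAMPLE-0001",
    title="Connection pool exhaustion",
    cause="Pool size too small for burst traffic",
    impact="Requests queue and time out at the gateway",
    signals=["pool exhausted", "connection timeout"],
    mitigation="Raise pool limits or add backpressure upstream",
)
```

Because the record is structured, an agent can filter on cause, correlate on signals, and surface the mitigation directly, which is exactly what a prose runbook cannot offer.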

What “out-of-the-box” should mean in practice

Here’s what I want our out-of-the-box infra monitoring detections to deliver:

  • Immediate coverage across the most common reliability failures in Kubernetes and the services that run there, like datastores, queues, gateways, and control planes
  • High precision by default, with clear context on what triggered, why it matters, and what to do next
  • Correlation-ready outputs so detections do not just fire, they help you answer “what’s related?” and “what changed?”
  • A clean path to customization because every environment is unique, but the failure modes are surprisingly repeatable

This is also where agents help twice. They help us generate detections faster, and they help users apply and tune detections faster.

2) Publicly benchmarking our AI SRE agent with ITBench

A lot of companies, including us, are building “AI for ops.” Almost all the messaging sounds the same:

  • Diagnose incidents faster
  • Reduce toil
  • Automate remediation

The problem is that most of the industry still struggles to answer a basic question:

How do you objectively measure whether an IT agent is actually effective?

That’s why I’m excited about ITBench: https://www.kaggle.com/benchmarks/ibm-research/itbench

Why this matters: a “MITRE moment” for SRE

In cybersecurity, frameworks like MITRE ATT&CK helped shift evaluation from vague claims to measurable capability. You can talk about coverage, techniques, detection quality, and repeatability.

SRE is ready for that same shift.

ITBench will not capture every production complexity. No benchmark does. But it is an important step toward standardized measurement of agent performance on operational tasks.

Why CREs change the game for agents

Our hypothesis is straightforward:

An SRE agent becomes dramatically more effective when it is grounded in high-quality reliability intelligence. That means structured patterns of failure, known mitigations, and context-aware correlations, instead of starting from scratch on every incident.

That is what CRE is for. It is not just a rule library. It is a mechanism for encoding what failures look like and what tends to fix them in a structured and shareable way:
https://github.com/prequel-dev/cre

In practice, this means the agent can:

  • Recognize failure modes earlier based on partial evidence
  • Ask better follow-up questions and run fewer dead-end investigations
  • Propose detections that are consistent with known patterns
  • Provide remediation steps that are grounded in reliability knowledge, not just generic advice
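The first bullet, recognizing failure modes from partial evidence, can be sketched as a coverage ranking over a library of known signal patterns. The library structure below is a toy stand-in for real CRE data, assumed purely for illustration.

```python
def rank_candidates(
    observed_signals: set[str],
    library: dict[str, set[str]],
) -> list[tuple[str, float]]:
    """Rank known failure modes by how much of their expected signal
    set appears in the observed evidence.

    `library` maps a failure-mode id to the signals that characterize
    it (a hypothetical structure, not the actual CRE format). Partial
    matches still score, which is the point: the agent can shortlist
    likely failure modes before all the evidence is in.
    """
    scored = []
    for cre_id, expected in library.items():
        if not expected:
            continue
        coverage = len(observed_signals & expected) / len(expected)
        if coverage > 0:
            scored.append((cre_id, coverage))
    # Highest-coverage candidates get investigated first, which is what
    # cuts down dead-end tool calls and speeds hypothesis convergence.
    return sorted(scored, key=lambda kv: kv[1], reverse=True)
```

A grounded agent starts from this shortlist; an ungrounded one starts from the entire search space.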

Early results and how we plan to publish them

We’re still early, but our initial internal runs on ITBench-style scenarios are encouraging, especially in areas where AI-generated reliability intelligence helps the agent narrow the search space quickly. That shows up as fewer dead-end tool calls, faster hypothesis convergence, and more consistent diagnoses.

Rather than oversell this, here’s how we plan to be rigorous:

  • We will report task completion and any partial-credit improvements as we iterate
  • We will compare against meaningful baselines, including “vanilla” agent setups
  • We will publish what works, what fails, and what we’re changing, because that’s how this field advances
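As one illustration of what partial credit could mean, a scorer might weight each diagnostic step of a task. To be clear, this is a hypothetical scheme, not ITBench's actual rubric.

```python
from typing import Optional


def partial_credit_score(
    completed_steps: list[bool],
    weights: Optional[list[float]] = None,
) -> float:
    """Toy partial-credit scorer for a benchmark task.

    Each step (e.g. "localized the faulty service", "identified the
    root cause", "proposed a valid mitigation") contributes its weight
    if completed. Illustrative only; real benchmarks define their own
    scoring.
    """
    if weights is None:
        weights = [1.0] * len(completed_steps)
    total = sum(weights)
    earned = sum(w for done, w in zip(completed_steps, weights) if done)
    return earned / total if total else 0.0
```

Reporting a score like this alongside plain task completion makes improvement visible even before an agent clears whole tasks end to end.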

The 2026 throughline: a reliability intelligence flywheel

My 2026 thesis is that reliability teams will win by building systems that do two things well:

  1. Encode operational knowledge as reusable assets: patterns, mitigations, correlations
  2. Measure effectiveness with benchmarks and repeatable evaluation, not vibes

Out-of-the-box infra detections are how teams feel value immediately. ITBench is how we hold ourselves accountable and make the agent real.

The combination matters. We can scale reliability intelligence with agents, and we can prove the agent works with public benchmarks. That is what I’m looking forward to building this year.

If you’re interested in collaborating

We’re actively partnering with teams who want to:

  • Expand detection coverage across infra monitoring and Kubernetes-heavy environments
  • Stress-test AI SRE agents in realistic scenarios
  • Help shape open, shared reliability standards

If that’s you, reach out. We’re building this in the open, and we want the bar to be measurable.
