What I'm excited about in 2026

By Tony Meehan, Co-Founder & CTO
Infrastructure Monitoring Built For AI
We believe Prequel has assembled the world’s largest reliability intelligence library designed for autonomous AI agents. In 2026, we are turning that library into a practical advantage for infrastructure teams, and we are measuring the resulting agent performance in public.
AI is changing the pace and shape of software delivery. It is not just that teams ship faster with copilots and code generation. Teams are also shipping differently. More automation, more agents, more configuration generated on the fly, and more systems that change frequently because it is now cheaper to change them.
That acceleration is exciting, but it has a predictable side effect. The operational surface area expands. The rate of change goes up. The number of unknown unknowns grows. And reliability teams get pulled into a familiar cycle:
- New behavior hits production faster than your monitoring evolves
- Incidents happen for repeatable reasons
- Everyone re-learns the same lessons, often the hard way, then re-implements the same detectors, dashboards, and runbooks again
At Prequel, our mission is to break that loop by turning reliability problems into shared, reusable intelligence. The goal is simple: teams catch common failure modes earlier, with less toil, and with higher confidence.
As we head into 2026, we’re most excited about two things:
- Out-of-the-box detections for infrastructure monitoring that meaningfully reduce time-to-value and alert fatigue
- Benchmarking our AI SRE agent against competitors using ITBench, the way security used MITRE ATT&CK to turn “we’re better” into something measurable
These two threads connect. The more reliability intelligence we can generate and curate, the more effective our agent becomes. The more effective the agent becomes, the more it can help us scale the creation of reliability intelligence. That flywheel is what I’m building toward this year.
1) Out-of-the-box infra monitoring detections that do not feel like “starter alerts”
Most observability platforms can collect mountains of telemetry. The bottleneck is what comes next:
- What should I be detecting?
- How do I detect it precisely, without noise?
- How do I connect it to impact and root cause quickly?
- How do I scale that across Kubernetes plus hundreds of common services?
This is where the industry has been stuck. Traditional monitoring still leaves most teams with an uncomfortable choice:
- Build problem detectors from scratch, which is slow
- Monitor high-level symptoms and hope you’re fast enough during the incident, which is risky
Scaling reliability intelligence with AI agents, without compromising trust
In 2026, my biggest focus is scaling the creation and use of reliability intelligence with AI agents in a way that does not require a human in the loop for day-to-day progress.
That is important because the long tail is brutal. New versions ship. A Helm chart changes. A dependency changes. A managed service changes behavior. A new Kubernetes controller appears. It is not realistic to keep up by hand, and it is not a good use of senior engineers’ time.
So we are building toward a loop that looks like this:
- Discover the environment automatically. Agents inventory services, dependencies, and telemetry surfaces, then identify what is missing or under-instrumented.
- Generate CRE-backed detections automatically. Agents deploy Common Reliability Enumerations, including the community context: what the failure mode is, what the signals are, how to confirm it, and what the mitigation typically looks like.
- Validate automatically before anything ships. This is where trust comes from. Not from a human reading every rule, but from a repeatable validation harness:
  - Tests against historical telemetry and known incidents
  - Synthetic failure injection and sandbox replay
  - Noise checks and guardrails for blast radius
- Deploy and learn continuously. Detections start in a safe mode, collect evidence, and graduate to paging only when confidence and precision thresholds are met. The agent watches outcomes and adapts.
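The graduation step can be sketched as a small precision gate. Everything here is illustrative: the thresholds, the `DetectionTrial` name, and the idea of counting confirmed versus unconfirmed firings are assumptions for the sketch, not a description of any particular implementation.

```python
from dataclasses import dataclass

# Hypothetical thresholds; real values would come from the validation harness.
MIN_PRECISION = 0.95   # fraction of firings confirmed as true positives
MIN_EVIDENCE = 20      # firings observed before the precision estimate is trusted

@dataclass
class DetectionTrial:
    """Tracks a detection running in safe (non-paging) mode."""
    name: str
    true_positives: int = 0
    false_positives: int = 0

    def record(self, confirmed: bool) -> None:
        """Record one firing and whether it was confirmed as a real problem."""
        if confirmed:
            self.true_positives += 1
        else:
            self.false_positives += 1

    @property
    def firings(self) -> int:
        return self.true_positives + self.false_positives

    @property
    def precision(self) -> float:
        return self.true_positives / self.firings if self.firings else 0.0

    def ready_to_page(self) -> bool:
        """Graduate to paging only with enough evidence and high precision."""
        return self.firings >= MIN_EVIDENCE and self.precision >= MIN_PRECISION
```

A noisy detection never accumulates the precision needed to page, so it stays in safe mode indefinitely instead of waking anyone up.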
The point is not that humans never look at anything. The point is that progress does not stall waiting for a human review queue. Humans can audit, spot check, and guide high-impact areas. The default should still be autonomous creation, autonomous validation, autonomous rollout.
That is what “scaling reliability intelligence” means to me.
CRE: treating reliability problems as shared problems
A core part of our approach is CRE, Common Reliability Enumerations, an open standard for naming, categorizing, and detecting reliability problems in production systems: https://github.com/prequel-dev/cre
The idea is simple. If security can share knowledge via CVEs, why are we all reinventing reliability intelligence in isolation?
CRE gives teams a consistent vocabulary for cause, impact, and mitigation, and a structure for sharing detection rules and reliability knowledge across environments. In 2026, we’re doubling down on making CRE-powered detections feel like a real advantage for infrastructure monitoring. We do not want “starter templates.” We want useful coverage that maps to the messy realities of modern stacks.
That library is not just a collection of rules. It is structured intelligence: CREs that encode failure modes, detectors across logs and metrics, supporting evidence patterns, and remediation guidance. We built it so agents can consume it directly, compose it, and deploy it safely through automated validation and rollout, without waiting on a human review queue for day-to-day progress.
The crucial shift is that CREs are not just documentation. They are machine-usable building blocks. Agents can generate them, reason over them, compose them, and use them in real workflows.
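To make "machine-usable building blocks" concrete, here is a minimal sketch of the shape such a record might take. The field names and the example entry are assumptions for illustration only; the actual CRE schema is defined in the repository linked above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CRE:
    """Illustrative shape of a Common Reliability Enumeration.

    These field names are assumptions for the sketch, not the real schema.
    """
    cre_id: str
    failure_mode: str          # what goes wrong
    signals: tuple[str, ...]   # log/metric patterns that indicate it
    confirmation: str          # how to verify the diagnosis
    mitigation: str            # what typically fixes it

# A hypothetical entry an agent could consume directly.
EXAMPLE = CRE(
    cre_id="CRE-EXAMPLE-0001",
    failure_mode="Connection pool exhaustion in a service's database client",
    signals=("timeout acquiring connection", "pool in use equals pool max"),
    confirmation="Check pool saturation metrics against the configured pool size",
    mitigation="Raise the pool ceiling or fix the connection leak upstream",
)

def matches(cre: CRE, observed: set[str]) -> bool:
    """An agent can shortlist CREs whose signals appear in the evidence."""
    return any(sig in observed for sig in cre.signals)
```

Because the record is structured rather than prose, an agent can filter, compose, and deploy entries programmatically instead of re-reading runbooks.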
What “out-of-the-box” should mean in practice
Here’s what I want our out-of-the-box infra monitoring detections to deliver:
- Immediate coverage across the most common reliability failures in Kubernetes and the services that run there, like datastores, queues, gateways, and control planes
- High precision by default, with clear context on what triggered, why it matters, and what to do next
- Correlation-ready outputs so detections do not just fire; they help you answer “what’s related?” and “what changed?”
- A clean path to customization because every environment is unique, but the failure modes are surprisingly repeatable
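One way to read "correlation-ready": a firing should carry enough structure that related firings can be grouped automatically. A minimal sketch, assuming events are simple (timestamp, resource, detection) tuples and that sharing a resource within a time window counts as "related":

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical event shape: (timestamp, resource, detection name).
Event = tuple[datetime, str, str]

def correlate(events: list[Event], window: timedelta) -> dict[str, list[Event]]:
    """Group firings that share a resource and land inside one window,
    so an alert can answer "what's related?" instead of arriving alone."""
    by_resource: dict[str, list[Event]] = defaultdict(list)
    for event in sorted(events):          # sort so group[0] is the earliest firing
        by_resource[event[1]].append(event)
    related: dict[str, list[Event]] = {}
    for resource, group in by_resource.items():
        clustered = [e for e in group if e[0] - group[0][0] <= window]
        if len(clustered) > 1:            # a lone firing has nothing related
            related[resource] = clustered
    return related
```

Real correlation would also use topology and change data, but even this resource-plus-time grouping turns three pages into one story.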
This is also where agents help twice. They help us generate detections faster, and they help users apply and tune detections faster.
2) Publicly benchmarking our AI SRE agent with ITBench
A lot of companies, including us, are building “AI for ops.” Almost all the messaging sounds the same:
- Diagnose incidents faster
- Reduce toil
- Automate remediation
The problem is that most of the industry still struggles to answer a basic question:
How do you objectively measure whether an IT agent is actually effective?
That’s why I’m excited about ITBench: https://www.kaggle.com/benchmarks/ibm-research/itbench
Why this matters: a “MITRE moment” for SRE
In cybersecurity, frameworks like MITRE ATT&CK helped shift evaluation from vague claims to measurable capability. You can talk about coverage, techniques, detection quality, and repeatability.
SRE is ready for that same shift.
ITBench will not capture every production complexity. No benchmark does. But it is an important step toward standardized measurement of agent performance on operational tasks.
Why CREs change the game for agents
Our hypothesis is straightforward:
An SRE agent becomes dramatically more effective when it is grounded in high-quality reliability intelligence. That means structured patterns of failure, known mitigations, and context-aware correlations, instead of starting from scratch on every incident.
That is what CRE is for. It is not just a rule library. It is a mechanism for encoding what failures look like and what tends to fix them in a structured and shareable way:
https://github.com/prequel-dev/cre
In practice, this means the agent can:
- Recognize failure modes earlier based on partial evidence
- Ask better follow-up questions and run fewer dead-end investigations
- Propose detections that are consistent with known patterns
- Provide remediation steps that are grounded in reliability knowledge, not just generic advice
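"Recognize failure modes earlier based on partial evidence" can be sketched as ranking known patterns by signal overlap. The pattern names and signal strings below are hypothetical, and real CRE matching is richer than set intersection, but the idea is the same: even incomplete evidence narrows the search space.

```python
# Hypothetical failure-mode patterns and their known signals.
KNOWN_PATTERNS = {
    "connection-pool-exhaustion": {
        "timeout acquiring connection",
        "pool saturation at max",
        "p99 latency spike on db calls",
    },
    "oom-kill-crash-loop": {
        "OOMKilled",
        "CrashLoopBackOff",
        "memory usage near limit",
    },
}

def rank_hypotheses(observed: set[str]) -> list[tuple[str, float]]:
    """Order failure modes by the fraction of their signals seen so far.

    Two of three signals is enough to investigate the leading hypothesis
    first instead of running dead-end investigations in parallel."""
    scored = [
        (name, len(signals & observed) / len(signals))
        for name, signals in KNOWN_PATTERNS.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

An agent seeing only `OOMKilled` and `CrashLoopBackOff` already has a leading hypothesis, along with the confirmation and mitigation context the matching CRE carries.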
Early results and how we plan to publish them
We’re still early, but our initial internal runs on ITBench-style scenarios are encouraging, especially in areas where AI-generated reliability intelligence helps the agent narrow the search space quickly. That shows up as fewer dead-end tool calls, faster hypothesis convergence, and more consistent diagnoses.
Rather than oversell this, here’s how we plan to be rigorous:
- We will report task completion and any partial-credit improvements as we iterate
- We will compare against meaningful baselines, including “vanilla” agent setups
- We will publish what works, what fails, and what we’re changing, because that’s how this field advances
The 2026 throughline: a reliability intelligence flywheel
My 2026 thesis is that reliability teams will win by building systems that do two things well:
- Encode operational knowledge as reusable assets: patterns, mitigations, correlations
- Measure effectiveness with benchmarks and repeatable evaluation, not vibes
Out-of-the-box infra detections are how teams feel value immediately. ITBench is how we hold ourselves accountable and make the agent real.
The combination matters. We can scale reliability intelligence with agents, and we can prove the agent works with public benchmarks. That is what I’m looking forward to building this year.
If you’re interested in collaborating
We’re actively partnering with teams who want to:
- Expand detection coverage across infra monitoring and Kubernetes-heavy environments
- Stress-test AI SRE agents in realistic scenarios
- Help shape open, shared reliability standards
If that’s you, reach out. We’re building this in the open, and we want the bar to be measurable.

