Solo.io Introduces the agentevals Open Source Project to Bridge the Production Reliability Gap for Agentic AI

Solo will also contribute agentregistry to CNCF to increase the velocity of its popular and growing suite of AI solutions and projects

AMSTERDAM, March 25, 2026 (GLOBE NEWSWIRE) — KubeCon + CloudNativeCon Europe — Solo.io, the leading provider of cloud native connectivity and AI-ready infrastructure, today announced two milestones: the launch of agentevals, a new open source project that enables teams to instrument, evaluate, and benchmark agentic AI behavior for quality and reliability across any model or framework, and the contribution of the open source project agentregistry to the Cloud Native Computing Foundation (CNCF) to fill a critical governance gap in agentic infrastructure. Together, these announcements support Solo.io’s ongoing mission to advance the state of agentic infrastructure in the cloud native ecosystem to keep pace with rapidly evolving AI innovation, allowing organizations to deploy AI to production confidently and responsibly.

Introducing agentevals: Evaluation Across the Entire Agentic Loop
AI agents reason, plan, and make decisions that vary from run to run, which is what makes them so powerful. But it also means the enterprise tooling built around deterministic software and traditional LLM evaluation doesn’t translate to AI agents executing in an agentic loop: comparing inputs against a fixed set of expected outputs is necessary but not sufficient to assess the quality and reliability of agentic behavior. What enterprises need is an evaluation framework for defining what good agent behavior looks like, measuring against that definition continuously, and detecting when behavioral drift creates negative or suboptimal outcomes.

The agentevals project solves this by treating the agentic loop the way observability platforms treat distributed systems. It leverages OpenTelemetry, the industry standard for distributed tracing, to capture and correlate individual invocations from distributed agentic interactions, then scores them against golden eval sets using an extensible evaluation engine. This means teams can continuously validate how an agent behaves end-to-end, across models, frameworks, and tool configurations, with no requirement for agent reruns and without added infrastructure debt.

“Evaluation is the biggest unsolved problem in agentic infrastructure today. Organizations have frameworks for building agents, gateways for connecting them, and registries for governing them, but no consistent way to know whether an agent is actually reliable enough to trust in production,” said Idit Levine, Founder and CEO, Solo.io. “The agentevals project is much more than a new tool or framework; it’s a new category of agentic infrastructure built by and for the community to improve the reliability and trust of agentic workloads.”

Key capabilities include:

  • Offline and online evaluation modes: Evaluate agents from recorded traces offline, or stream live OpenTelemetry data for real-time analysis. Record live runs via OpenTelemetry to build golden data sets for ongoing regression testing.
  • Zero-code and SDK-based integration: Point any OTel-instrumented agent at the agentevals receiver with zero code changes, or use the SDK for programmatic session lifecycle and fine-grained control. Works with any model and any framework that emits OpenTelemetry spans.
  • Built-in evaluator catalog: Ships with out-of-the-box evaluators for trajectory matching (strict, unordered, subset, and superset modes), LLM-as-judge scoring, response quality, and tool coverage, all ready to use immediately.
  • Community evaluator registry: Create and contribute custom evaluators to the project’s shared registry, building a growing catalog of scoring logic the entire community benefits from.
  • Golden eval sets: Define what “good” looks like for specific workflows. agentevals tests against those benchmarks continuously, flagging regressions when models are swapped, tools are added, or prompts change.
  • Multi-interface access: Includes a CLI for local development and CI/CD pipelines, a web UI for visual inspection and eval set creation, and an MCP server enabling Claude Code to run evaluations and inspect live agent sessions directly from a conversation.
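To make the trajectory-matching modes named above concrete, the sketch below compares an agent’s recorded tool-call sequence against a golden reference in each of the four modes. It is a generic illustration of the concept only; the function and tool names are hypothetical and this is not the agentevals project’s actual implementation or API.

```python
from collections import Counter

def match_trajectory(actual: list[str], golden: list[str], mode: str) -> bool:
    """Illustrative trajectory matching: compare an agent's recorded
    tool calls against a golden reference sequence."""
    if mode == "strict":        # same calls, in the same order
        return actual == golden
    if mode == "unordered":     # same calls (with multiplicity), any order
        return Counter(actual) == Counter(golden)
    if mode == "subset":        # agent used only calls present in the golden set
        return not (Counter(actual) - Counter(golden))
    if mode == "superset":      # agent covered every call in the golden set
        return not (Counter(golden) - Counter(actual))
    raise ValueError(f"unknown mode: {mode}")

# Hypothetical golden trajectory for a support-reply workflow.
golden = ["search_docs", "summarize", "send_reply"]
actual = ["summarize", "search_docs", "send_reply"]

print(match_trajectory(actual, golden, "strict"))     # False: order differs
print(match_trajectory(actual, golden, "unordered"))  # True: same calls
```

A strict match would flag this run as a regression, while the unordered mode accepts it; which mode is appropriate depends on whether call ordering matters for the workflow being benchmarked.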

agentregistry Joins kagent and agentgateway Under Open Governance
Introduced by Solo.io in November 2025, agentregistry is an AI-native open source registry for AI agents, MCP tools, and Agent Skills. It enables teams to standardize how AI capabilities are catalogued, discovered, and governed across the enterprise. Contributing agentregistry to the CNCF enables the next stage of community growth and contribution under an open governance model for all contributors.

The first challenge every organization faces when adopting agentic AI is establishing control. Agents, MCP tools, prompts, and agent skills proliferate across teams, registries, and developer laptops, with no centralized visibility or governance. The agentregistry project solves this by providing a centralized, governed registry where artifacts are imported from any source, including public registries, private repos, or existing deployments. Developers gain self-service access through a UI, CLI, and API-based catalog, while agentic IDEs and autonomous agents are supported via semantic search through built-in MCP tools.

The role of agentregistry extends well beyond the catalog, integrating with multiple runtime platforms, including Kubernetes, AWS AgentCore, and Google Vertex AI, so teams can deploy agents, MCP servers, and skills directly from the registry to whatever platform they choose. Runtime discovery in these platforms detects shadow inventory by scanning connected runtimes to surface agents and tools that have been deployed independently, outside any governed workflow. The agentregistry project also integrates with agentgateway to provide consistent security controls and observability for agentic connectivity across runtimes.

The Open Source Foundation for Enterprises Running AI in Production
Solo.io is advancing the state of the art in agentic infrastructure, helping organizations close the critical gaps required to run AI responsibly in production. With today’s announcements, its open source suite now consists of four essential AI layers:

  • kagent agentic framework: A CNCF Sandbox project for building and running AI agents natively in Kubernetes.
  • agentgateway AI gateway: Housed in the Linux Foundation, the first data plane built from the ground up for AI agents, with full MCP and A2A support.
  • agentregistry registry and discovery: A centralized, vendor-neutral registry for AI applications and artifacts, the governance and discoverability layer for scaling AI capabilities.
  • agentevals evaluation and reliability: Continuously scores agent behavior against defined benchmarks, across any model or framework, from existing observability data.

Solo.io’s full suite of AI solutions will be on display at KubeCon + CloudNativeCon Europe 2026 at booth 800.

Get Involved
To join the community and get started:

About Solo.io
Solo.io is reimagining infrastructure for cloud and AI, uniting secure, seamless cloud connectivity with AI-ready, agentic infrastructure. Trusted by leading enterprises worldwide, Solo.io helps organizations securely connect applications, services, and AI workloads across any environment. From AI infrastructure to API gateways and service mesh, our solutions simplify and unify application networking, enabling teams to accelerate innovation, scale intelligently, and leverage the full potential of modern AI agents. Learn more at www.solo.io.

Media Contact
Jessie Adams-Shore
Solopr@speakeasystrategies.com