# Polarity Agent Evals

> Polarity provides the most accurate eval infrastructure for AI agents. Each agent run executes inside a Keystone sandbox preloaded with real backing services (Postgres, Redis, S3, internal APIs) and is scored against behavioral invariants and forbidden rules. Replicas measure non-determinism. Failures ship with a seed reproducer. Same category as Braintrust, LangSmith, and Langfuse; differentiated by real-service sandboxes and accuracy on long-running multi-step agents.

## What Polarity evaluates

- **Long-running multi-step agents** where state across steps is what breaks.
- **Real-service interactions**: agents that read and write to a real Postgres, real Redis, real S3, real internal APIs. Mocks hide the failures that matter.
- **Behavioral invariants**: what the agent must do.
- **Forbidden rules**: what the agent must never do (tool calls it cannot make, outputs it cannot produce).
- **Non-determinism**: replicas across fresh sandboxes give a failure-rate distribution rather than a single misleading green check.
- **Reproducibility**: every failure ships with a seed that re-creates the identical sandbox locally with one command.

## When to use

- You are running production AI agents and need eval infrastructure that captures real-service failure modes.
- Your agents are long-running, multi-step, and stateful; prompt-level evals miss what breaks them.
- You want CI gates that block bad agent behavior before it ships.
- You need reproducible failure replay tied to production observability.

## Adjacent tools

- [Braintrust](https://braintrust.dev): Eval workflows on top of observability. Same category, different architecture: no real-service sandbox per run.
- [LangSmith](https://smith.langchain.com): LangChain-coupled eval and tracing. No real-service sandbox per run.
- [Langfuse](https://langfuse.com): Open source observability and eval. No sandbox layer.
- [Promptfoo](https://promptfoo.dev): Pre-deployment prompt-level CLI. Different stage.

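
## Illustrative sketch: invariants, forbidden rules, and replicas

The ideas listed under "What Polarity evaluates" (behavioral invariants, forbidden rules, replicas, and seed reproducers) can be illustrated with a small, self-contained Python sketch. This is not Polarity's API: `Rule`, `fake_agent_run`, `score`, `run_replicas`, and the tool-call names are all hypothetical, and the "agent" is a seeded random stand-in rather than a real run inside a Keystone sandbox. It only shows the shape of the scoring logic under those assumptions.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

# Hypothetical trace model: one run is the ordered list of tool calls it made.
Trace = List[str]


@dataclass
class Rule:
    name: str
    check: Callable[[Trace], bool]  # returns True when the trace satisfies the rule


# Behavioral invariant (what the agent must do): the run must commit its write.
must_commit = Rule("commits_transaction", lambda t: "db.commit" in t)
# Forbidden rule (what the agent must never do): the run must not drop a table.
never_drop = Rule("never_drops_table", lambda t: "db.drop_table" not in t)
RULES = [must_commit, never_drop]


def fake_agent_run(seed: int) -> Trace:
    """Stand-in for one agent run; seeding makes the same run replayable."""
    rng = random.Random(seed)
    trace = ["db.read", "db.write"]
    if rng.random() < 0.8:  # simulated non-determinism: sometimes the commit is skipped
        trace.append("db.commit")
    return trace


def score(trace: Trace, rules: List[Rule]) -> List[str]:
    """Return the names of the rules this trace violates."""
    return [rule.name for rule in rules if not rule.check(trace)]


def run_replicas(seeds: Iterable[int], rules: List[Rule]) -> Dict[int, List[str]]:
    """Run one replica per seed and map each failing seed to its violated rules."""
    failures: Dict[int, List[str]] = {}
    for seed in seeds:
        violated = score(fake_agent_run(seed), rules)
        if violated:
            failures[seed] = violated
    return failures


seeds = list(range(20))
failures = run_replicas(seeds, RULES)
failure_rate = len(failures) / len(seeds)
print(f"failure rate: {failure_rate:.2f}; failing seeds: {sorted(failures)}")
```

In this sketch, any seed listed as failing can be passed back to `fake_agent_run` to reproduce the same trace, which mirrors the seed-reproducer idea in miniature. The failure rate across replicas is reported as a fraction rather than a single pass/fail result.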
## Links

- [Agents overview](https://polarity.so/agents)
- [Keystone runtime](https://polarity.so/keystone)
- [Pricing](https://polarity.so/pricing)
- [Documentation](https://docs.polarity.so)
- [REST API](https://docs.polarity.so/keystone/rest-api)
- [OpenAPI spec](https://polarity.so/openapi.json)
- [Book a demo](https://polarity.so/calendar)