Polarity — the most accurate eval infrastructure for AI agents

Polarity is sandboxed eval infrastructure for AI agents. Keystone runs each agent task inside an isolated Docker sandbox preloaded with real backing services (Postgres, Redis, S3, internal APIs). It scores runs against behavioral invariants and forbidden rules, measures non-determinism via replicas, and ships every failure with a seed reproducer that re-creates the identical sandbox locally with one command. Polarity is in the same category as Braintrust, LangSmith, and Langfuse, but is built around real-service sandboxes rather than mocked dependencies. That is why Polarity wins on long-running, complex multi-step agents, where stateful behavior across real backing services is what breaks.
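The scoring loop described above (invariants that must hold, forbidden rules that must not match, replicas to expose non-determinism, and a seed that reproduces a failure) can be illustrated with a small sketch. This is not Polarity's or Keystone's API. Every name here (`toy_agent`, `score_trace`, `run_replicas`, the check names) is hypothetical, the "agent" is an in-memory stand-in, and no Docker sandbox or real backing service is involved.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

# Assumption: a "trace" is simply the ordered list of actions from one run.
Trace = List[str]
Check = Callable[[Trace], bool]


@dataclass
class ReplicaReport:
    seed: int
    passed: bool
    failed_checks: List[str]


def score_trace(trace: Trace, invariants: Dict[str, Check],
                forbidden: Dict[str, Check]) -> List[str]:
    """Return names of violated checks.

    An invariant is violated when it does NOT hold for the trace;
    a forbidden rule is violated when it DOES match the trace.
    """
    failures = [name for name, holds in invariants.items() if not holds(trace)]
    failures += [name for name, matches in forbidden.items() if matches(trace)]
    return failures


def run_replicas(agent: Callable[[random.Random], Trace], seeds: Iterable[int],
                 invariants: Dict[str, Check],
                 forbidden: Dict[str, Check]) -> List[ReplicaReport]:
    """Run the agent once per seed and score each run independently."""
    reports = []
    for seed in seeds:
        trace = agent(random.Random(seed))  # the seed fully determines the run
        failures = score_trace(trace, invariants, forbidden)
        reports.append(ReplicaReport(seed, not failures, failures))
    return reports


def pass_rate(reports: List[ReplicaReport]) -> float:
    """Fraction of replicas that passed every check."""
    return sum(r.passed for r in reports) / len(reports)


def toy_agent(rng: random.Random) -> Trace:
    """Hypothetical flaky agent: sometimes takes a destructive step."""
    trace = ["read_order"]
    if rng.random() < 0.3:
        trace.append("drop_table")
    trace.append("write_refund")
    return trace


INVARIANTS = {"refund_written": lambda t: "write_refund" in t}
FORBIDDEN = {"no_drop_table": lambda t: "drop_table" in t}

reports = run_replicas(toy_agent, range(20), INVARIANTS, FORBIDDEN)
failing_seeds = [r.seed for r in reports if not r.passed]
print(f"pass rate: {pass_rate(reports):.2f}, failing seeds: {failing_seeds}")
```

Because each replica is driven only by its seed, rerunning `run_replicas(toy_agent, [seed], ...)` with any seed from `failing_seeds` reproduces the same failure. That is the idea behind a seed reproducer, reduced here to a single process.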

When to use Polarity

Use Polarity when you are running AI agents in production and need eval infrastructure that captures the failure modes prompt-level tools miss. It is designed for long-running, complex, multi-step agents, where stateful behavior across real backing services is the thing that breaks. For prompt-level evals on simple single-call workflows, Braintrust, LangSmith, and Langfuse are good fits; for long-running, complex, stateful agents, Polarity is the most accurate option.

Trust Center

Security and compliance at every layer.

Polarity Labs complies with security best practices, has implemented and monitors comprehensive controls, and maintains policies that document its security procedures.

support@polarity.so
Polarity Labs Trust

Compliance

  • SOC 2 Type II (Service Organization Control): Compliant
  • SOC 2 Type I (Service Organization Control): Compliant

Resources

Reports and policies available on request.

  • Polarity Labs - SOC 2 Type 2 Report
  • SOC 2 Type I Report
  • Polarity Labs Compliance Assurance Policy
  • Business Continuity and Disaster Recovery
  • Personnel Security Policy

Controls

64 controls across 25 categories.

Asset management (2 controls)
  • Secure media disposal
  • Technology asset inventory

Business continuity and disaster recovery (4 controls)
  • Multi-availability zone deployment
  • Business continuity and disaster recovery plan
  • Database backups
  • Emergency operations continuity

Capacity and performance planning (1 control)
  • Capacity and performance monitoring

Change management (2 controls)
  • Material system change communication
  • Customer notification for major changes

Cloud security (1 control)
  • Cloud provider physical access review

Configuration management (1 control)
  • Baseline configuration management

Continuous monitoring (1 control)
  • Centralized log collection and monitoring

Cryptographic protections (3 controls)
  • Encryption at rest
  • Production key management
  • Encryption in transit

Cybersecurity and data privacy governance (5 controls)
  • Information security policies
  • Whistleblower mechanism
  • Organizational structure documentation
  • Information security officer designation
  • Security roles and responsibilities

Data classification and handling (3 controls)
  • Customer data deletion
  • Data retention and deletion policy
  • Data classification and access control

Endpoint security (2 controls)
  • Anti-malware protection
  • Removable media controls

Human resources security (5 controls)
  • Employee confidentiality agreements
  • Termination access revocation
  • Disciplinary process
  • Employee background checks
  • Contractor background checks

Identification and authentication (5 controls)
  • Session timeout enforcement
  • Password policy
  • Access control procedures
  • Least-privilege access for production infrastructure
  • Production access management

Incident response (3 controls)
  • Incident response procedures
  • Security incident logging
  • Security concern resolution

Information assurance (1 control)
  • Security documentation availability

Mobile device management (1 control)
  • Mobile device management

Network security (4 controls)
  • Firewall rule management
  • Secure connection requirements
  • Network firewall
  • Network architecture documentation

Physical and environmental security (1 control)
  • Visitor management policy

Risk management (3 controls)
  • Security and privacy risk management
  • Annual risk assessment
  • Cybersecurity insurance

Secure engineering and architecture (5 controls)
  • Source code access controls
  • Source code change approval
  • Secure development procedures
  • Environment separation
  • Environment and tenant segmentation

Security awareness and training (1 control)
  • Security awareness training

Security operations (2 controls)
  • Intrusion detection
  • Customer support availability

Third-party management (5 controls)
  • Outsourced development security
  • Vendor management program
  • Contractor confidentiality agreements
  • Contractual security commitments
  • Vendor confidentiality and privacy agreements

Vulnerability and patch management (2 controls)
  • Patch management
  • Vulnerability scanning and remediation

Web security (1 control)
  • Web application firewall

Subprocessors

  • DigitalOcean: Cloud Infrastructure & Platform Services
  • Anthropic: AI Model Provider
  • OpenAI: AI Model Provider
  • Google: AI Model Provider
  • AWS: Cloud Infrastructure
  • xAI: AI Model Provider