EvalGate: CI for AI behavior

EvalGate helps AI teams stop the same failure from shipping twice. Start with a local regression gate, then add traces, evaluation history, LLM judges, reviews, cost controls, and governance when your AI system reaches production scale. The product is one operating loop: trace to eval to gate. Real behavior produces evidence. Reviewed failures become test cases. Promoted test cases become CI gates that give reviewers evidence before a prompt, model, retriever, or agent change ships.
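As a rough illustration of the "reviewed failures become test cases" step, the sketch below promotes a captured trace into a reusable eval case. The `Trace` and `EvalCase` shapes and the `promote` and `passes` helpers are hypothetical names invented for this example. They are not the EvalGate SDK or its schema.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Trace:
    """One captured interaction (hypothetical shape, not the EvalGate schema)."""
    trace_id: str
    input: str
    output: str
    metadata: dict[str, Any] = field(default_factory=dict)


@dataclass
class EvalCase:
    """A reusable test case derived from a reviewed failure (hypothetical)."""
    case_id: str
    input: str
    must_not_contain: str
    source_trace_id: str


def promote(trace: Trace, bad_phrase: str) -> EvalCase:
    """Turn a reviewed failing trace into an eval case that forbids the bad phrase."""
    return EvalCase(
        case_id=f"case-{trace.trace_id}",
        input=trace.input,
        must_not_contain=bad_phrase,
        source_trace_id=trace.trace_id,
    )


def passes(case: EvalCase, output: str) -> bool:
    """Case-insensitive check that the output avoids the known-bad phrase."""
    return case.must_not_contain.lower() not in output.lower()
```

A reviewer who flags a wrong answer in a trace could call `promote(trace, "90 days")`, and any later output for the same input can then be checked with `passes`. Real suites would likely use richer assertions than a single forbidden phrase.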

Quick start

Set up your first local eval gate in under 5 minutes, with no account required.

Trace to eval to gate

Understand the operating loop before you wire in platform features.

SDK and CLI

Install the TypeScript or Python SDK and use the same assertions locally, in app code, and in CI.

API reference

Integrate directly with the EvalGate platform for traces, runs, projects, and keys.

The adoption path

1. Start with one local gate

Install the SDK, snapshot your current test/eval health, and add a CI step that fails when the baseline regresses. This proves the workflow before anyone has to adopt another dashboard.

2. Capture real failures

Add tracing when local gates are not enough. EvalGate captures production and staging behavior with inputs, outputs, tool calls, latency, token usage, cost, and metadata.

3. Promote failures into coverage

Convert repeated or high-risk failures into reusable eval cases. Label, cluster, synthesize, and review cases so your suites track actual user pain.

4. Gate releases with evidence

Run the eval suite in CI, compare against the baseline, and give reviewers clear pass/fail evidence before changes merge.
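The baseline comparison behind steps 1 and 4 can be sketched in a few lines. This is a minimal illustration, assuming eval results are stored as a mapping from case name to pass/fail. The `regressions` and `exit_code` functions are hypothetical and do not represent the EvalGate CLI or SDK.

```python
def regressions(baseline: dict[str, bool], current: dict[str, bool]) -> list[str]:
    """Return cases that passed in the baseline but fail or are missing now."""
    return sorted(
        name
        for name, passed in baseline.items()
        if passed and not current.get(name, False)
    )


def exit_code(baseline: dict[str, bool], current: dict[str, bool]) -> int:
    """Return 1 when any baseline-passing case regressed, else 0."""
    broken = regressions(baseline, current)
    for name in broken:
        # Print each regression so reviewers see the evidence in the CI log.
        print(f"REGRESSION: {name}")
    return 1 if broken else 0
```

A CI step could load the two result sets (for example, from JSON files) and pass the value of `exit_code` to `sys.exit`, so a nonzero status blocks the merge. Cases that already failed in the baseline are ignored in this sketch, so the gate only blocks new regressions.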

What to use first

Stage | Use this | Outcome
First repo | Local gate | Block regressions without creating an account
Production AI feature | Traces and eval runs | Turn real behavior into coverage
Team rollout | Reviews, judges, and PR annotations | Make AI quality reviewable
Governed rollout | Costs, benchmarks, annotations, and audit history | Track quality, spend, and release evidence across projects

Explore next

CI/CD integration

Wire EvalGate into GitHub Actions or GitLab CI to gate every PR.

Tracing setup

Capture the real AI behavior that should become eval coverage.

LLM judge

Add judge-backed scoring when assertions alone are not enough.

Agent governance

Scale from one gate to governed AI release workflows.