Set up CI/CD regression gates with EvalGate
EvalGate has three CI paths. Choose the one that matches the evidence you want to gate on.

| Path | Account required | Command | Blocks on |
|---|---|---|---|
| Local gate | No | npx @evalgate/sdk gate | Existing test/eval script compared with evals/baseline.json |
| Eval artifact CI | No | npx @evalgate/sdk ci | Eval spec failures and optional base/head run diff |
| Platform gate | Yes | npx @evalgate/sdk check | Platform score, baseline drop, policies, budgets, and judge credibility |
One-command eval artifact CI
Use the ci command when your repo contains EvalGate spec files and you want each pull request to write .evalgate/ run artifacts.
.github/workflows/evalgate.yml
- Discovers evaluation specs automatically
- Runs all specs by default
- Writes run artifacts when --write-results is set
- Compares against the base branch when --base is set
- Writes a GitHub step summary and annotations when --format github is set
- Exits with the appropriate code (see exit codes)
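The behavior listed above could be wired into a pull request workflow roughly as follows. This is a minimal sketch rather than the canonical file: the workflow name, Node version, npm ci step, and artifact name are assumptions, and only the ci flags named in this section are used.

```yaml
# .github/workflows/evalgate.yml (illustrative sketch)
name: EvalGate CI
on:
  pull_request:

jobs:
  evalgate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0 # full history so the base ref exists for comparison
      - uses: actions/setup-node@v4
        with:
          node-version: 20 # assumption: use whichever Node version your project targets
      - run: npm ci # assumption: specs need the project's dependencies installed
      - name: Run eval specs
        run: npx @evalgate/sdk ci --base main --write-results --format github
      - name: Upload run artifacts
        if: always() # keep artifacts even when the gate fails
        uses: actions/upload-artifact@v4
        with:
          name: evalgate-runs # hypothetical artifact name
          path: .evalgate/
          include-hidden-files: true # .evalgate/ is a dot-directory, which v4 skips by default
```

Fetching full history is a conservative choice here; a shallower fetch may be enough if the base ref is still reachable.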
Add --impacted-only to run only specs affected by the current diff. Impact analysis requires a base branch, so use it with --base main or another base ref.
Local gate setup
Use the local gate when you want the fastest no-account path. Run init once:
This produces evals/baseline.json, creates evalgate.config.json, and writes .github/workflows/evalgate-gate.yml. Commit the generated files:
The generated workflow runs the gate command:
.github/workflows/evalgate-gate.yml
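A workflow of this kind might look like the sketch below. The file that init actually writes may differ; the trigger, Node version, and npm ci step are assumptions, and the only EvalGate command used is the gate command from the table above.

```yaml
# .github/workflows/evalgate-gate.yml (illustrative sketch, not the generated file)
name: EvalGate Gate
on:
  pull_request:

jobs:
  gate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4 # brings in the committed evals/baseline.json
      - uses: actions/setup-node@v4
        with:
          node-version: 20 # assumption: match your project's Node version
      - run: npm ci # assumption: the test script needs project dependencies
      - name: Compare test results against the baseline
        run: npx @evalgate/sdk gate
```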
The built-in local gate compares test pass state and parsed test count against the baseline. If your test script runs AI evals, this gates AI behavior. Otherwise, it gates test health until you add eval specs or a custom eval:regression-gate script.
Platform gate
Use the check command when you want the dashboard, quality scores, judge credibility, and import-on-failure behavior.
.github/workflows/evalgate-platform-gate.yml
check --format github emits GitHub annotations for failed cases and writes a step summary with the verdict, score, baseline score, delta, and top failing cases. --onFail import uploads the failed run context to the platform so you can review it in the dashboard.
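The two flags described above could be combined in a workflow along these lines. This is a sketch under stated assumptions: the environment variable and secret name EVALGATE_API_KEY are hypothetical, and any evaluation identifier or other arguments that check may require are omitted because this section does not name them.

```yaml
# .github/workflows/evalgate-platform-gate.yml (illustrative sketch)
name: EvalGate Platform Gate
on:
  pull_request:

jobs:
  platform-gate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20 # assumption: match your project's Node version
      - name: Run platform gate
        env:
          # Hypothetical name: use whatever credential variable your EvalGate account setup specifies.
          EVALGATE_API_KEY: ${{ secrets.EVALGATE_API_KEY }}
        run: npx @evalgate/sdk check --format github --onFail import
```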
GitLab CI configuration
For GitLab, use JSON output and upload the report as an artifact:
.gitlab-ci.yml
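A GitLab job following this advice might look like the sketch below. Two details are assumptions: the --format json value is inferred from the --format github flag mentioned elsewhere on this page, and the report path is the evals/regression-report.json file named under best practices. Check both against your EvalGate version before relying on them.

```yaml
# .gitlab-ci.yml (illustrative sketch)
evalgate:
  stage: test
  image: node:20 # assumption: any Node image your project supports
  script:
    - npm ci # assumption: the test script needs project dependencies
    - npx @evalgate/sdk gate --format json # assumed flag value, see note above
  artifacts:
    when: always # keep the report even when the gate fails the job
    paths:
      - evals/regression-report.json # assumed report location
```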
Setting quality thresholds
Define local gate thresholds in evalgate.config.json. The built-in local gate uses evals/baseline.json; platform gates use the quality and judge settings configured for the evaluation.
evalgate.config.json
Exit codes
| Code | Meaning |
|---|---|
| 0 | Clean: no regressions detected |
| 1 | Regression: tests failed, specs failed, or score dropped below the selected gate |
| 2 | Configuration or infrastructure issue |
| 8 | Warning state for weak judge discriminative power in supported platform gate paths |
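One way to act on these codes in a GitHub Actions step is sketched below. Treating code 8 as a non-blocking warning is a policy choice assumed for illustration, not something EvalGate prescribes; the rest of the script simply passes the original exit code through.

```yaml
steps:
  - name: Run platform gate and interpret the exit code
    shell: bash
    run: |
      set +e # capture the exit code instead of failing immediately
      npx @evalgate/sdk check --format github
      code=$?
      set -e
      if [ "$code" -eq 8 ]; then
        # Assumed policy: surface the weak-judge warning without blocking the merge.
        echo "::warning::EvalGate exit code 8: weak judge discriminative power"
        exit 0
      fi
      # 0 passes; 1 (regression) and 2 (configuration or infrastructure) fail the job.
      exit "$code"
```

Teams that want weak judge signals to block merges can drop the special case and let the step fail on any nonzero code.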
CLI commands reference
Best practices
Start local
Use the local gate first so every repo gets a baseline before the team adopts platform workflows.
Upload artifacts
Keep .evalgate/ or evals/regression-report.json as CI artifacts so failures are debuggable after the job exits.
Use impacted-only deliberately
Add --impacted-only only after your spec manifest is stable. Unknown changes fall back to broader runs.
Promote reviewed cases
Treat generated candidate cases as drafts until a human or promotion workflow approves them for gating.