
Set up CI/CD regression gates with EvalGate

EvalGate has three CI paths. Choose the one that matches the evidence you want to gate on.
| Path | Account required | Command | Blocks on |
| --- | --- | --- | --- |
| Local gate | No | npx @evalgate/sdk gate | Existing test/eval script compared with evals/baseline.json |
| Eval artifact CI | No | npx @evalgate/sdk ci | Eval spec failures and optional base/head run diff |
| Platform gate | Yes | npx @evalgate/sdk check | Platform score, baseline drop, policies, budgets, and judge credibility |

One-command eval artifact CI

Use ci when your repo contains EvalGate spec files and you want each pull request to write .evalgate/ run artifacts.
.github/workflows/evalgate.yml
name: EvalGate CI
on: [push, pull_request]
jobs:
  evalgate:
    runs-on: ubuntu-latest
    steps:
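      - uses: actions/checkout@v4
        with:
          # Full history so the base ref (main) exists locally; the default
          # shallow clone omits it. Assumes --base resolves the ref through git.
          fetch-depth: 0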
      - uses: actions/setup-node@v4
      - run: npm ci
      - run: npx @evalgate/sdk ci --format github --write-results --base main
      - uses: actions/upload-artifact@v4
        if: always()
        with:
          name: evalgate-results
          path: .evalgate/
That command:
  • Discovers evaluation specs automatically
  • Runs all specs by default
  • Writes run artifacts when --write-results is set
  • Compares against the base branch when --base is set
  • Writes a GitHub step summary and annotations when --format github is set
  • Exits with the appropriate code (see the Exit codes section below)
Add --impacted-only to run only specs affected by the current diff. Impact analysis requires a base branch, so use it with --base main or another base ref.
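Because --impacted-only depends on a base ref, it can help to add the two flags together. The following is a minimal sketch of that guard, not part of EvalGate itself: evalgate_ci_args is a hypothetical helper name, and it uses only the flags shown on this page.

```shell
# Hypothetical helper: build the argument list for `npx @evalgate/sdk ci`.
# Usage: evalgate_ci_args [base-ref]
# --impacted-only is added only when a base ref is supplied, because
# impact analysis needs a base branch to diff against.
evalgate_ci_args() {
  base_ref="${1:-}"
  args="ci --format github --write-results"
  if [ -n "$base_ref" ]; then
    args="$args --base $base_ref --impacted-only"
  fi
  printf '%s\n' "$args"
}

# Example invocation (not executed here):
#   npx @evalgate/sdk $(evalgate_ci_args main)
```

Calling the helper without an argument yields the full-run command, so a job with no known base ref falls back to running every spec.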

Local gate setup

Use the local gate when you want the fastest no-account path. Run init once:
npx @evalgate/sdk init
This detects your package manager, runs your existing test script to create evals/baseline.json, creates evalgate.config.json, and writes .github/workflows/evalgate-gate.yml. Commit the generated files:
git add evals/ .github/workflows/evalgate-gate.yml evalgate.config.json
git commit -m "chore: add EvalGate regression gate"
git push
The generated workflow uses the local gate command:
.github/workflows/evalgate-gate.yml
name: EvalGate Gate
on:
  pull_request:
    branches: [main]
jobs:
  regression-gate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: npm
      - run: npm ci
      - name: EvalGate regression gate
        run: npx -y @evalgate/sdk@^3 gate --format github
      - uses: actions/upload-artifact@v4
        if: always()
        with:
          name: evalgate-report
          path: evals/regression-report.json
          if-no-files-found: ignore
The built-in local gate compares test pass state and parsed test count against the baseline. If your test script runs AI evals, this gates AI behavior. Otherwise, it gates test health until you add eval specs or a custom eval:regression-gate script.
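To make the comparison concrete, here is a rough sketch of one plausible reading of it. It is an assumption, not the SDK's actual logic: it treats a pass-to-fail change or a drop in test count as a regression, and it takes the values as arguments because the schema of evals/baseline.json is not shown here. local_gate_verdict is a hypothetical name.

```shell
# Hypothetical comparison of a current test run against a baseline.
# Usage: local_gate_verdict <baseline_passed> <baseline_count> <current_passed> <current_count>
# Assumption: a regression is a pass -> fail change or a lower test count.
local_gate_verdict() {
  base_passed="$1"; base_count="$2"; cur_passed="$3"; cur_count="$4"
  if [ "$base_passed" = "true" ] && [ "$cur_passed" != "true" ]; then
    echo "regression: tests passed in the baseline and now fail"
    return 1
  fi
  if [ "$cur_count" -lt "$base_count" ]; then
    echo "regression: test count dropped from $base_count to $cur_count"
    return 1
  fi
  echo "clean"
  return 0
}
```

The return values mirror the clean (0) and regression (1) exit codes described in the Exit codes section.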

Platform gate

Use check when you want the dashboard, quality scores, judge credibility, and import-on-failure behavior.
.github/workflows/evalgate-platform-gate.yml
name: EvalGate Platform Gate
on:
  pull_request:
    branches: [main]
jobs:
  eval-gate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
      - run: npm ci
      - name: EvalGate quality gate
        env:
          EVALGATE_API_KEY: ${{ secrets.EVALGATE_API_KEY }}
        run: npx @evalgate/sdk check --format github --onFail import
check --format github emits GitHub annotations for failed cases and writes a step summary with the verdict, score, baseline score, delta, and top failing cases. --onFail import uploads the failed run context to the platform so you can review it in the dashboard.

GitLab CI configuration

For GitLab, use JSON output and upload the report as an artifact:
.gitlab-ci.yml
eval-gate:
  stage: test
  image: node:20
  script:
    - npm ci
    - npx @evalgate/sdk gate --format json
  artifacts:
    when: always
    paths:
      - evals/regression-report.json
  only:
    - merge_requests
    - main

Setting quality thresholds

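Define local gate settings in evalgate.config.json. The example below sets the gate's baseline and report paths and the judge thresholds (tprMin, tnrMin, minLabeledSamples). The built-in local gate compares against evals/baseline.json; platform gates use the quality and judge settings configured for the evaluation.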
evalgate.config.json
{
  "gate": {
    "baseline": "evals/baseline.json",
    "report": "evals/regression-report.json"
  },
  "judge": {
    "tprMin": 0.7,
    "tnrMin": 0.7,
    "minLabeledSamples": 30
  }
}
Store evalgate.config.json and evals/baseline.json in version control. Changing thresholds or baselines without review changes what your CI gate proves.
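One way to enforce this is a pre-gate check that the files are actually tracked by git. The sketch below assumes git is available on the runner; require_tracked is a hypothetical helper name, and git ls-files --error-unmatch is the standard git check for whether a path is in the index.

```shell
# Hypothetical helper: fail if any of the given paths is not tracked by git.
# Usage: require_tracked <path>...
require_tracked() {
  missing=0
  for f in "$@"; do
    # --error-unmatch makes git exit non-zero when the path is not in the index.
    if ! git ls-files --error-unmatch -- "$f" >/dev/null 2>&1; then
      echo "not tracked by git: $f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example invocation from the repository root (not executed here):
#   require_tracked evalgate.config.json evals/baseline.json
```

Running it before the gate step surfaces an uncommitted baseline as a clear error rather than a confusing gate result.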

Exit codes

| Code | Meaning |
| --- | --- |
| 0 | Clean: no regressions detected |
| 1 | Regression: tests failed, specs failed, or score dropped below the selected gate |
| 2 | Configuration or infrastructure issue |
| 8 | Warning state for weak judge discriminative power in supported platform gate paths |
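A CI step can branch on these codes instead of failing on any non-zero exit. The following is a minimal sketch with hypothetical function names. The labels are taken from the codes above, and treating code 8 as non-blocking is a policy assumption, not documented EvalGate behavior.

```shell
# Hypothetical helper: map an EvalGate exit code to a short label.
evalgate_exit_label() {
  case "$1" in
    0) echo "clean" ;;
    1) echo "regression" ;;
    2) echo "config-or-infra" ;;
    8) echo "judge-warning" ;;
    *) echo "unknown" ;;
  esac
}

# Hypothetical policy: succeed (return 0) when the job should be blocked.
# Assumption: code 8 is a warning and does not block the job.
evalgate_should_block() {
  case "$1" in
    0|8) return 1 ;;  # do not block
    *)   return 0 ;;  # block on regressions, config issues, and unknown codes
  esac
}

# Example use in a CI step (not executed here):
#   code=0
#   npx @evalgate/sdk check --format github || code=$?
#   echo "EvalGate: $(evalgate_exit_label "$code")"
#   if evalgate_should_block "$code"; then exit "$code"; fi
```

Capturing the code with `|| code=$?` keeps a shell running with errexit from aborting before the label is printed.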

CLI commands reference

npx @evalgate/sdk init
npx @evalgate/sdk verify
npx @evalgate/sdk doctor

Best practices

Start local

Use the local gate first so every repo gets a baseline before the team adopts platform workflows.

Upload artifacts

Keep .evalgate/ or evals/regression-report.json as CI artifacts so failures are debuggable after the job exits.

Use impacted-only deliberately

Add --impacted-only only after your spec manifest is stable. Unknown changes fall back to broader runs.

Promote reviewed cases

Treat generated candidate cases as drafts until a human or promotion workflow approves them for gating.