Set up CI/CD regression gates with EvalGate

EvalGate has three CI paths. Choose the one that matches the evidence you want to gate on.

Path	Account required	Command	Blocks on
Local gate	No	`npx @evalgate/sdk gate`	Existing test/eval script compared with `evals/baseline.json`
Eval artifact CI	No	`npx @evalgate/sdk ci`	Eval spec failures and optional base/head run diff
Platform gate	Yes	`npx @evalgate/sdk check`	Platform score, baseline drop, policies, budgets, and judge credibility

One-command eval artifact CI

Use `ci` when your repo contains EvalGate spec files and you want each pull request to write run artifacts to `.evalgate/`.

.github/workflows/evalgate.yml

name: EvalGate CI
on: [push, pull_request]
jobs:
  evalgate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
      - run: npm ci
      - run: npx @evalgate/sdk ci --format github --write-results --base main
      - uses: actions/upload-artifact@v4
        if: always()
        with:
          name: evalgate-results
          path: .evalgate/

That command:

Discovers evaluation specs automatically
Runs all specs by default
Writes run artifacts when --write-results is set
Compares against the base branch when --base is set
Writes a GitHub step summary and annotations when --format github is set
Exits with the code that matches the result (see Exit codes)

Add --impacted-only to run only specs affected by the current diff. Impact analysis requires a base branch, so use it with --base main or another base ref.
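
As a rough sketch of that constraint, the shell helper below assembles the `ci` arguments and refuses to add `--impacted-only` when no base ref is given. The function name `evalgate_ci_args` and its return status are illustrative assumptions, not part of EvalGate. Only the flags come from the text above.

```shell
# Hypothetical helper: builds the argument list for `npx @evalgate/sdk ci`.
# Usage: evalgate_ci_args <base-ref-or-empty> <impacted: yes|no>
evalgate_ci_args() {
  local base="$1"
  local impacted="${2:-no}"
  local args="ci --format github --write-results"

  # Add the base comparison only when a base ref was supplied.
  if [ -n "$base" ]; then
    args="$args --base $base"
  fi

  # Impact analysis needs a base ref, so reject the combination without one.
  if [ "$impacted" = "yes" ]; then
    if [ -z "$base" ]; then
      echo "error: --impacted-only needs a base ref (for example --base main)" >&2
      return 2
    fi
    args="$args --impacted-only"
  fi

  printf '%s\n' "$args"
}
```

A workflow step could then run `npx @evalgate/sdk $(evalgate_ci_args main yes)`.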

Local gate setup

Use the local gate when you want the fastest no-account path. Run init once:

npx @evalgate/sdk init

This detects your package manager, runs your existing test script to create evals/baseline.json, creates evalgate.config.json, and writes .github/workflows/evalgate-gate.yml. Commit the generated files:

git add evals/ .github/workflows/evalgate-gate.yml evalgate.config.json
git commit -m "chore: add EvalGate regression gate"
git push

The generated workflow uses the local gate command:

.github/workflows/evalgate-gate.yml

name: EvalGate Gate
on:
  pull_request:
    branches: [main]
jobs:
  regression-gate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: npm
      - run: npm ci
      - name: EvalGate regression gate
        run: npx -y @evalgate/sdk@^3 gate --format github
      - uses: actions/upload-artifact@v4
        if: always()
        with:
          name: evalgate-report
          path: evals/regression-report.json
          if-no-files-found: ignore

The built-in local gate compares test pass state and parsed test count against the baseline. If your test script runs AI evals, this gates AI behavior. Otherwise, it gates test health until you add eval specs or a custom eval:regression-gate script.
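
To make the idea of comparing pass state and test count concrete, here is a minimal sketch of one possible comparison rule. It is not EvalGate's implementation: the function name, the argument order, and the rule that a lower test count counts as a regression are all assumptions made for illustration.

```shell
# Hypothetical comparison rule, NOT EvalGate's actual logic.
# Usage: local_gate_verdict <baseline_passed> <baseline_count> <current_passed> <current_count>
# Prints "pass" or "regression".
local_gate_verdict() {
  local base_passed="$1" base_count="$2" cur_passed="$3" cur_count="$4"

  if [ "$base_passed" = "true" ] && [ "$cur_passed" != "true" ]; then
    echo "regression"   # the suite passed in the baseline and fails now
  elif [ "$cur_count" -lt "$base_count" ]; then
    echo "regression"   # fewer tests were parsed than in the baseline (assumed rule)
  else
    echo "pass"
  fi
}
```

For example, `local_gate_verdict true 42 true 40` prints `regression` under this assumed rule, because two tests disappeared relative to the baseline.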

Platform gate

Use `check` when you want the dashboard, quality scores, judge credibility, and import-on-failure behavior.

.github/workflows/evalgate-platform-gate.yml

name: EvalGate Platform Gate
on:
  pull_request:
    branches: [main]
jobs:
  eval-gate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
      - run: npm ci
      - name: EvalGate quality gate
        env:
          EVALGATE_API_KEY: ${{ secrets.EVALGATE_API_KEY }}
        run: npx @evalgate/sdk check --format github --onFail import

`check --format github` emits GitHub annotations for failed cases and writes a step summary with the verdict, score, baseline score, delta, and top failing cases. `--onFail import` uploads the failed run context to the platform so you can review it in the dashboard.
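
Because the platform gate depends on `EVALGATE_API_KEY`, and GitHub does not pass repository secrets to workflows triggered by pull requests from forks, it can help to fail with a clear message when the key is missing. The guard below is a sketch under that assumption. The function name `require_evalgate_key` is hypothetical.

```shell
# Hypothetical guard: succeeds only when EVALGATE_API_KEY is set and non-empty.
require_evalgate_key() {
  if [ -z "${EVALGATE_API_KEY:-}" ]; then
    echo "EVALGATE_API_KEY is not set; cannot run the platform gate" >&2
    return 1
  fi
  return 0
}

# Example use in a CI step (commented out so the sketch has no side effects):
# require_evalgate_key && npx @evalgate/sdk check --format github --onFail import
```

Whether a missing key should fail the job or skip the gate is a policy choice for your team; the sketch only reports the condition.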

GitLab CI configuration

For GitLab, use JSON output and upload the report as an artifact:

.gitlab-ci.yml

eval-gate:
  stage: test
  image: node:20
  script:
    - npm ci
    - npx @evalgate/sdk gate --format json
  artifacts:
    when: always
    paths:
      - evals/regression-report.json
  only:
    - merge_requests
    - main

Setting quality thresholds

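Configure the local gate in evalgate.config.json. In the example below, the gate block sets the baseline and report paths, and the judge block sets the minimum judge thresholds. The built-in local gate compares against evals/baseline.json; platform gates use the quality and judge settings configured for the evaluation.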

evalgate.config.json

{
  "gate": {
    "baseline": "evals/baseline.json",
    "report": "evals/regression-report.json"
  },
  "judge": {
    "tprMin": 0.7,
    "tnrMin": 0.7,
    "minLabeledSamples": 30
  }
}

Store evalgate.config.json and evals/baseline.json in version control. Changing thresholds or baselines without review changes what your CI gate proves.
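
One way to make such changes visible in review is to flag any pull request that touches those two files. The sketch below filters a list of changed paths. The function name `gate_defining_changes` is hypothetical, and the paths assume the default locations shown above.

```shell
# Hypothetical filter: reads changed file paths on stdin (one per line) and
# prints the ones that define the gate. Exit status is 0 if any matched,
# 1 otherwise (this is grep's own exit status).
gate_defining_changes() {
  grep -E '^(evalgate\.config\.json|evals/baseline\.json)$'
}

# Example use in CI, assuming origin/main has been fetched:
# git diff --name-only origin/main...HEAD | gate_defining_changes
```

A job could use the exit status to add a label or request an extra reviewer whenever the baseline or thresholds change.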

Exit codes

Code	Meaning
`0`	Clean: no regressions detected
`1`	Regression: tests failed, specs failed, or score dropped below the selected gate
`2`	Configuration or infrastructure issue
`8`	Warning state for weak judge discriminative power in supported platform gate paths
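
If a CI script needs to react differently to each code, a small wrapper can translate the table above into labels and a blocking decision. This is a sketch: the function names are hypothetical, and treating code `8` as non-blocking is an assumed policy, not documented EvalGate behavior.

```shell
# Maps an exit code from the table above to a short label.
describe_evalgate_exit() {
  case "$1" in
    0) echo "clean" ;;
    1) echo "regression" ;;
    2) echo "config-or-infrastructure" ;;
    8) echo "judge-warning" ;;
    *) echo "unknown" ;;
  esac
}

# Assumed policy: block on everything except clean (0) and judge warning (8).
# Returns 0 when the job should be blocked, 1 when it may continue.
evalgate_should_block() {
  case "$1" in
    0|8) return 1 ;;
    *) return 0 ;;
  esac
}

# Example use (commented out; `|| code=$?` keeps `set -e` shells from exiting early):
# code=0
# npx @evalgate/sdk check --format github || code=$?
# echo "EvalGate result: $(describe_evalgate_exit "$code")"
# if evalgate_should_block "$code"; then exit "$code"; fi
```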

CLI commands reference

npx @evalgate/sdk init
npx @evalgate/sdk verify
npx @evalgate/sdk doctor

Best practices

Start local

Use the local gate first so every repo gets a baseline before the team adopts platform workflows.

Upload artifacts

Keep .evalgate/ or evals/regression-report.json as CI artifacts so failures are debuggable after the job exits.

Use impacted-only deliberately

Add --impacted-only only after your spec manifest is stable. Unknown changes fall back to broader runs.

Promote reviewed cases

Treat generated candidate cases as drafts until a human or promotion workflow approves them for gating.
