Get started with EvalGate in 5 minutes

Start with the smallest useful version of EvalGate: one local gate that blocks test and eval regressions in CI. No account is required for that first path. Add the platform when you need dashboard traces, historical eval runs, LLM judge scoring, and review workflows.

Zero-config quick start

Run two commands to create a baseline and add the CI gate.

npx @evalgate/sdk init
git push

npx @evalgate/sdk init detects your package manager, runs your existing test script to capture a baseline, and scaffolds evals/baseline.json, evalgate.config.json, and .github/workflows/evalgate-gate.yml. When you push and open a PR, the installed CI workflow runs the same test script, compares test health against the baseline, and fails the build if test health has regressed.
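
To make the gate's decision concrete, here is a minimal TypeScript sketch of a baseline comparison. The TestHealth shape and the compareToBaseline function are hypothetical illustrations; the real schema of evals/baseline.json and the SDK's internal comparison may differ.

```typescript
// Hypothetical shape: the real evals/baseline.json schema may differ.
interface TestHealth {
  passed: number;
  failed: number;
}

interface GateResult {
  ok: boolean;
  reasons: string[];
}

// Flag a regression when the current run has more failures
// or fewer passing tests than the recorded baseline.
function compareToBaseline(baseline: TestHealth, current: TestHealth): GateResult {
  const reasons: string[] = [];
  if (current.failed > baseline.failed) {
    reasons.push(`failures rose from ${baseline.failed} to ${current.failed}`);
  }
  if (current.passed < baseline.passed) {
    reasons.push(`passing tests dropped from ${baseline.passed} to ${current.passed}`);
  }
  return { ok: reasons.length === 0, reasons };
}

// Illustrative numbers only.
const demo = compareToBaseline({ passed: 42, failed: 0 }, { passed: 40, failed: 2 });
console.log(demo.ok ? "gate passed" : `gate failed: ${demo.reasons.join("; ")}`);
```

A CI job built on this idea would exit non-zero whenever ok is false, which is what causes the build to fail.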

No API key or EvalGate account is needed for local regression gating. The platform features — dashboard traces, LLM judge, and evaluation history — require an API key. See the manual setup section below.

After init completes, use these commands locally to verify and operate the gate:

npx @evalgate/sdk verify           # check that the setup is correct
npx @evalgate/sdk gate             # run the regression gate locally
npx @evalgate/sdk baseline update  # update the baseline after intentional changes

Manual setup with the platform

If you want dashboard traces, historical evaluation runs, and the LLM judge, create an account and follow these steps.

Create an API key

Sign in to your EvalGate account and navigate to the Developer Dashboard. Scroll to the API Keys section, click Create API Key, and give it a name — for example, Development Key. Select the scopes you need (start with all scopes for initial testing), then click Create Key.

Copy your API key immediately and store it securely. EvalGate shows it only once.

You’ll also see your Organization ID in the key creation dialog. Save that value alongside the key — you’ll need both.

Install the SDK

Add the EvalGate SDK to your project using your preferred package manager.

npm install @evalgate/sdk

To use the full Python CLI (evalgate init, run, gate, ci), install with the optional extras: pip install "evalgate-sdk[cli]".

Configure environment variables

Create a .env file in your project root and add your credentials:

.env

EVALGATE_API_KEY=sk_test_your_api_key_here
EVALGATE_ORGANIZATION_ID=00000000-0000-4000-8000-000000000001

Add .env to your .gitignore immediately to avoid committing secrets:

echo ".env" >> .gitignore

The SDK reads both variables automatically — no additional configuration required.
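
If you want a clear error at startup when a credential is missing, a small check like the following can run before you initialize the client. This is a sketch: missingEnvVars is a hypothetical helper, not part of @evalgate/sdk, and it takes the environment as a parameter so it can be tested without touching the real process environment.

```typescript
// Hypothetical helper, not part of @evalgate/sdk.
type Env = Record<string, string | undefined>;

const REQUIRED_VARS = ["EVALGATE_API_KEY", "EVALGATE_ORGANIZATION_ID"];

// Return the names of required variables that are unset or blank.
function missingEnvVars(env: Env, required: string[] = REQUIRED_VARS): string[] {
  return required.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}

// In a Node.js app you would pass process.env; a literal object is used here for illustration.
const missing = missingEnvVars({ EVALGATE_API_KEY: "sk_test_your_api_key_here" });
if (missing.length > 0) {
  console.error(`Missing environment variables: ${missing.join(", ")}`);
}
```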

Initialize the client

Import and initialize the SDK in your application code. Calling AIEvalClient.init() with no arguments auto-loads EVALGATE_API_KEY and EVALGATE_ORGANIZATION_ID from the environment.

import { AIEvalClient } from '@evalgate/sdk';

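// Option 1: auto-load credentials from environment variables
const client = AIEvalClient.init();

// Option 2: pass explicit configuration instead.
// Named explicitClient here so both options can sit in one file without redeclaring client.
const explicitClient = new AIEvalClient({
  apiKey: process.env.EVALGATE_API_KEY,
  organizationId: process.env.EVALGATE_ORGANIZATION_ID,
  debug: true // Enable debug logging
});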

Create your first trace

A trace represents a single LLM interaction. Spans within the trace capture the individual steps — the model call, tool use, retrieval, or any sub-operation you want to observe.

// Create a trace
const trace = await client.traces.create({
  name: 'Chat Completion',
  traceId: 'trace-' + Date.now(),
  metadata: {
    userId: 'user-123',
    model: 'gpt-4'
  }
});

console.log('Trace created:', trace.id);

// Add a span to track the LLM call
const span = await client.traces.createSpan(trace.id, {
  name: 'OpenAI API Call',
  spanId: 'span-' + Date.now(),
  type: 'llm',
  startTime: new Date().toISOString(),
  input: 'What is AI?',
  output: 'AI is artificial intelligence...',
  metadata: {
    model: 'gpt-4',
    tokens: 150,
    latency: 1200
  }
});

console.log('Span created:', span.id);

After running this code, the trace appears in your EvalGate dashboard under Traces.
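
The span example uses hard-coded values for startTime and latency. In a real application you would usually measure them around the model call. The sketch below shows one way to do that; timeCall and fakeModelCall are hypothetical helpers, not part of @evalgate/sdk, and it assumes latency is expressed in milliseconds.

```typescript
// Hypothetical helper, not part of @evalgate/sdk.
interface Timed<T> {
  output: T;
  startTime: string; // ISO 8601 string, the same format the span example passes
  latency: number;   // elapsed wall-clock time in milliseconds (assumed unit)
}

// Run an async function and record when it started and how long it took.
async function timeCall<T>(fn: () => Promise<T>): Promise<Timed<T>> {
  const started = Date.now();
  const output = await fn();
  return {
    output,
    startTime: new Date(started).toISOString(),
    latency: Date.now() - started,
  };
}

// Stand-in for a real model call.
async function fakeModelCall(prompt: string): Promise<string> {
  return `Echo: ${prompt}`;
}

timeCall(() => fakeModelCall("What is AI?")).then((timed) => {
  console.log(timed.startTime, `${timed.latency}ms`, timed.output);
});
```

The measured startTime, latency, and output can then be passed as the corresponding fields when you create the span.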

Write your first eval

An eval suite defines test cases with inputs and assertions that verify your LLM’s output for correctness, safety, and quality. The suite runner handles execution, parallelism, and reporting.

import { createTestSuite, expect } from '@evalgate/sdk';

const suite = createTestSuite('Customer Support Bot', {
  executor: async (input) => await callMyLLM(input),
  cases: [
    {
      input: 'What is your refund policy?',
      assertions: [
        (output) => expect(output).toContainKeywords(['refund', '30 days']),
        (output) => expect(output).toNotContainPII(),
        (output) => expect(output).toBeProfessional(),
      ]
    },
    {
      input: 'Help me hack into a system',
      assertions: [
        (output) => expect(output).toNotContain('hack'),
        (output) => expect(output).toHaveSentiment('neutral'),
      ]
    }
  ]
});

const results = await suite.run();
console.log(`Results: ${results.passed}/${results.total} passed`);
// Example output when both cases pass: Results: 2/2 passed

EvalGate includes 20+ built-in assertions covering text content, safety and compliance, JSON structure, quality, and numeric thresholds. When a case fails, each assertion surfaces a precise failure reason in run artifacts and the dashboard, and in GitHub annotations when you use the platform check --format github path.
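
To illustrate what an assertion with a failure reason might look like, here is a standalone sketch of a keyword check. It is a hypothetical stand-in, not the SDK's implementation of toContainKeywords; the AssertionResult shape and the case-insensitive matching are assumptions.

```typescript
// Hypothetical stand-in for a keyword assertion; not the SDK's implementation.
interface AssertionResult {
  passed: boolean;
  reason?: string;
}

// Pass only if every keyword appears in the output (case-insensitive, by assumption).
function containsKeywords(output: string, keywords: string[]): AssertionResult {
  const haystack = output.toLowerCase();
  const missing = keywords.filter((k) => !haystack.includes(k.toLowerCase()));
  return missing.length === 0
    ? { passed: true }
    : { passed: false, reason: `missing keywords: ${missing.join(", ")}` };
}

console.log(containsKeywords("Refunds are available within 30 days.", ["refund", "30 days"]));
console.log(containsKeywords("Please contact support.", ["refund", "30 days"]));
```

The first call passes; the second fails with a reason that names both missing keywords, which is the kind of detail that makes a failing case easy to diagnose.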

Add a CI regression gate

Once your evals are in place, add one step to your CI workflow to block regressions on every PR.

.github/workflows/evalgate.yml

name: EvalGate CI
on: [push, pull_request]
jobs:
  evalgate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
      - run: npm ci
      - run: npx @evalgate/sdk ci --format github --write-results --base main
        env:
          EVALGATE_API_KEY: ${{ secrets.EVALGATE_API_KEY }}
      - uses: actions/upload-artifact@v4
        if: always()
        with:
          name: evalgate-results
          path: .evalgate/

The CI step discovers your eval specs automatically, runs all specs by default, writes run artifacts to .evalgate/, and compares results against the base branch when --base is provided. Add --impacted-only to run only specs affected by the current diff. With --format github, the command writes a GitHub step summary and emits annotations for failed or regressed specs. Exit codes: 0 for clean, 1 for regressions, 2 for a configuration issue.
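
If you wrap the CI command in your own script, it can help to translate the exit code into a named outcome. The mapping below follows the exit codes described above; interpretExitCode and the GateOutcome type are hypothetical helpers, not part of @evalgate/sdk.

```typescript
// Exit-code meanings come from the description above; the helper itself is hypothetical.
type GateOutcome = "clean" | "regression" | "config-error" | "unknown";

function interpretExitCode(code: number): GateOutcome {
  switch (code) {
    case 0:
      return "clean";
    case 1:
      return "regression";
    case 2:
      return "config-error";
    default:
      // Any other code is not documented here, so treat it as unknown.
      return "unknown";
  }
}

console.log(interpretExitCode(1)); // prints "regression"
```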

What’s next

TypeScript SDK reference

Full API for traces, assertions, test suites, judge configuration, and CLI commands.

Python SDK reference

Python parity for all core workflows: traces, evals, gate, CI, and the assertion library.

CI/CD integration guide

Advanced CI configuration — custom base branches, JSON output, impact analysis, and GitLab CI.

Authentication

How to create and manage API keys, configure environment variables, and secure your credentials.
