Annotations API — human labeling and review
Create annotation tasks, assign traces for human review, and submit labels to build the golden dataset used for measuring LLM judge credibility.
Human annotations are the foundation of judge credibility in EvalGate. When you label a set of traces as pass or fail, those labels become the ground truth that the LLM Judge alignment endpoint compares against automated judge scores. A judge with high alignment against a well-labeled dataset is one you can trust to gate your CI pipeline.
GET /api/annotations/tasks — list annotation tasks
Returns annotation tasks for the authenticated organization.
curl https://evalgate.com/api/annotations/tasks \
-H "Authorization: Bearer YOUR_API_KEY"
Response
{
  "tasks": [
    {
      "id": 12,
      "name": "Support quality review — March",
      "status": "in_progress",
      "itemCount": 120,
      "completedCount": 87,
      "organizationId": "00000000-0000-4000-8000-000000000001",
      "createdAt": "2026-03-01T09:00:00.000Z"
    }
  ]
}
name: Display name for the task.
status: Task status, one of draft, in_progress, or completed.
itemCount: Total number of items (traces) assigned to this task.
completedCount: Number of items that have received a label.
organizationId: UUID of the owning organization.
createdAt: ISO 8601 creation timestamp.
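For scripting, the same call can be made with Python's standard library. The sketch below is a hypothetical client, not an official EvalGate SDK: build_list_tasks_request, list_tasks, and task_progress are illustrative names, and the code assumes the response shape shown above. task_progress derives a completion fraction from itemCount and completedCount.

```python
import json
import urllib.request

# Base URL taken from the curl examples on this page.
BASE_URL = "https://evalgate.com/api"


def build_list_tasks_request(api_key: str) -> urllib.request.Request:
    """Build, without sending, a GET request for /api/annotations/tasks."""
    return urllib.request.Request(
        f"{BASE_URL}/annotations/tasks",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def list_tasks(api_key: str) -> list:
    """Send the request and return the "tasks" array from the JSON response."""
    with urllib.request.urlopen(build_list_tasks_request(api_key)) as resp:
        return json.load(resp)["tasks"]


def task_progress(task: dict) -> float:
    """Fraction of a task's items that have a label (0.0 for an empty task)."""
    total = task["itemCount"]
    return task["completedCount"] / total if total else 0.0


if __name__ == "__main__":
    # Offline demo using the counts from the sample response above.
    sample_task = {"id": 12, "status": "in_progress", "itemCount": 120, "completedCount": 87}
    print(f"task {sample_task['id']}: {task_progress(sample_task):.1%} labeled")
```

Calling list_tasks("YOUR_API_KEY") performs the real request; the demo at the bottom stays offline.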
POST /api/annotations/tasks — create an annotation task
Creates a new task and assigns a set of traces for labeling.
curl https://evalgate.com/api/annotations/tasks \
-X POST \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"name": "Support quality review — March",
"traceIds": [42, 43, 44, 45]
}'
Request body
name: Display name for this annotation task.
traceIds: Array of numeric trace IDs to include in this task. Each trace becomes one annotation item.
Optional guidance text shown to annotators when they open the task.
labelOptions: Optional array of label strings annotators can choose from. Defaults to ["pass", "fail"] when not specified.
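The request body can also be assembled programmatically. This is a minimal sketch with hypothetical helper names; the non-empty checks are client-side assumptions rather than documented server rules, and the optional guidance-text field is left out because its key is not shown in this section. effective_labels mirrors the documented fallback to ["pass", "fail"].

```python
from __future__ import annotations

import json
import urllib.request

BASE_URL = "https://evalgate.com/api"
# Documented default when labelOptions is omitted.
DEFAULT_LABELS = ["pass", "fail"]


def build_create_task_body(
    name: str, trace_ids: list[int], label_options: list[str] | None = None
) -> dict:
    """Assemble the JSON body for POST /api/annotations/tasks.

    The checks below are assumed client-side guards, not documented server rules.
    The optional guidance-text field is not set because its key is not shown here.
    """
    if not name:
        raise ValueError("name must be a non-empty string")
    if not trace_ids or not all(isinstance(t, int) for t in trace_ids):
        raise ValueError("traceIds must be a non-empty list of numeric trace IDs")
    body = {"name": name, "traceIds": list(trace_ids)}
    if label_options is not None:
        body["labelOptions"] = list(label_options)
    return body


def effective_labels(body: dict) -> list[str]:
    """Labels annotators will see: labelOptions if given, else the default."""
    return list(body.get("labelOptions", DEFAULT_LABELS))


def build_create_task_request(api_key: str, body: dict) -> urllib.request.Request:
    """Build, without sending, the POST request; pass it to urlopen to create the task."""
    return urllib.request.Request(
        f"{BASE_URL}/annotations/tasks",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending the request with urllib.request.urlopen should return the 201 response shown next.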
Response (201)
{
  "id": 13,
  "name": "Support quality review — March",
  "status": "draft",
  "itemCount": 4,
  "completedCount": 0,
  "organizationId": "00000000-0000-4000-8000-000000000001",
  "createdAt": "2026-03-15T11:30:00.000Z"
}
GET /api/annotations/tasks/ — get task details
Returns a single annotation task with its items.
curl https://evalgate.com/api/annotations/tasks/12 \
-H "Authorization: Bearer YOUR_API_KEY"
Path parameters
Numeric ID of the annotation task.
Response
{
  "id": 12,
  "name": "Support quality review — March",
  "status": "in_progress",
  "items": [
    {
      "id": 201,
      "traceId": 42,
      "label": "pass",
      "notes": "Clear and complete response",
      "labeledAt": "2026-03-10T14:22:00.000Z",
      "labeledBy": "user@example.com"
    },
    {
      "id": 202,
      "traceId": 43,
      "label": null,
      "notes": null,
      "labeledAt": null,
      "labeledBy": null
    }
  ]
}
traceId: ID of the trace this item references.
label: The label assigned by the annotator, or null if not yet labeled.
notes: Optional free-text notes from the annotator, or null if none were given.
labeledAt: ISO 8601 timestamp when the label was submitted, or null if not yet labeled.
labeledBy: Email or identifier of the annotator who submitted the label, or null if not yet labeled.
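Because unlabeled items come back with null fields, a client can work out what remains to be labeled directly from this response. A minimal Python sketch, assuming the item shape shown above; unlabeled_item_ids and label_counts are hypothetical helper names.

```python
import json
from collections import Counter

# Abbreviated copy of the task-details response above; JSON null decodes to None.
SAMPLE = """{"id": 12, "status": "in_progress", "items": [
  {"id": 201, "traceId": 42, "label": "pass", "labeledBy": "user@example.com"},
  {"id": 202, "traceId": 43, "label": null, "labeledBy": null}
]}"""


def unlabeled_item_ids(task: dict) -> list:
    """Item IDs still awaiting a label; these are the itemId values to submit."""
    return [item["id"] for item in task["items"] if item["label"] is None]


def label_counts(task: dict) -> Counter:
    """How many items received each label, ignoring unlabeled items."""
    return Counter(item["label"] for item in task["items"] if item["label"] is not None)


if __name__ == "__main__":
    task = json.loads(SAMPLE)
    print("still to label:", unlabeled_item_ids(task))
    print("label counts:", dict(label_counts(task)))
```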
POST /api/annotations/tasks//items — submit an annotation
Submits a label for a single annotation item within a task.
curl https://evalgate.com/api/annotations/tasks/12/items \
-X POST \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"itemId": 202,
"label": "fail",
"notes": "Response did not address the user'\''s core question"
}'
Path parameters
Numeric ID of the annotation task.
Request body
itemId: Numeric ID of the annotation item to label.
label: The label to assign. Must be one of the task’s configured labelOptions, or pass or fail if none were configured.
notes: Optional free-text notes explaining the label decision. They are stored alongside the label for audit and inter-rater review.
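A client can reject an invalid label before making any request. This sketch uses a hypothetical build_submit_request helper; label_options stands in for the task's labelOptions (pass and fail by default), and the URL pattern follows the curl example above.

```python
import json
import urllib.request
from typing import Optional, Sequence

BASE_URL = "https://evalgate.com/api"


def build_submit_request(
    api_key: str,
    task_id: int,
    item_id: int,
    label: str,
    notes: Optional[str] = None,
    label_options: Sequence[str] = ("pass", "fail"),
) -> urllib.request.Request:
    """Build, without sending, a POST to /api/annotations/tasks/<task_id>/items.

    The label is checked locally against label_options (the task's labelOptions,
    or pass/fail by default) so a typo fails before any request is made.
    """
    if label not in label_options:
        raise ValueError(f"label {label!r} is not one of {list(label_options)}")
    body = {"itemId": item_id, "label": label}
    if notes is not None:
        body["notes"] = notes
    return urllib.request.Request(
        f"{BASE_URL}/annotations/tasks/{task_id}/items",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Because json.dumps encodes the body, a note containing an apostrophe needs none of the shell quoting that the curl example requires.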
Response
{
"id" : 202 ,
"traceId" : 43 ,
"label" : "fail" ,
"notes" : "Response did not address the user's core question" ,
"labeledAt" : "2026-03-15T12:05:00.000Z" ,
"labeledBy" : "user@example.com"
}
Once a task has enough labels, run the LLM Judge alignment check to measure how well your automated judge agrees with your team’s ground truth. A high-alignment judge is safe to use as an automated CI gate.
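This page does not say which statistic the alignment check reports. As an illustration of what agreement can mean, the local sketch below pairs human labels with judge verdicts by traceId and computes raw agreement and Cohen's kappa, which discounts the agreement two raters would reach by chance. It assumes you have already turned judge scores into pass/fail verdicts; judge_verdicts and the function names are hypothetical, and this is not the alignment endpoint's implementation.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

Pair = Tuple[str, str]  # (human label, judge verdict)


def paired_labels(items: List[dict], judge_verdicts: Dict[int, str]) -> List[Pair]:
    """Pair labels by traceId, skipping unlabeled items and traces the judge did not score."""
    pairs = []
    for item in items:
        if item["label"] is None:
            continue
        verdict = judge_verdicts.get(item["traceId"])
        if verdict is not None:
            pairs.append((item["label"], verdict))
    return pairs


def agreement(pairs: List[Pair]) -> float:
    """Share of pairs where the human label equals the judge verdict."""
    if not pairs:
        raise ValueError("no labeled pairs to compare")
    return sum(h == j for h, j in pairs) / len(pairs)


def cohens_kappa(pairs: List[Pair]) -> Optional[float]:
    """(p_o - p_e) / (1 - p_e); None when p_e == 1 (both raters used one identical label)."""
    n = len(pairs)
    p_o = agreement(pairs)
    human = Counter(h for h, _ in pairs)
    judge = Counter(j for _, j in pairs)
    # Chance agreement: product of each rater's marginal rate, summed over labels.
    p_e = sum(human[c] * judge[c] for c in human) / (n * n)
    if p_e == 1:
        return None
    return (p_o - p_e) / (1 - p_e)
```

Raw agreement can look high when most traces pass, which is why a chance-corrected statistic such as kappa is often reported alongside it.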