TypeScript SDK
YAML remains AgentV’s canonical, portable eval format. The SDK surfaces below are for cases where you want to generate YAML-shaped definitions in code, embed eval runs inside another application, or write executable graders and prompt templates. For authoring helpers, @agentv/sdk is AgentV’s public lightweight SDK package.
AgentV currently provides two npm packages for programmatic use:
@agentv/sdk— user-facing SDK forevaluate(), YAML-aligned eval authoring, custom assertions, and script graders@agentv/core— core implementation package and typed configuration
Installation
Section titled “Installation”# User-facing SDK (evaluate, defineEval, graders, defineAssertion, defineScriptGrader)npm install @agentv/sdk
# Core configuration helpers (defineConfig)npm install @agentv/coreMigrating from @agentv/eval
Section titled “Migrating from @agentv/eval”Use @agentv/sdk for all new TypeScript SDK code:
npm uninstall @agentv/evalnpm install @agentv/sdkimport { defineScriptGrader } from '@agentv/sdk';The general policy is hard convergence for same-week or unreleased surface names: use the correct package, field, or wire name instead of carrying aliases. @agentv/eval was already published, then deprecated on npm, and has been removed from this repository. New docs, examples, scaffolds, and skills should use @agentv/sdk directly.
Choose a Surface
Section titled “Choose a Surface”Use the simplest surface that matches the job:
- YAML / JSONL first for portable eval specs you want to run from the CLI, check into a repo, or share across TypeScript and Python workflows.
defineEval()/evalSuite()when you want a.eval.tsfile that mirrors YAML concepts and lowers back to the canonical snake_case contract.evaluate({ specFile })when you want library control around an existing YAML suite.- Inline
evaluate({ tests })when the eval definition truly belongs inside application code. The programmatic API mirrors YAML, but uses current TypeScript naming such asexpectedOutput. defineAssertionwhen you want a reusable assertion type discovered from.agentv/assertions/.defineScriptGraderwhen you need a command-backed grader with explicit score and assertion-result control.agentv eval <verifier.test.ts>for deterministic workspace checks that fit normal Vitestexpect(...)tests.
There is no separate first-party Python authoring SDK today. Python-facing workflows should either emit canonical YAML/JSONL or implement executable graders that consume the standard snake_case wire format.
For example, the repo-local helper in examples/features/sdk-python/ can build YAML-shaped cases while keeping assert as the durable authored contract:
from agentv_py.evals import EvalDefinition, JsonlCase, write_eval_yaml, write_jsonl
def rag_faithfulness(): return { "name": "rag-faithfulness", "type": "llm-rubric", "target": "grader-target", "prompt": "Grade whether the answer is supported by the retrieved context.", }
write_jsonl( "evals/cases.jsonl", [ JsonlCase( id="grounded-answer", input=[{"role": "user", "content": "Answer using the retrieved context."}], expected_output=[{"role": "assistant", "content": "The answer cites the source material."}], extra={"assert": [rag_faithfulness()]}, ) ],)
write_eval_yaml( "evals/suite.yaml", EvalDefinition(name="rag-suite", tests="./cases.jsonl"),)This is example-local/repo-local guidance, not a promise of a published Python package.
YAML-Aligned .eval.ts Authoring
Section titled “YAML-Aligned .eval.ts Authoring”Use defineEval() from @agentv/sdk when you want TypeScript ergonomics without creating a second eval vocabulary. The helper keeps authoring in camelCase where TypeScript needs it, then lowers back to the canonical snake_case eval object contract when AgentV loads the file.
import { defineEval, graders } from '@agentv/sdk';
export default defineEval({ name: 'hello-suite', target: 'mock-sdk', workspace: { hooks: { beforeAll: { command: ['echo', 'suite-start'], }, }, }, tests: [ { id: 'hello', input: 'Say hello', inputFiles: ['../fixtures/per-test-note.md'], expectedOutput: 'Hello from the mock target', assert: [graders.contains('Hello')], }, ],});Useful companion helpers:
toEvalYamlObject()returns the canonical snake_case object.serializeEvalYaml()returns YAML text using the same canonical field names.
The durable authored field remains assert. This helper does not introduce a second YAML vocabulary.
Built-In Grader Helpers
Section titled “Built-In Grader Helpers”@agentv/sdk includes a small graders catalog for common deterministic and LLM-backed grader configs. These helpers return ordinary assert entries and serialize to the same canonical YAML you could write by hand.
import { defineEval, graders } from '@agentv/sdk';
export default defineEval({ name: 'grader-helper-suite', tests: [ { id: 'json-greeting', input: 'Return a JSON greeting.', assert: [ graders.contains('Hello', { name: 'mentions-hello' }), graders.exact('{"message":"Hello"}', { name: 'exact-json', minScore: 1 }), graders.regex(/"message"\s*:/, { name: 'message-key' }), graders.json({ name: 'valid-json', required: true }), graders.llmRubric(['Greets the user'], { name: 'rubric-review' }), graders.llmRubric(undefined, { name: 'llm-review', prompt: 'Grade whether the answer is useful.', target: 'grader-target', }), graders.scriptGrader(['bun', 'run', 'graders/check.ts'], { name: 'scripted-check' }), ], }, ],});The catalog covers contains, equals/exact, regex, is-json/json, llm-rubric, and script. CamelCase SDK options such as minScore, maxSteps, and rubric scoreRanges lower to min_score, max_steps, and score_ranges when AgentV loads or serializes the suite.
AgentV-Native Helper Factories
Section titled “AgentV-Native Helper Factories”If you are coming from Braintrust scores or DeepEval metrics, keep the reusable logic AgentV-native: write helper factories that return graders.* configs, then let defineEval() lower them to ordinary assert entries.
import { defineEval, graders } from '@agentv/sdk';
function ragFaithfulness() { return graders.llmRubric(undefined, { name: 'rag-faithfulness', target: 'grader-target', prompt: [ 'Grade whether the answer is supported by the retrieved context.', 'Use the input and expected_output fields as grounding evidence.', ].join('\n'), });}
export default defineEval({ name: 'rag-suite', tests: [ { id: 'grounded-answer', input: 'Answer the question using the retrieved context.', expectedOutput: 'The answer cites the source material.', assert: [ graders.contains('source', { name: 'mentions-source' }), ragFaithfulness(), ], }, ],});The helper above serializes to the same shape you could write by hand:
assert: - name: mentions-source type: contains value: source - name: rag-faithfulness type: llm-rubric target: grader-target prompt: |- Grade whether the answer is supported by the retrieved context. Use the input and expected_output fields as grounding evidence.Custom Assertions
Section titled “Custom Assertions”Use defineAssertion from @agentv/sdk to create reusable assertion types. Place them in .agentv/assertions/ — they’re auto-discovered by filename.
This is the custom assertion path, not the custom grader path. It matches Promptfoo’s assertion terminology for normal eval checks, while extending Promptfoo’s fixed custom logic types (javascript, python, ruby, webhook) with arbitrary discovered AgentV type names.
Pass/Fail Pattern
Section titled “Pass/Fail Pattern”import { defineAssertion } from '@agentv/sdk';
export default defineAssertion(({ output }) => { const wordCount = (output ?? '').trim().split(/\s+/).filter(Boolean).length; const pass = wordCount >= 3; return { pass, score: pass ? 1 : 0, reason: `Output has ${wordCount} words`, };});Score Pattern
Section titled “Score Pattern”Return a score (0–1) instead of pass for graded evaluation:
import { defineAssertion } from '@agentv/sdk';
export default defineAssertion(({ output, traceSummary }) => { const hasContent = (output ?? '').length > 0 ? 0.5 : 0; const isEfficient = (traceSummary?.eventCount ?? 0) <= 10 ? 0.5 : 0; return { score: hasContent + isEfficient, reason: 'Checks content exists and is efficient', };});If only pass is given, score is 1 (pass) or 0 (fail).
Using in YAML
Section titled “Using in YAML”Convention-based discovery maps filename → assertion type:
.agentv/assertions/min-words.ts → type: min-words.agentv/assertions/sentiment.ts → type: sentimentReference directly in your eval file — no command: needed:
assert: - type: min-words - type: contains value: "Hello"Script Graders
Section titled “Script Graders”Use defineScriptGrader from @agentv/sdk for full control over scoring with optional per-check results:
import { defineScriptGrader } from '@agentv/sdk';
export default defineScriptGrader(({ output, traceSummary }) => ({ pass: (output ?? '').length > 0 && (traceSummary?.eventCount ?? 0) <= 5, score: (output ?? '').length > 0 && (traceSummary?.eventCount ?? 0) <= 5 ? 1.0 : 0.5, reason: 'Checks answer text and tool usage', checks: [ { text: 'Answer is not empty', pass: (output ?? '').length > 0, reason: (output ?? '').length > 0 ? 'Output is non-empty' : 'Output is empty', }, { text: 'Efficient tool usage', pass: (traceSummary?.eventCount ?? 0) <= 5, reason: 'Trace event count is within limit', }, ],}));For deterministic workspace verifiers, prefer normal Vitest tests plus AgentV’s built-in Vitest adapter command:
import { readFileSync } from 'node:fs';import { expect, it } from 'vitest';
it('links to the dashboard', () => { const page = readFileSync('app/page.tsx', 'utf8'); expect(page).toMatch(/href=["']\/dashboard["']/);});assert: - name: vitest-welcome-banner type: script command: [agentv, eval, graders/welcome-banner.test.ts]Use defineWorkspaceGrader only for tiny one-off file checks or custom score shaping:
import { defineWorkspaceGrader } from '@agentv/sdk';
export default defineWorkspaceGrader(async ({ workspace }) => [ await workspace.file('app/page.tsx').contains('Status: All systems ready'), await workspace.file('app/page.tsx').contains('Open dashboard'), await workspace.file('app/page.tsx').matches(/href=["']\/dashboard["']/), await workspace.file('app/page.tsx').notMatches(/TODO/i),]);defineScriptGrader, defineVitestWorkspaceGrader, and defineWorkspaceGrader custom scripts are graders referenced in YAML with type: script and command: [bun, run, grader.ts]. Plain Vitest verifier files can use command: [agentv, eval, graders/check.test.ts] without a custom wrapper; use agentv eval vitest when you need adapter flags. defineAssertion uses convention-based discovery instead — just place it in .agentv/assertions/ and reference it by assertion type name.
For detailed patterns, input/output contracts, and language-agnostic examples, see Script Graders.
Wire Format vs SDK Format
Section titled “Wire Format vs SDK Format”Raw grader stdin uses snake_case because it crosses a process boundary and may be consumed by Python, shell, jq, or external dashboards. The @agentv/sdk package converts that payload to idiomatic TypeScript camelCase before calling your handler.
| Raw stdin | SDK handler field |
|---|---|
expected_output | expectedOutput |
output_path | outputPath |
trace_summary | traceSummary |
token_usage | tokenUsage |
cost_usd | costUsd |
duration_ms | durationMs |
workspace_path | workspacePath |
output is already the final answer string in both formats. Transcript-aware code should read messages, trace.messages, or trace.events; answer-text graders should read output.
Programmatic API
Section titled “Programmatic API”Use evaluate() from @agentv/sdk to run evaluations as a library. The implementation is owned by @agentv/core, but the SDK re-exports it as the user-facing entrypoint. The most portable pattern is still to keep the suite in YAML and point specFile at it; inline tests are best when the eval is tightly coupled to application code.
Inline Test Definitions
Section titled “Inline Test Definitions”import { evaluate } from '@agentv/sdk';
const { results, summary } = await evaluate({ tests: [ { id: 'greeting', input: 'Say hello', expectedOutput: 'Hello there!', assert: [{ type: 'contains', value: 'Hello' }], }, ],});
console.log(`${summary.passed}/${summary.total} passed`);A strict OR is easy with inline assertion handlers:
import { evaluate } from '@agentv/sdk';
const { summary } = await evaluate({ tests: [ { id: 'capital', input: 'What is the capital of France?', expectedOutput: 'Paris', assert: [ ({ output }) => ({ name: 'capital-or-phrase', score: ((output ?? '').includes('Paris') || /capital of france/i.test(output ?? '')) ? 1 : 0, }), ], }, ], task: async (input) => `Agent: ${input}`, threshold: 0.8,});
console.log(`${summary.passed}/${summary.total} passed`);Auto-discovers the default target from .agentv/targets.yaml and .env credentials.
File-Based via specFile
Section titled “File-Based via specFile”Point to an existing YAML eval instead of inlining tests:
import { evaluate } from '@agentv/sdk';
const { results, summary } = await evaluate({ specFile: './evals/my-eval.eval.yaml',});This is the recommended bridge when you want SDK control without creating a separate code-first eval surface.
Typed Configuration
Section titled “Typed Configuration”Create agentv.config.ts at your project root for type-safe, validated configuration using defineConfig() from @agentv/core:
import { defineConfig } from '@agentv/core';
export default defineConfig({ execution: { maxConcurrency: 5, maxRetries: 2, verbose: true, }, output: { dir: './results' }, limits: { maxCostUsd: 10.0 },});The config file is auto-discovered by the CLI from your project root and validated with Zod at startup.
Trace Correlation
Section titled “Trace Correlation”AgentV stores eval-native run bundles, transcript sidecars, metrics, grading
artifacts, and summaries. For OpenTelemetry/OpenInference traces, instrument the
system under test or provider wrapper to emit spans directly to the external
backend, then correlate AgentV cases to those traces with safe external_trace
IDs or UI URLs when available.
Scaffold Commands
Section titled “Scaffold Commands”Bootstrap new assertions and eval files from the CLI:
# Create a new assertion typeagentv create assertion <name> # → .agentv/assertions/<name>.ts
# Create a new eval with test casesagentv create eval <name> # → evals/<name>.eval.yaml + .cases.jsonl