How It Works
- Attach evaluators to a dataset — Open a dataset, navigate to the Evaluators tab, and add LLM-based or built-in code evaluators. Configure input mappings once to tell each evaluator where to find its inputs.
- Run an experiment — Execute an experiment against that dataset from the Playground. Attached evaluators run server-side automatically.
- Review scores and traces — Results appear as annotations on the experiment run. Every evaluator execution is traced in its own project so you can navigate from a score to the exact LLM call that produced it.
Evaluator Types
Built-in Code Evaluators
Deterministic evaluators that run without an LLM — Contains, Exact Match, Regex, Levenshtein Distance, and JSON Distance.
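To make the behavior of these checks concrete, here is a minimal Python sketch of the logic each deterministic evaluator applies. These are illustrative reimplementations, not Phoenix's actual code; in particular, the normalization used for JSON Distance is an assumption.

```python
import json
import re

def contains(output: str, reference: str) -> bool:
    # Passes if the reference substring appears anywhere in the output.
    return reference in output

def exact_match(output: str, reference: str) -> bool:
    # Passes only on a character-for-character match.
    return output == reference

def regex_match(output: str, pattern: str) -> bool:
    # Passes if the pattern matches anywhere in the output.
    return re.search(pattern, output) is not None

def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def json_distance(output: str, reference: str) -> float:
    # One plausible definition (an assumption here): edit distance between
    # canonicalized JSON serializations, normalized to [0, 1], so that
    # key order does not matter and 0.0 means structurally identical.
    a = json.dumps(json.loads(output), sort_keys=True)
    b = json.dumps(json.loads(reference), sort_keys=True)
    return levenshtein(a, b) / max(len(a), len(b), 1)
```

Because these run without an LLM, they are cheap, fast, and fully reproducible across experiment runs.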
LLM Evaluators
LLM-as-a-judge evaluators backed by Phoenix-managed prompts. Use pre-built templates for common tasks like correctness and tool response handling, or write your own.
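As a rough illustration of how an LLM-as-a-judge evaluator is structured, the sketch below builds a correctness prompt from a dataset example and parses the judge's reply into a score. The template wording, variable names, and label set are hypothetical, not Phoenix's managed prompts.

```python
# Hypothetical judge template; {input}, {output}, and {reference} stand in
# for whatever variables the evaluator's input mapping supplies.
CORRECTNESS_TEMPLATE = """\
You are grading an answer for correctness.

Question: {input}
Reference answer: {reference}
Submitted answer: {output}

Reply with exactly one word: "correct" or "incorrect"."""

def build_judge_prompt(example: dict) -> str:
    # Fill the template from a mapped dataset example.
    return CORRECTNESS_TEMPLATE.format(**example)

def parse_label(completion: str) -> int:
    # Map the judge's one-word verdict to a numeric annotation score.
    return 1 if completion.strip().lower() == "correct" else 0
```

A custom evaluator follows the same shape: a prompt template with mapped variables, plus a rule for turning the model's completion into a score.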
Why Use Server Evals
- Attach once, evaluate everywhere — Evaluators are defined on the dataset, not the experiment. Every Playground run against that dataset automatically records scores.
- No local setup required — Built-in evaluators run entirely server-side. LLM evaluators use the model configuration already set up on your Phoenix instance — no SDK, API keys, or local dependencies needed.
- Flexible input mapping — Map evaluator variables to any dataset field — input, output, reference, or metadata — using JSON paths for nested values.
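The mapping idea above can be sketched in a few lines: each evaluator variable points at a path into the dataset row, and nested values are reached with dotted keys and list indexes. The path syntax shown here (`output.choices[0].text`) is illustrative; Phoenix's actual JSON-path dialect may differ.

```python
import re

def resolve_path(row: dict, path: str):
    # Walk dotted keys and [n] list indexes through a dataset row.
    value = row
    for part in re.findall(r"[^.\[\]]+|\[\d+\]", path):
        if part.startswith("["):
            value = value[int(part[1:-1])]   # list index, e.g. [0]
        else:
            value = value[part]              # dict key, e.g. choices
    return value

def map_inputs(row: dict, mapping: dict) -> dict:
    # mapping: evaluator variable name -> path into the dataset row.
    return {var: resolve_path(row, path) for var, path in mapping.items()}
```

Because the mapping is configured once on the dataset, every subsequent experiment run resolves evaluator inputs the same way without per-run wiring.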
- Full traceability — Each evaluator run is traced in a dedicated project, so you can step from an annotation score into the underlying LLM call, making it easy to debug and refine evaluation criteria.

