Overview
The Correctness evaluator assesses whether an LLM’s response is factually accurate, complete, and logically consistent. It evaluates the quality of answers without requiring external context or reference responses.
This is an LLM evaluator: Phoenix runs a judge model against a managed prompt template on your behalf.
When to Use
Use the Correctness evaluator when you need to:
- Validate factual accuracy — Ensure responses contain accurate information
- Check answer completeness — Verify responses address all parts of the question
- Detect logical inconsistencies — Identify contradictions within responses
- Evaluate general knowledge responses — Assess answers that don’t rely on retrieved context
- Get a quick gut-check — Capture a wide range of potential problems quickly
For evaluating responses against retrieved documents, use the Faithfulness evaluator instead. Correctness is best suited for evaluating general knowledge.
Input Mapping
The template handles output formatting automatically — it pulls from your experiment’s output. You don’t need to configure anything for the output side.
The only field you may need to map is input, which should point to the user query from your dataset. For example, if your dataset has input.query:
| Template field | Dataset column |
|---|---|
| input | input.query |
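To make the mapping concrete, here is a minimal sketch of how a dotted path like input.query could resolve against a nested dataset row. The row shape and the resolve_path helper are illustrative assumptions for this example, not Phoenix internals.

```python
# Illustrative only: how an input mapping like "input.query" might resolve
# against a dataset row. resolve_path and the row shape are assumptions,
# not Phoenix API.

def resolve_path(row: dict, dotted_path: str):
    """Walk a dotted path such as 'input.query' through nested dicts."""
    value = row
    for key in dotted_path.split("."):
        value = value[key]
    return value

# A hypothetical dataset row with a nested input column.
row = {"input": {"query": "What is the capital of France?"}}

# The mapping points the template's input field at the dataset column.
mapping = {"input": "input.query"}

template_inputs = {field: resolve_path(row, path) for field, path in mapping.items()}
print(template_inputs)  # {'input': 'What is the capital of France?'}
```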
Output Labels
| Property | Value | Description |
|---|---|---|
| label | "correct" or "incorrect" | Classification result |
| score | 1.0 or 0.0 | Numeric score (1.0 = correct, 0.0 = incorrect) |
| explanation | string | LLM-generated reasoning for the classification |
| Optimization | Maximize | Higher scores are better |
Criteria for Correct (1.0):
- The response is factually accurate
- The response fully addresses all parts of the question
- The response is logically consistent with no internal contradictions
Criteria for Incorrect (0.0):
- The response contains factual errors
- The response is incomplete or omits key parts of the answer
- The response contains logical inconsistencies or contradictions
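Putting the table and criteria together, a single evaluation result can be pictured as a small record in which the score is derived directly from the label. The field names follow the Output Labels table above; the record contents are an invented example.

```python
# Sketch of the evaluator's output record. Field names follow the Output
# Labels table; the specific record values are illustrative, not real output.

def to_score(label: str) -> float:
    """Correctness is binary and maximized: 'correct' scores 1.0, else 0.0."""
    return 1.0 if label == "correct" else 0.0

record = {
    "label": "incorrect",
    "score": to_score("incorrect"),
    "explanation": "The response omits one part of the two-part question.",
}
print(record["score"])  # 0.0
```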
Using in Phoenix
- Navigate to your dataset and open the Evaluators tab.
- Click Add Evaluator and select LLM Evaluator Template, then choose correctness.
- In the evaluator slide-over, you’ll see the prompt template and choices are pre-configured. You can use the defaults or edit the prompt to fit your use case.
- Set an input mapping for the input field so the template pulls from the correct column in your dataset. Output formatting is already handled by the template — no output mapping needed.
- Optionally, configure which LLM to use as the judge model.
- Click Create. The evaluator will automatically run on any future experiments for that dataset.
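For a sense of what the pre-configured template and choices look like, here is an illustrative correctness-style judge prompt with the two rails this evaluator uses. The actual managed template in Phoenix may be worded differently; this sketch only shows the general shape.

```python
# Illustrative only: a correctness-style judge prompt with the evaluator's
# two rails. The real managed template in Phoenix may differ in wording.

CORRECTNESS_TEMPLATE = """You are evaluating whether an answer is correct.

[Question]: {input}
[Answer]: {output}

An answer is "correct" if it is factually accurate, fully addresses all parts
of the question, and contains no internal contradictions. Otherwise it is
"incorrect". Respond with a single word: correct or incorrect."""

# The allowed classification labels.
RAILS = ["correct", "incorrect"]

# Fill the template with a sample query and response.
prompt = CORRECTNESS_TEMPLATE.format(
    input="What is 2 + 2?",
    output="2 + 2 equals 4.",
)
print(prompt.splitlines()[0])  # You are evaluating whether an answer is correct.
```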
See Also