LLM Evaluators require an LLM to score an evaluation input. Phoenix evals are provider-agnostic and work with virtually any foundation model.

Python Configuration

The Phoenix evals Python package uses an adapter pattern to wrap underlying client SDKs and provide a unified interface. Each adapter forwards parameters directly to the underlying client, so you can use the same configuration options as the native SDK.
  • Client configuration parameters (e.g., api_key, base_url, api_version) are passed as **kwargs when creating the LLM instance. These configure the client itself.
  • Model invocation parameters (e.g., temperature, max_tokens, top_p) are passed as **kwargs when creating an evaluator. These control how the model generates responses.
Detailed information and examples for each adapter can be found in the sections below. When creating an LLM, specify:
  • provider: The provider name (e.g., "openai", "azure", "anthropic")
  • model: The model identifier
  • client (optional): Which client SDK to use if multiple are installed (e.g., "openai", "langchain", "litellm")
  • sync_client_kwargs (optional): Client configuration forwarded only to the sync client
  • async_client_kwargs (optional): Client configuration forwarded only to the async client
  • **kwargs: Client configuration parameters forwarded to both sync and async client constructors.
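For example, a minimal construction that combines these options might look like the following sketch (the model name and API key are placeholders; it mirrors the OpenAI adapter example further below):
from phoenix.evals.llm import LLM

llm = LLM(
    provider="openai",       # provider name
    model="gpt-4o",          # model identifier
    client="openai",         # optional: pick the OpenAI SDK if several clients are installed
    api_key="your-api-key",  # client configuration forwarded to both sync and async clients
)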
To see the currently supported LLM providers and their availability, use the show_provider_availability function:
from phoenix.evals.llm import show_provider_availability

show_provider_availability()
The output shows which providers are available based on installed dependencies, and which client SDKs can be used for each provider:
📦 AVAILABLE PROVIDERS (sorted by client priority)
--------------------------------------------------------------------
Provider  | Status      | Client       | Dependencies                  
--------------------------------------------------------------------
azure     | ✓ Available | openai       | openai               
openai    | ✓ Available | openai       | openai               
openai    | ✓ Available | langchain    | langchain, langchain-openai
openai    | ✓ Available | litellm      | litellm              
anthropic | ✓ Available | anthropic    | anthropic            
anthropic | ✓ Available | langchain    | langchain, langchain-anthropic
anthropic | ✓ Available | litellm      | litellm              
google    | ✓ Available | google-genai | google-genai         
litellm   | ✓ Available | litellm      | litellm              
bedrock   | ✓ Available | litellm      | litellm, boto3       
vertex    | ✓ Available | litellm      | litellm              
The provider column lists the supported providers, and the status column reads “Available” when the required dependencies are installed in the active Python environment. Note that multiple client SDKs can be used to make requests to a given provider; the desired SDK can be selected via the client parameter when constructing the LLM.

OpenAI Adapter

Client: openai.OpenAI() or openai.AsyncOpenAI()
Invocation: client.chat.completions.create()
Docs: OpenAI Python Client
from phoenix.evals.llm import LLM
from phoenix.evals import ClassificationEvaluator

# Client config → LLM creation
llm = LLM(
    provider="openai",
    model="gpt-4o",
    client="openai",
    api_key="your-api-key",  # Client config param
    timeout=30.0,  # Client config param
)

# Invocation params → Evaluator creation
evaluator = ClassificationEvaluator(
    name="example",
    prompt_template="Classify: {input}",
    choices={"positive": 1, "negative": 0},
    llm=llm,
    temperature=0.0,  # Invocation param
    max_tokens=100,  # Invocation param
)
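Once constructed, the evaluator can be applied to an evaluation input. A hedged usage sketch follows; it assumes the evaluator exposes an evaluate() method that accepts a mapping of the prompt template variables and returns scores (check the evaluator API reference for the exact signature):
# Hypothetical usage: pass the template variables as a mapping
scores = evaluator.evaluate({"input": "I love this product!"})
print(scores)  # e.g., a classification score labeled "positive" or "negative"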

Azure OpenAI Adapter

Client: openai.AzureOpenAI() or openai.AsyncAzureOpenAI()
Invocation: client.chat.completions.create()
Docs: Azure OpenAI Python SDK
Note: The model parameter should be your Azure deployment name.
from phoenix.evals.llm import LLM
from phoenix.evals import ClassificationEvaluator

llm = LLM(
    provider="azure",
    model="gpt-4o-deployment",  # Azure deployment name
    api_key="your-azure-api-key",
    api_version="2024-02-15-preview",
    azure_endpoint="https://your-resource.openai.azure.com",
)

evaluator = ClassificationEvaluator(
    name="example",
    prompt_template="Classify: {input}",
    choices={"positive": 1, "negative": 0},
    llm=llm,
    temperature=0.0,
    max_tokens=100,
)

LiteLLM Adapter

Client: Lightweight wrapper (no traditional client object)
Invocation: litellm.completion() or litellm.acompletion()
Docs: LiteLLM Documentation
Note: Model names must use provider route format: {provider}/{model} (e.g., "x-ai/grok-2").
from phoenix.evals.llm import LLM
from phoenix.evals import ClassificationEvaluator
import os

os.environ["XAI_API_KEY"] = "your-xai-api-key"

llm = LLM(
    provider="litellm",
    model="x-ai/grok-2",  # Provider route format
    client="litellm",
)

evaluator = ClassificationEvaluator(
    name="example",
    prompt_template="Classify: {input}",
    choices={"positive": 1, "negative": 0},
    llm=llm,
    temperature=0.0,
    max_tokens=100,
)

LangChain Adapter

Client: LangChain chat model classes (e.g., langchain_openai.ChatOpenAI, langchain_anthropic.ChatAnthropic)
Invocation: client.invoke() or client.predict()
Docs: LangChain OpenAI, LangChain Anthropic
from phoenix.evals.llm import LLM
from phoenix.evals import ClassificationEvaluator

llm = LLM(
    provider="openai",
    model="gpt-4o",
    client="langchain",
    api_key="your-api-key",
)

evaluator = ClassificationEvaluator(
    name="example",
    prompt_template="Classify: {input}",
    choices={"positive": 1, "negative": 0},
    llm=llm,
    temperature=0.0,
    max_tokens=100,
)

Anthropic Adapter

Client: anthropic.Anthropic() or anthropic.AsyncAnthropic()
Invocation: client.messages.create()
Docs: Anthropic Python SDK
Note: The Anthropic API requires max_tokens; if it is not specified when creating the evaluator, it defaults to 4096.
from phoenix.evals.llm import LLM
from phoenix.evals import ClassificationEvaluator

llm = LLM(
    provider="anthropic",
    model="claude-3-5-sonnet-20241022",
    api_key="your-anthropic-api-key",
    timeout=30.0,
)

evaluator = ClassificationEvaluator(
    name="example",
    prompt_template="Classify: {input}",
    choices={"positive": 1, "negative": 0},
    llm=llm,
    temperature=0.0,
    max_tokens=1024,
)

Google GenAI Adapter

Client: google.genai.Client()
Invocation: client.models.generate_content()
Docs: Google GenAI Python SDK
from phoenix.evals.llm import LLM
from phoenix.evals import ClassificationEvaluator

llm = LLM(
    provider="google",
    model="gemini-2.0-flash-exp",
    api_key="your-google-api-key",  # or set env var
)

evaluator = ClassificationEvaluator(
    name="example",
    prompt_template="Classify: {input}",
    choices={"positive": 1, "negative": 0},
    llm=llm,
    temperature=0.0,
)

Separate Sync/Async Client Configuration

Some providers (OpenAI, Anthropic) create separate sync and async SDK clients internally. The sync_client_kwargs and async_client_kwargs parameters let you pass configuration that applies to only one of the two clients, which is useful for:
  • Different timeouts: Longer timeouts for async batch operations
  • Different HTTP clients: Custom httpx clients for sync vs async
  • Different retry configurations: More aggressive retries for batch async calls
Example: Different Timeouts for Sync and Async Clients
from phoenix.evals.llm import LLM

llm = LLM(
    provider="openai",
    model="gpt-4o",
    api_key="your-api-key",
    sync_client_kwargs={"timeout": 30.0},
    async_client_kwargs={"timeout": 120.0},
)
Example: Custom HTTP Clients
import httpx
from phoenix.evals.llm import LLM

llm = LLM(
    provider="openai",
    model="gpt-4o",
    api_key="your-api-key",
    sync_client_kwargs={"http_client": httpx.Client(timeout=30.0)},
    async_client_kwargs={"http_client": httpx.AsyncClient(timeout=120.0)},
)
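Example: Different Retry Configurations
The retry use case above can be sketched the same way; max_retries is a constructor option on the OpenAI SDK clients (other SDKs may name this option differently):
from phoenix.evals.llm import LLM

llm = LLM(
    provider="openai",
    model="gpt-4o",
    api_key="your-api-key",
    sync_client_kwargs={"max_retries": 2},   # modest retries for interactive sync calls
    async_client_kwargs={"max_retries": 5},  # more aggressive retries for async batch runs
)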

TypeScript Configuration

The TypeScript evaluation library uses the AI SDK’s LanguageModel type for model abstraction. Models are created using AI SDK provider functions and passed directly to evaluators.

Installation

# Install model provider(s) separately based on your needs
npm install @ai-sdk/openai      # For OpenAI models
npm install @ai-sdk/anthropic   # For Anthropic models  
npm install @ai-sdk/google      # For Google models
npm install @ai-sdk/azure       # For Azure OpenAI models

Configuring Model Providers

Import and configure your model provider, then pass it to evaluators:
import { openai } from "@ai-sdk/openai";
import { anthropic } from "@ai-sdk/anthropic";

// OpenAI model
const openaiModel = openai("gpt-4o-mini");

// Anthropic model
const anthropicModel = anthropic("claude-sonnet-4-20250514");
The AI SDK handles authentication via environment variables (e.g., OPENAI_API_KEY, ANTHROPIC_API_KEY), or you can pass configuration directly:
import { createOpenAI } from "@ai-sdk/openai";
import { createAzure } from "@ai-sdk/azure";

// OpenAI with custom configuration
const openai = createOpenAI({
  apiKey: "my-openai-api-key",
  baseURL: "https://custom-endpoint.com/v1",
});
const model = openai("gpt-4o-mini");

// Azure OpenAI
const azure = createAzure({
  apiKey: "your-azure-api-key",
  resourceName: "your-resource-name",
});
const azureModel = azure("your-deployment-name");

Using with LLM Evaluators

import { createClassificationEvaluator } from "@arizeai/phoenix-evals/llm";
import { openai } from "@ai-sdk/openai";

const model = openai("gpt-4o-mini");

// Create a classification evaluator
const evaluator = createClassificationEvaluator({
  name: "factual_check",
  model,
  choices: { factual: 1, hallucinated: 0 },
  promptTemplate: "Your evaluation prompt here: {input}",
});

Invocation Parameters

Model invocation parameters (such as temperature and maxTokens) are passed through to the underlying AI SDK generateObject call. They work at runtime because they are captured via the ...rest spread in createClassifierFn and forwarded to generateObject. However, the current type definitions for CreateClassifierArgs and CreateClassificationEvaluatorArgs do not declare these parameters, so TypeScript reports errors at compile time if you pass them directly. Because the AI SDK does not support setting default invocation parameters at the model level, use a type assertion to pass them, as shown below:
const evaluator = createClassificationEvaluator({
  name: "factual_check",
  model,
  choices: { factual: 1, hallucinated: 0 },
  promptTemplate: "Your evaluation prompt here: {input}",
  temperature: 0.0,
  maxTokens: 100,
} as any);
For more configuration options and provider-specific settings, refer to the AI SDK documentation.