What You’ll Build
A TypeScript customer support agent that handles two types of queries:
- Order status questions → Calls a tool to look up order information
- FAQ questions → Searches a knowledge base using RAG
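The two-way routing above can be sketched as a classify-then-dispatch step. This is only an illustration; the names `classifyQuery`, `lookupOrder`, and `searchFaq` are hypothetical, and a real agent would use an LLM call (not a keyword check) to classify:

```typescript
// Illustrative routing sketch: classify the query, then dispatch to a
// tool call (order status) or a RAG search (FAQ). All names are hypothetical.
type QueryKind = "order_status" | "faq";

function classifyQuery(query: string): QueryKind {
  // A real agent would ask the LLM to classify; a keyword check stands in here.
  return /order|tracking|shipment/i.test(query) ? "order_status" : "faq";
}

function lookupOrder(query: string): string {
  return `Order tool result for: ${query}`; // stand-in for a tool call
}

function searchFaq(query: string): string {
  return `FAQ answer retrieved for: ${query}`; // stand-in for RAG retrieval
}

function handleQuery(query: string): string {
  return classifyQuery(query) === "order_status"
    ? lookupOrder(query)
    : searchFaq(query);
}

console.log(handleQuery("Where is my order #123?"));
console.log(handleQuery("What is your refund policy?"));
```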
Follow along with the complete code walkthroughs:
- TypeScript Tutorial — companion TypeScript project with runnable examples
- Python Tutorial — companion Python project with runnable examples
Chapter 1: Your First Traces
The problem: Your agent is a black box. When something goes wrong, you add console.log statements, re-run, and hope you logged the right thing.
What you’ll learn:
- Instrument your agent with OpenTelemetry in 5 minutes
- Trace LLM calls, tool executions, and RAG retrievals automatically
- Group related operations under parent spans for complete request context
- Navigate the Phoenix UI to explore traces
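The parent-span grouping above is the core idea: one user request becomes a root span, and every LLM call, tool execution, and retrieval nests under it. As a dependency-free sketch of that hierarchy (a toy tracer, not the real OpenTelemetry API — in practice you'd use `@opentelemetry/api`'s `startActiveSpan`):

```typescript
// Toy tracer illustrating how a parent span groups child operations.
// Real code would use @opentelemetry/api; this is only a conceptual sketch.
interface Span {
  name: string;
  children: Span[];
}

function startSpan(name: string, parent?: Span): Span {
  const span: Span = { name, children: [] };
  if (parent) parent.children.push(span);
  return span;
}

// One user request becomes a parent span; each step nests under it.
const root = startSpan("handle_support_query");
startSpan("llm.classify_query", root);
startSpan("tool.lookup_order", root);
startSpan("llm.generate_response", root);

// Phoenix renders this hierarchy as a waterfall: the parent carries the
// complete request context; children show each LLM/tool/RAG step.
console.log(root.children.map((s) => s.name));
```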
Chapter 2: Annotations and Evaluation
The problem: You can see what’s happening, but you can’t tell if responses are actually good. A trace showing “200 OK” doesn’t mean the answer was right.
What you’ll learn:
- Annotate traces with human feedback directly in the Phoenix UI
- Capture user reactions (thumbs up/down) from your application and attach them to traces
- Build LLM-as-Judge evaluators that automatically assess quality
- Find patterns in what’s failing across hundreds of traces
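An LLM-as-Judge evaluator is essentially a second model call with a grading prompt whose verdict gets parsed into a label. A minimal sketch — the prompt wording is illustrative and the `judge` function is a stub standing in for a real OpenAI call:

```typescript
// LLM-as-Judge sketch: build a grading prompt, call a judge model,
// and parse its reply into a label. The judge is stubbed for illustration.
type Verdict = "correct" | "incorrect";

function buildJudgePrompt(question: string, answer: string): string {
  return [
    "You are grading a support agent's answer.",
    `Question: ${question}`,
    `Answer: ${answer}`,
    'Reply with exactly "correct" or "incorrect".',
  ].join("\n");
}

// Stub: a real judge would send the prompt to an LLM and return its reply.
function judge(prompt: string): string {
  return prompt.includes("30 days") ? "correct" : "incorrect";
}

function evaluateTrace(question: string, answer: string): Verdict {
  const raw = judge(buildJudgePrompt(question, answer)).trim().toLowerCase();
  // Parse defensively: anything that isn't exactly "correct" fails the check.
  return raw === "correct" ? "correct" : "incorrect";
}

console.log(evaluateTrace("What is the refund policy?", "Refunds within 30 days."));
```

Running an evaluator like this over every trace is what turns "hundreds of traces" into a failure pattern you can actually see.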
Chapter 3: Sessions
The problem: Your agent handles single queries fine, but real users have conversations. “What’s my order status?” → “When will it arrive?” → “Can I change the address?” Without sessions, each query is isolated: you can’t see if the agent remembered the order ID from the first turn.
What you’ll learn:
- Add session tracking to group conversation turns together
- View conversations as chat-like threads in Phoenix
- Evaluate entire conversations for coherence and resolution
- Debug “the bot forgot what I said” issues by seeing exactly where context was lost
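Session tracking boils down to tagging every turn with a session ID so related turns group into one thread. A self-contained sketch of that grouping (the `Turn` shape and sample data are illustrative; in Phoenix you'd set a session ID attribute on each trace):

```typescript
// Session sketch: tag every turn with a session ID so related turns
// group into one conversation thread. Names and data are illustrative.
interface Turn {
  sessionId: string;
  input: string;
  output: string;
}

const turns: Turn[] = [
  { sessionId: "s-1", input: "What's my order status?", output: "Order #123 shipped." },
  { sessionId: "s-1", input: "When will it arrive?", output: "Friday." },
  { sessionId: "s-2", input: "Do you ship abroad?", output: "Yes." },
];

// Group turns by session ID, the same way Phoenix threads a conversation.
function bySession(all: Turn[]): Map<string, Turn[]> {
  const sessions = new Map<string, Turn[]>();
  for (const t of all) {
    const bucket = sessions.get(t.sessionId) ?? [];
    bucket.push(t);
    sessions.set(t.sessionId, bucket);
  }
  return sessions;
}

console.log(bySession(turns).get("s-1")?.length); // both turns of session s-1
```

With turns threaded like this, a "the bot forgot what I said" bug becomes visible as the exact turn where the session's earlier context stopped being passed along.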
Prerequisites
- Access to Phoenix Cloud or Phoenix running locally (`pip install arize-phoenix && phoenix serve`)
- OpenAI API key for LLM calls