Developer Tools

Test before you commit.

A safe sandbox for experimenting with expert agents, comparing models side-by-side, validating guardrails, and previewing workflows — without touching production.

Request Access

Prompt Lab · Guardrail Tester · Tool Tester · Dry Run

Prompt: Review the auth module and suggest improvements

Model: claude-sonnet-4-6 (active turn 1/3)

Tool Calls

file_read · auth/index.ts · ↑ 1.2K ↓ 380 · 420ms
grep_search · TODO|FIXME · ↑ 2.1K ↓ 740 · 680ms

Response: DONE ✓ · 2.4s · 3 turns · 4.3K tokens

AGENTIC LOOP TRACE

file_read ↑1.2K ↓380 · 420ms
grep_search ↑2.1K ↓740 · 680ms
(text) ↑3.4K ↓1.1K · 1.3s

① Prompt enters model — the expert agent receives its task and begins turn 1

② Tools called and returned — file_read and grep_search execute; results feed back into the model

③ Final response exits — after 3 turns the model produces its text answer with full token trace
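The three steps above can be sketched as a small data model. This is only an illustration, not the product's actual trace format: `TraceStep` and `summarize` are hypothetical names, the reading of ↑ as input tokens and ↓ as output tokens is an assumption, and the figures are copied from the example trace.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceStep:
    """One turn of the loop (hypothetical shape, not the product's schema)."""
    kind: str         # tool name, or "(text)" for the final answer
    tokens_in: int    # assumed meaning of the up arrow in the trace
    tokens_out: int   # assumed meaning of the down arrow in the trace
    latency_ms: int


# Values copied from the example trace.
TRACE = [
    TraceStep("file_read", 1200, 380, 420),
    TraceStep("grep_search", 2100, 740, 680),
    TraceStep("(text)", 3400, 1100, 1300),
]


def summarize(trace):
    """Return turn count, total latency in seconds, and the tools that ran."""
    return {
        "turns": len(trace),
        "seconds": sum(step.latency_ms for step in trace) / 1000,
        "tools": [step.kind for step in trace if step.kind != "(text)"],
    }


print(summarize(TRACE))
# {'turns': 3, 'seconds': 2.4, 'tools': ['file_read', 'grep_search']}
```

The per-step latencies (420 ms, 680 ms, 1.3 s) add up to the 2.4 s shown in the response summary.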

Agent Chat

Interactive conversations with any expert agent. Select your model, choose an expert agent type, and test how it responds to real prompts. Full session history with auto-save.

Prompt Lab

Compare multiple models side-by-side. See word-level diffs between responses. Compare token usage and cost. Find the best model for each use case.
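To make "word-level diff" concrete, here is a minimal sketch using Python's standard `difflib`. It is not how Prompt Lab computes its diffs (the page does not say); `word_diff` is a hypothetical helper and the two responses are made-up inputs.

```python
import difflib


def word_diff(a: str, b: str):
    """Return (tag, words_a, words_b) for each span where two responses differ."""
    words_a, words_b = a.split(), b.split()
    matcher = difflib.SequenceMatcher(a=words_a, b=words_b, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((tag, words_a[i1:i2], words_b[j1:j2]))
    return changes


# Made-up responses from two models, for illustration only.
response_a = "Rotate the session token on every login"
response_b = "Rotate the refresh token on every login"

for change in word_diff(response_a, response_b):
    print(change)
# ('replace', ['session'], ['refresh'])
```

Splitting on whitespace before matching is what makes the diff word-level rather than character-level; identical responses produce an empty list.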

Guardrail Tester

Validate your guardrail rules against real file changes before deploying. See exactly which rules would trigger and why.
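As a rough sketch of what testing rules against file changes could look like, assume a rule is a path glob plus a regular expression that must not appear in added lines. The `Rule` shape, the `dry_run_rules` helper, and the sample change set are all hypothetical; the real rule format is not described on this page.

```python
import fnmatch
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical guardrail rule: path glob plus a forbidden pattern."""
    name: str
    path_glob: str
    forbidden: str


def dry_run_rules(rules, changes):
    """changes maps file path -> added lines. Returns (rule, path, line) per hit."""
    hits = []
    for path, added_lines in changes.items():
        for rule in rules:
            if not fnmatch.fnmatchcase(path, rule.path_glob):
                continue  # rule does not apply to this file
            for line in added_lines:
                if re.search(rule.forbidden, line):
                    hits.append((rule.name, path, line.strip()))
    return hits


# Made-up rules and changes, for illustration only.
rules = [
    Rule("no-hardcoded-secrets", "auth/*", r"password\s*="),
    Rule("no-todo-in-auth", "auth/*", r"TODO|FIXME"),
]
changes = {
    "auth/index.ts": ['const password = "hunter2";', "// TODO: rotate keys"],
    "docs/readme.md": ["TODO: write docs"],
}

hits = dry_run_rules(rules, changes)
for hit in hits:
    print(hit)
```

Nothing is blocked or modified here: the function only reports which rule would fire on which line, which is the "which rules would trigger and why" view in miniature.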

Workflow Dry Run

Preview what an expert agent would plan for a given job — without executing. Review the strategy before committing resources.

Experiment safely

Every conversation, comparison, and test is saved automatically. Pick up where you left off, review past experiments, or share sessions with your team. Sessions are auto-named from your first message so they are easy to find.

A typical exploration

     Your question
          │
          ▼
   ┌───────────────┐      side-by-side
   │  Prompt Lab   │────────────────────► Model A diff vs. Model B
   └───────┬───────┘
           │  promote winner
           ▼
   ┌───────────────┐
   │  Agent Chat   │◄── pick expert + tools
   └───────┬───────┘
           │  dry-run
           ▼
   ┌───────────────┐
   │ Workflow plan │── (no tool execution, no cost)
   └───────┬───────┘
           ▼
    Guardrail tester → deploy rules or adjust

01

Compare models in Prompt Lab

Run the same prompt against Haiku, Sonnet, and a local model side-by-side. The diff view highlights word-level differences between responses; the cost column shows what you would have paid, priced per million tokens.

02

Hand the winner to Agent Chat

Take the prompt forward to a full agentic session with tools. Every tool call is logged; you can stop, rewind, or adjust guardrails between turns.

03

Dry-run a workflow, then a rule

Workflow Dry Run decomposes a real job without executing anything — you see the DAG before any cost is incurred. Guardrail Tester replays a proposed rule over past findings so you see what would have fired.
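The idea of seeing the DAG before anything runs can be sketched with Python's standard `graphlib`. This is an assumption-laden illustration, not the product's planner: the `PLAN` structure, the step names, and the `dry_run` helper are invented for the example.

```python
from graphlib import TopologicalSorter

# Hypothetical plan: each step maps to the set of steps it depends on.
PLAN = {
    "read_auth_module": set(),
    "search_todos": set(),
    "draft_improvements": {"read_auth_module", "search_todos"},
    "write_report": {"draft_improvements"},
}


def dry_run(plan):
    """Return the stages a scheduler could run, without executing any step."""
    sorter = TopologicalSorter(plan)
    sorter.prepare()  # raises graphlib.CycleError if the plan is not a DAG
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # steps whose dependencies are met
        stages.append(ready)
        sorter.done(*ready)  # mark as planned; nothing is actually run
    return stages


for number, stage in enumerate(dry_run(PLAN), start=1):
    print(number, stage)
# 1 ['read_auth_module', 'search_todos']
# 2 ['draft_improvements']
# 3 ['write_report']
```

Each stage lists steps that could run in parallel, so the plan can be reviewed for ordering and scope before any tool call is made.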

Related features

Expert Agents · Multi-Model Support · Benchmark Lab

Try the sandbox

Request access to experiment with expert agents in a safe environment.

Request Early Access
