LLM Request Explorer

A tool for visualizing and inspecting the details of an LLM API request. Streaming-native.

Why?

LLM API performance is core to product experience. High-level numbers can hide important details that builders should know about.

Built on a standardized file format. The ChatReq format can be produced by any model or software. My LLM stack produces it in both white-box contexts (a Vercel AI SDK wrapper) and black-box contexts (a MITM proxy addon that captures Claude Code's requests to Anthropic). A single file format captures and stores results from every supported provider and model, and it stays lossless by falling back to the raw provider request/response that the Vercel AI SDK exposes. Capture once, then analyze (or play back) later.
Streaming visualization. View the progress of a request over time — spot periods of perceived inaction, see the rate of token flow, and feel how tool use or reasoning shapes the user experience.
Inspect the request details. Captures every streaming event in its raw form and renders it back, so you can skim hundreds of events quickly or dig deep into the available data.
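
As a rough illustration of what a timestamped event log makes possible, the sketch below finds stalls (stretches with no events) and an average token rate. The `TimedEvent` shape, its field names, and the `findStalls` and `tokensPerSecond` helpers are hypothetical; they are not taken from the ChatReq format, and the sample values are made up.

```typescript
// Hypothetical shape of one captured streaming event.
// Field names are illustrative assumptions, not the ChatReq schema.
interface TimedEvent {
  tMs: number;    // milliseconds since the request was sent
  kind: string;   // e.g. "text-delta", "tool-call", "finish"
  tokens: number; // tokens carried by this event (0 if none)
}

interface Stall {
  startMs: number;
  endMs: number;
  durationMs: number;
}

// Return every gap between consecutive events that lasts at least thresholdMs.
// The wait before the first event is treated as a gap starting at t = 0.
function findStalls(events: TimedEvent[], thresholdMs: number): Stall[] {
  const sorted = [...events].sort((a, b) => a.tMs - b.tMs);
  const stalls: Stall[] = [];
  let prevMs = 0;
  for (const e of sorted) {
    const gap = e.tMs - prevMs;
    if (gap >= thresholdMs) {
      stalls.push({ startMs: prevMs, endMs: e.tMs, durationMs: gap });
    }
    prevMs = e.tMs;
  }
  return stalls;
}

// Total tokens divided by the time between the first and last
// token-bearing events, expressed per second. Returns 0 when fewer
// than two distinct token timestamps exist.
function tokensPerSecond(events: TimedEvent[]): number {
  let total = 0;
  let firstMs = Infinity;
  let lastMs = -Infinity;
  for (const e of events) {
    if (e.tokens <= 0) continue;
    total += e.tokens;
    firstMs = Math.min(firstMs, e.tMs);
    lastMs = Math.max(lastMs, e.tMs);
  }
  const spanMs = lastMs - firstMs;
  return spanMs > 0 ? (total / spanMs) * 1000 : 0;
}

// Made-up sample trace: a slow start, then a pause around a tool call.
const sampleEvents: TimedEvent[] = [
  { tMs: 800, kind: "text-delta", tokens: 4 },
  { tMs: 900, kind: "text-delta", tokens: 6 },
  { tMs: 2900, kind: "tool-call", tokens: 0 },
  { tMs: 3000, kind: "text-delta", tokens: 10 },
];

console.log(findStalls(sampleEvents, 500));
console.log(tokensPerSecond(sampleEvents));
```

Numbers like these are a summary; the visualization shows the same gaps in context, which is often how you learn which thresholds are worth computing at all.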

Streaming as first class. Timing matters — both for qualitative reasons (perceived user experience) and, in more complex scenarios like multi-agent, for concrete ones (e.g. concurrency).
Generic where possible, custom as needed. A standard representation of outputs across all LLMs and providers is ideal, and to a degree achievable around the core — tokens, streaming output, JSON tool calls. But the field is not standardized enough yet; to work with the cutting edge of models, this stack embraces custom logic for each model and provider as needed.
Visualization as the starting point for understanding. Rather than leading with summary metrics, visualization lets you understand what is happening while keeping full detail and accommodating the non-uniform state of the field. It also surfaces things you didn't know to look for — for instance, gpt-5.4-nano regularly holds a streaming connection open after token generation is complete, sometimes for a substantial time. That's wildly obvious in the graph, but not something I would have thought to check.
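
Once a pattern like the held-open connection has been spotted, it can be turned into a number. A minimal sketch, assuming each trace records when the last token-bearing event arrived and when the stream closed; the `TraceTiming` shape and both helpers are hypothetical, and the sample values are invented rather than measured.

```typescript
// Hypothetical per-trace timing summary (not part of any real schema).
interface TraceTiming {
  lastTokenMs: number;    // arrival of the final token-bearing event
  streamClosedMs: number; // when the streaming connection closed
}

// Time the connection stayed open after the last token arrived.
function trailingIdleMs(t: TraceTiming): number {
  return Math.max(0, t.streamClosedMs - t.lastTokenMs);
}

// Fraction of the whole request spent in that trailing idle period.
function trailingIdleShare(t: TraceTiming): number {
  return t.streamClosedMs > 0 ? trailingIdleMs(t) / t.streamClosedMs : 0;
}

// Invented values for illustration only.
const sampleRuns: TraceTiming[] = [
  { lastTokenMs: 1200, streamClosedMs: 3200 },
  { lastTokenMs: 1100, streamClosedMs: 1150 },
];

for (const run of sampleRuns) {
  console.log(trailingIdleMs(run), trailingIdleShare(run));
}
```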

gpt-5.4-nano · ~10k-token input · streaming · 1/5

The same request to gpt-5.4-nano, five times.