# Analyze Traces
You need to see exactly what the agent did so you can debug failures and verify improvements. `fit-trace` reads the NDJSON traces produced by `fit-eval` and gives you structured queries over every turn, tool call, and result.
## Prerequisites

- Node.js 18+
- A trace file -- either the `--output` from a `fit-eval` run, or downloaded from CI with `fit-trace download`
## Get the trace

Local runs already produce a trace at the `--output` path. For CI runs, list recent workflow runs and download:

```shell
npx fit-trace runs                  # list recent workflow runs
npx fit-trace download 24497273755  # downloads to /tmp/trace-24497273755/
```
The download produces `trace.ndjson` and `structured.json`. Both formats work as input to every query command below.
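If you want to query a trace outside `fit-trace`, NDJSON is easy to process directly: one JSON object per line. A minimal Python sketch -- the `content`/`tool_use` field names mirror the examples on this page and are assumptions about your trace, not a guaranteed schema:

```python
import json
from collections import Counter

def read_ndjson(path):
    """Yield one parsed JSON object per non-empty line of an NDJSON file."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)

def tool_counts(path):
    """Count tool_use blocks by tool name across all events."""
    counts = Counter()
    for event in read_ndjson(path):
        for block in event.get("content", []):
            if block.get("type") == "tool_use":
                counts[block.get("name", "unknown")] += 1
    return counts
```

This is the same per-tool tally the `overview` command reports, computed by hand.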
## Orient with the overview

Start with the bird's-eye view before drilling into individual turns:

```shell
npx fit-trace overview /tmp/trace-24497273755/structured.json
```

```json
{
  "summary": { "result": "success", "totalCostUsd": 0.42, "numTurns": 18 },
  "turnCount": 34,
  "tools": [{ "tool": "Bash", "count": 12 }, { "tool": "Read", "count": 8 }],
  "taskPrompt": "Refactor src/utils/format.js so that formatDate and formatCurrency share..."
}
```
The `timeline` command shows the shape of the session at a glance -- one line per assistant turn with tools used and token counts:

```shell
npx fit-trace timeline /tmp/trace-24497273755/structured.json
```

```text
[1] Read  in:12.3K out:0.8K  Let me read the current implementation...
[3] Bash  in:13.1K out:1.2K  Running the existing tests first...
[5] Edit  in:14.0K out:2.1K  I'll extract the shared locale helper...
[7] Bash  in:15.2K out:0.4K  Running tests to verify the refactor...
```
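If you need a custom rendering (say, every tool per turn rather than a summary line), the same view is a few lines over the structured turns. A sketch, assuming the `role`/`content`/`usage` shapes shown in this page's examples:

```python
def timeline(turns):
    """One line per assistant turn: index, tools used, token counts.

    Field names here mirror this page's examples; treat them as
    assumptions about the structured trace, not a fixed schema.
    """
    lines = []
    for i, turn in enumerate(turns):
        if turn.get("role") != "assistant":
            continue
        tools = [b["name"] for b in turn.get("content", []) if b.get("type") == "tool_use"]
        usage = turn.get("usage", {})
        lines.append(
            f"[{i}] {','.join(tools) or '-'} "
            f"in:{usage.get('inputTokens', 0)} out:{usage.get('outputTokens', 0)}"
        )
    return lines
```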
## Find errors

List every tool result where the agent's tool call failed:

```shell
npx fit-trace errors /tmp/trace-24497273755/structured.json
```

Each result includes the turn index, the `toolUseId` that links it back to the assistant turn that made the call, and the error content.
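The linking works because every tool result carries the id of the tool call that produced it. A minimal sketch of that join, with block shapes modeled on this page's examples (the field names are illustrative assumptions, not `fit-trace`'s exact schema):

```python
def failed_calls(turns):
    """Pair each errored tool_result with the tool_use that produced it."""
    # Index every tool_use block by its id.
    uses = {}
    for i, turn in enumerate(turns):
        for block in turn.get("content", []):
            if block.get("type") == "tool_use":
                uses[block["id"]] = (i, block)
    # Match errored tool_results back to their originating call.
    failures = []
    for i, turn in enumerate(turns):
        for block in turn.get("content", []):
            if block.get("type") == "tool_result" and block.get("is_error"):
                call_turn, use = uses.get(block.get("tool_use_id"), (None, None))
                failures.append({
                    "result_turn": i,
                    "call_turn": call_turn,
                    "tool": use.get("name") if use else None,
                    "error": block.get("content"),
                })
    return failures
```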
## Filter by tool or role

See every turn where the agent used a specific tool, including both the `tool_use` request and its `tool_result` response:

```shell
npx fit-trace tool /tmp/trace-24497273755/structured.json Bash
```

Or use `filter` for structural queries -- by role, tool name, or error status:

```shell
npx fit-trace filter /tmp/trace-24497273755/structured.json --tool Edit
npx fit-trace filter /tmp/trace-24497273755/structured.json --error
npx fit-trace filter /tmp/trace-24497273755/structured.json --role user
```
## Search across the trace

Search all turn content with a regex pattern:

```shell
npx fit-trace search /tmp/trace-24497273755/structured.json 'permission denied' --context 1
```

`--context 1` includes one surrounding turn on each side of every match. `--limit 10` caps the number of results. `--full` emits the complete content block instead of a short excerpt.
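Context windows are just index arithmetic over the turn list. A sketch of the same query in Python, assuming each turn exposes its searchable content under a `text` field (an assumption for illustration):

```python
import re

def search_turns(turns, pattern, context=1):
    """Return indices of matching turns plus `context` turns on each side."""
    rx = re.compile(pattern)
    hits = [i for i, t in enumerate(turns) if rx.search(t.get("text", ""))]
    keep = set()
    for i in hits:
        # Clamp the window at both ends of the trace.
        keep.update(range(max(0, i - context), min(len(turns), i + context + 1)))
    return sorted(keep)
```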
## Read the agent's reasoning

Extract just the text blocks from assistant turns to see what the agent said it would do (as distinct from what its tool calls actually did):

```shell
npx fit-trace reasoning /tmp/trace-24497273755/structured.json --from 5 --to 15
```

```json
[
  { "index": 5, "text": "I'll extract the shared locale helper..." },
  { "index": 9, "text": "Tests pass. Now adding coverage for de-DE..." }
]
```

Comparing `reasoning` output to actual tool calls reveals mismatches between intent and execution.
## Measure token usage and cost

```shell
npx fit-trace stats /tmp/trace-24497273755/structured.json
```

```json
{
  "totals": { "inputTokens": 142800, "outputTokens": 18400, "totalCostUsd": 0.42, "durationMs": 94200 },
  "perTurn": [{ "index": 1, "inputTokens": 12300, "outputTokens": 800, ... }]
}
```

Track these numbers across runs over time. A single trace is a snapshot; a series shows whether changes are landing.
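Comparing two runs is a straightforward diff of the totals. A sketch over the stats shape shown in the example above:

```python
def compare_stats(before, after):
    """Diff the totals of two stats payloads (shape as in the example).

    Negative values mean the `after` run used less.
    """
    keys = ("inputTokens", "outputTokens", "totalCostUsd", "durationMs")
    return {k: after["totals"][k] - before["totals"][k] for k in keys}
```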
## Split multi-agent traces

For supervised or facilitated runs, split the combined trace into per-source files so you can see what each agent saw independently:

```shell
npx fit-trace split /tmp/trace-24497273755/structured.json --mode=facilitate
```

This produces `trace-facilitator.ndjson`, `trace-<participant>.ndjson`, and a combined `trace-agent.ndjson` in the same directory. Each file works as input to every query command above.

For supervised runs, use `--mode=supervise` to get `trace-agent.ndjson` and `trace-supervisor.ndjson`.
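Conceptually, a split like this amounts to grouping events by a source tag. A sketch of the idea -- the `source` field name is an assumption for illustration, not `fit-trace`'s schema:

```python
from collections import defaultdict

def split_by_source(events, key="source"):
    """Group trace events into one list per source value."""
    by_source = defaultdict(list)
    for event in events:
        # Untagged events fall through to the agent stream.
        by_source[event.get(key, "agent")].append(event)
    return dict(by_source)
```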
## Navigate individual turns

When you need to inspect a specific moment in the trace:

```shell
npx fit-trace turn /tmp/trace-24497273755/structured.json 8
npx fit-trace batch /tmp/trace-24497273755/structured.json 5 10
npx fit-trace head /tmp/trace-24497273755/structured.json 5
npx fit-trace tail /tmp/trace-24497273755/structured.json 5
```

`batch` returns turns in the half-open range `[from, to)`. `head` and `tail` default to 10 turns when no count is given.
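The half-open convention composes cleanly: `batch 0 5` followed by `batch 5 10` covers turns 0 through 9 with no overlap. It maps directly onto slice semantics:

```python
def batch(turns, start, stop):
    """Half-open [start, stop): includes `start`, excludes `stop`."""
    return turns[start:stop]

def head(turns, n=10):
    return turns[:n]

def tail(turns, n=10):
    # Guard n == 0: turns[-0:] would return the whole list.
    return turns[-n:] if n else []
```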
## What to look for

When debugging a failure, a useful sequence is:

1. `overview` -- did the run succeed or fail? How many turns?
2. `errors` -- which tool calls failed?
3. `tool <name>` on the failing tool -- what input did the agent send?
4. `reasoning` around those turns -- did the agent understand the error?
5. `search` for the error message -- did it appear earlier than expected?
When verifying an improvement, compare `stats` across before-and-after runs. Fewer retries, lower token usage, and shorter duration are the signals that a profile or prompt change improved outcomes.
## Related

- Agent Collaboration -- produce traces with `fit-eval facilitate`; the per-source `split` is essential for multi-agent traces.
- Agent Evaluations -- produce traces with `fit-eval supervise`; the trace is what you analyze here.