Analyze Traces

You need to see exactly what the agent did so you can debug failures and verify improvements. fit-trace reads the NDJSON traces produced by fit-eval and gives you structured queries over every turn, tool call, and result.

Prerequisites

  • Node.js 18+
  • A trace file -- either the --output file from a fit-eval run, or one downloaded from CI with fit-trace download

Get the trace

Local runs already produce a trace at the --output path. For CI runs, list recent workflow runs and download:

npx fit-trace runs                        # list recent workflow runs
npx fit-trace download 24497273755        # downloads to /tmp/trace-24497273755/

The download produces trace.ndjson and structured.json. Both formats work as input to every query command below.
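
If you want to post-process a trace yourself rather than going through the query commands, loading either format is straightforward. A minimal sketch, assuming trace.ndjson holds one JSON object per line and structured.json is a single JSON document (the exact schema isn't documented here, so inspect the real files):

```python
import json

def load_trace(path):
    """Load a fit-trace output file as Python objects.

    Assumes .ndjson files hold one JSON object per line and any
    other file is a single JSON document -- check the real files
    for the exact layout.
    """
    with open(path) as f:
        if path.endswith(".ndjson"):
            return [json.loads(line) for line in f if line.strip()]
        return json.load(f)
```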

Orient with the overview

Start with the bird's-eye view before drilling into individual turns:

npx fit-trace overview /tmp/trace-24497273755/structured.json
{
  "summary": { "result": "success", "totalCostUsd": 0.42, "numTurns": 18 },
  "turnCount": 34,
  "tools": [{ "tool": "Bash", "count": 12 }, { "tool": "Read", "count": 8 }],
  "taskPrompt": "Refactor src/utils/format.js so that formatDate and formatCurrency share..."
}
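
The tools array is just a tally of tool calls per tool name. A sketch of the same aggregation over loaded turns, assuming Anthropic-style content blocks (a content list whose tool-call entries have type "tool_use" and a name field -- an assumption about the trace schema, not documented behavior):

```python
from collections import Counter

def tool_counts(turns):
    # Tally tool_use blocks per tool name, like the overview's
    # `tools` array. The block shape (type/name fields) is an
    # assumed, Anthropic-style schema.
    counts = Counter(
        block["name"]
        for turn in turns
        for block in turn.get("content", [])
        if isinstance(block, dict) and block.get("type") == "tool_use"
    )
    return [{"tool": t, "count": n} for t, n in counts.most_common()]
```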

The timeline command shows the shape of the session at a glance -- one line per assistant turn with tools used and token counts:

npx fit-trace timeline /tmp/trace-24497273755/structured.json
[1]  Read                           in:12.3K out:0.8K    Let me read the current implementation...
[3]  Bash                           in:13.1K out:1.2K    Running the existing tests first...
[5]  Edit                           in:14.0K out:2.1K    I'll extract the shared locale helper...
[7]  Bash                           in:15.2K out:0.4K    Running tests to verify the refactor...

Find errors

List every tool result where the agent's tool call failed:

npx fit-trace errors /tmp/trace-24497273755/structured.json

Each result includes the turn index, the toolUseId that links it back to the assistant turn that made the call, and the error content.
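
Reproducing that linkage by hand looks roughly like this -- field names such as id, tool_use_id, and is_error are assumptions about the underlying block schema, chosen to match the toolUseId linkage the command describes:

```python
def failed_calls(turns):
    # Collect failed tool_results and link each back to the
    # assistant turn that issued the call via its toolUseId.
    # Block field names (id, tool_use_id, is_error) are assumed.
    call_turn = {}   # tool_use id -> index of the calling turn
    failures = []
    for i, turn in enumerate(turns):
        for block in turn.get("content", []):
            if not isinstance(block, dict):
                continue
            if block.get("type") == "tool_use":
                call_turn[block.get("id")] = i
            elif block.get("type") == "tool_result" and block.get("is_error"):
                tid = block.get("tool_use_id")
                failures.append({
                    "turn": i,
                    "toolUseId": tid,
                    "calledFrom": call_turn.get(tid),
                    "error": block.get("content"),
                })
    return failures
```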

Filter by tool or role

See every turn where the agent used a specific tool, including both the tool_use request and its tool_result response:

npx fit-trace tool /tmp/trace-24497273755/structured.json Bash

Or use filter for structural queries -- by role, tool name, or error status:

npx fit-trace filter /tmp/trace-24497273755/structured.json --tool Edit
npx fit-trace filter /tmp/trace-24497273755/structured.json --error
npx fit-trace filter /tmp/trace-24497273755/structured.json --role user

Search across the trace

Search all turn content with a regex pattern:

npx fit-trace search /tmp/trace-24497273755/structured.json 'permission denied' --context 1

--context 1 includes one surrounding turn on each side of every match. --limit 10 caps the number of results. --full emits the complete content block instead of a short excerpt.
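
The context-window behavior is easy to replicate when scripting: match each turn, then widen every hit by the context radius. A sketch, matching against each turn's serialized JSON since the content layout isn't specified:

```python
import json
import re

def search_turns(turns, pattern, context=0):
    # Regex-match each turn's serialized content and return the
    # matching indices plus `context` neighbours on each side,
    # mirroring the --context flag.
    rx = re.compile(pattern)
    hits = [i for i, t in enumerate(turns) if rx.search(json.dumps(t))]
    keep = set()
    for i in hits:
        keep.update(range(max(0, i - context),
                          min(len(turns), i + context + 1)))
    return sorted(keep)
```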

Read the agent's reasoning

Extract just the text blocks from assistant turns to see what the agent said it would do (as distinct from what its tool calls actually did):

npx fit-trace reasoning /tmp/trace-24497273755/structured.json --from 5 --to 15
[
  { "index": 5, "text": "I'll extract the shared locale helper..." },
  { "index": 9, "text": "Tests pass. Now adding coverage for de-DE..." }
]

Comparing reasoning output to actual tool calls reveals mismatches between intent and execution.
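
The same extraction over a loaded trace might look like this, assuming assistant turns carry role "assistant" and text blocks with type "text" (an Anthropic-style schema assumption) and treating --from/--to as an inclusive range, as the example output suggests:

```python
def reasoning(turns, start, stop):
    # Extract text blocks from assistant turns in the inclusive
    # range [start, stop], like the reasoning command. The
    # role/content/type field names are assumed.
    out = []
    for i in range(start, min(stop + 1, len(turns))):
        turn = turns[i]
        if turn.get("role") != "assistant":
            continue
        text = " ".join(
            b["text"] for b in turn.get("content", [])
            if isinstance(b, dict) and b.get("type") == "text"
        )
        if text:
            out.append({"index": i, "text": text})
    return out
```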

Measure token usage and cost

npx fit-trace stats /tmp/trace-24497273755/structured.json
{
  "totals": { "inputTokens": 142800, "outputTokens": 18400, "totalCostUsd": 0.42, "durationMs": 94200 },
  "perTurn": [{ "index": 1, "inputTokens": 12300, "outputTokens": 800, ... }]
}

Track these numbers across runs over time. A single trace is a snapshot; a series shows whether changes are landing.
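
Comparing two runs reduces to diffing their totals objects. A sketch, assuming the stats JSON shape shown above; negative deltas mean the second run was cheaper or faster:

```python
def compare_runs(before, after, keys=("inputTokens", "outputTokens",
                                      "totalCostUsd", "durationMs")):
    # Diff the `totals` objects of two stats outputs.
    return {k: after["totals"].get(k, 0) - before["totals"].get(k, 0)
            for k in keys}
```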

Split multi-agent traces

For supervised or facilitated runs, split the combined trace into per-source files so you can see what each agent saw independently:

npx fit-trace split /tmp/trace-24497273755/structured.json --mode=facilitate

This produces trace-facilitator.ndjson, trace-<participant>.ndjson, and a combined trace-agent.ndjson in the same directory. Each file works as input to every query command above.

For supervised runs, use --mode=supervise to get trace-agent.ndjson and trace-supervisor.ndjson.
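
Conceptually the split is a partition of NDJSON records by originating agent. A sketch of that bucketing -- the source field name is an assumption about the record schema, not a documented key:

```python
import json
from collections import defaultdict

def split_lines(ndjson_lines, key="source"):
    # Bucket NDJSON records per originating agent, the way split
    # writes trace-<name>.ndjson files. The `source` record key
    # is assumed; records without it fall back to "agent".
    buckets = defaultdict(list)
    for line in ndjson_lines:
        rec = json.loads(line)
        buckets[rec.get(key, "agent")].append(line)
    return dict(buckets)
```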

Inspect specific turns

When you need to inspect a specific moment in the trace:

npx fit-trace turn /tmp/trace-24497273755/structured.json 8
npx fit-trace batch /tmp/trace-24497273755/structured.json 5 10
npx fit-trace head /tmp/trace-24497273755/structured.json 5
npx fit-trace tail /tmp/trace-24497273755/structured.json 5

batch returns turns in the half-open range [from, to). head and tail default to 10 turns when no count is given.
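
These range conventions map directly onto Python slicing if you are scripting against a loaded trace:

```python
def batch(turns, start, stop):
    # Half-open range [start, stop), like the batch command.
    return turns[start:stop]

def head(turns, n=10):
    # First n turns; defaults to 10 like the CLI.
    return turns[:n]

def tail(turns, n=10):
    # Last n turns; defaults to 10 like the CLI.
    return turns[-n:] if n else []
```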

What to look for

When debugging a failure, a useful sequence is:

  1. overview -- did the run succeed or fail? How many turns?
  2. errors -- which tool calls failed?
  3. tool <name> on the failing tool -- what input did the agent send?
  4. reasoning around those turns -- did the agent understand the error?
  5. search for the error message -- did it appear earlier than expected?

When verifying an improvement, compare stats across before-and-after runs. Fewer retries, lower token usage, and shorter duration are the signals that a profile or prompt change improved outcomes.

Related

  • Agent Collaboration -- produce traces with fit-eval facilitate; the per-source split is essential for multi-agent traces.
  • Agent Evaluations -- produce traces with fit-eval supervise; the trace is what you analyze here.