Overview

A trace is the complete record of a single agent run - the task it was given, the full message trajectory (every user message, assistant response, and tool call), which memories were retrieved, and which model was used. Think of it as a structured log entry that Reflect can learn from.

A review is a pass/fail judgment on a trace. When you review a trace, Reflect:
  1. Reads the trajectory, the outcome, and your feedback
  2. Generates a concise reflection (an LLM-produced summary of what worked or went wrong)
  3. Embeds the reflection and stores it as a new memory with an initial utility of 0.5
  4. Updates the utility scores of the memories that were retrieved during that run (up for pass, down for fail)
Without reviews, Reflect is just a trace logger. Reviews are what close the learning loop - they’re the training signal that makes memory retrieval improve over time.
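
For example, one reviewed run through the loop looks like this (a minimal sketch - the task and feedback are illustrative, and the APIs are described in detail below):

with client.trace("Process a return for order 1042") as ctx:
    response = my_agent(ctx.augmented_task)  # runs with retrieved memories injected
    ctx.set_output(
        trajectory=[{"role": "user", "content": ctx.augmented_task},
                    {"role": "assistant", "content": response}],
        result="fail",
        feedback_text="Issued the refund without verifying the order exists",
    )
# Reflect generates a reflection, stores it as a new memory (utility 0.5),
# and lowers the utility of the memories retrieved for this run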

Why traces capture the full trajectory

Reflect stores the entire conversation, not just the final answer, because the reflection LLM needs context to generate useful advice. A reflection like “always verify the order exists before processing a return” can only be generated if the trajectory shows that the agent didn’t verify the order. The final answer alone wouldn’t reveal that. The trajectory also enables the dashboard to show step-by-step replays, which is useful for debugging and manual review.
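
For instance, a trajectory like the following (using the same message-dict shape as the examples below) is what lets the reflection LLM spot the missing verification step:

trajectory = [
    {"role": "user", "content": "Process a return for order 1042"},
    {"role": "assistant", "content": "Refund issued."},  # no order-verification step
]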

Why reviews are separate from traces

Reviews can be submitted inline (at trace creation time) or deferred (later, via the API or dashboard). This separation exists because:
  • Automated pipelines know the answer immediately (e.g., comparing against a gold answer) and can submit inline reviews
  • Human review workflows need to collect the trace first and review asynchronously
  • Batch evaluation collects many traces and reviews them all at once
Both paths produce the same result: a reflection is generated, a memory is created, and utility scores are updated.
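
Both paths use the same machinery at different times - a compact sketch (each call is documented in full below):

# Inline: an automated check knows the outcome immediately
client.create_trace(task=task, trajectory=messages, review_result="pass")

# Deferred: store the trace now, attach the review once it has been judged
with client.trace(task) as ctx:
    ctx.set_output(trajectory=messages)  # no result - review deferred
client.review_trace(trace_id=ctx.trace_id, result="fail", feedback_text="...")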

Three ways to record traces

The SDK provides three patterns for recording traces. They all produce the same result - a trace stored in Reflect - but differ in how much boilerplate they handle for you.

Pattern 1: Context manager

The context manager retrieves memories on entry and auto-submits the trace on exit. It tracks retrieved_memory_ids for you, so the utility learning loop works automatically. Best for: multi-step workflows, streaming, cases where you need to inspect output before deciding the review result.
with client.trace("Parse the CSV and return the top 5 rows") as ctx:
    # ctx.augmented_task - the task with relevant memories appended
    # ctx.memories - the retrieved Memory objects
    response = my_agent(ctx.augmented_task)

    ctx.set_output(
        trajectory=[
            {"role": "user", "content": ctx.augmented_task},
            {"role": "assistant", "content": response},
        ],
        result="pass",
        model="gpt-5.4-mini",
    )
# Trace auto-submitted on exit with correct retrieved_memory_ids
# ctx.trace_id is now available for deferred review or logging

Context manager parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| task | str | required | Task description for memory retrieval and trace logging |
| limit | int | 10 | Maximum memories to retrieve |
| lambda_ | float | 0.5 | Blend between similarity and utility |
| blocking | bool | False | Wait for memory creation before exiting the context |
| auto_fail_on_exception | bool | True | Auto-submit with result="fail" on unhandled exceptions |
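
For example, to cap retrieval at five memories and shift the similarity/utility blend (the parameter values here are illustrative):

with client.trace("Parse the CSV and return the top 5 rows", limit=5, lambda_=0.8) as ctx:
    response = my_agent(ctx.augmented_task)
    ...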

ctx.trace_id

After the with block exits, ctx.trace_id contains the ID of the submitted trace. This is useful for deferred reviews — pass it to client.review_trace() later in your application. Inside the with block (before submission), trace_id is None.

set_output parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| trajectory | list[dict] \| str | required | The conversation messages |
| final_response | str \| None | None | Agent's final answer (extracted from trajectory if omitted) |
| result | str \| None | None | "pass" or "fail" - omit to defer the review |
| feedback_text | str \| None | None | What went wrong (used when result="fail") |
| model | str \| None | None | Model name for dashboard display |
| metadata | dict \| None | None | Arbitrary JSON metadata |
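
For example, a failing run recorded with specific feedback and metadata (a sketch - the metadata keys are arbitrary):

ctx.set_output(
    trajectory=messages,
    result="fail",
    feedback_text="Missed the WHERE clause in the SQL query, returning all rows",
    model="gpt-5.4-mini",
    metadata={"dataset": "eval-v2", "attempt": 1},
)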

Blocking mode

Pass blocking=True to wait for the reflection and memory to be created before the with block exits. Useful in evaluation loops where the next task needs to retrieve the memory from the previous one.
with client.trace("...", blocking=True) as ctx:
    response = my_agent(ctx.augmented_task)
    messages = [{"role": "user", "content": ctx.augmented_task},
                {"role": "assistant", "content": response}]
    ctx.set_output(trajectory=messages, result="pass")
# Memory is guaranteed to exist here - the next task can retrieve it

Exception handling

If an unhandled exception escapes the with block after set_output has been called, the trace is auto-submitted with result="fail" and the exception message as feedback. This prevents losing trace data on crashes. Disable this behavior with auto_fail_on_exception=False.
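
For example, a crash in a step after set_output still records the run (a sketch - process_followup is a hypothetical post-processing step):

with client.trace("Reconcile invoices") as ctx:
    response = my_agent(ctx.augmented_task)
    ctx.set_output(
        trajectory=[{"role": "user", "content": ctx.augmented_task},
                    {"role": "assistant", "content": response}],
        result="pass",
    )
    process_followup(response)  # hypothetical - if this raises, the trace is
                                # auto-submitted with result="fail" and the
                                # exception message as feedback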

Async variant

async with client.trace_async("Debug login timeout") as ctx:
    response = await my_llm(ctx.augmented_task)
    messages = [{"role": "user", "content": ctx.augmented_task},
                {"role": "assistant", "content": response}]
    ctx.set_output(trajectory=messages, result="pass")

Pattern 2: @reflect_trace decorator

The decorator wraps a function so that memory retrieval, trace submission, and retrieved_memory_ids tracking happen automatically. You just write the agent logic. Best for: single-function agents, clean integration with existing function signatures, when you want the least boilerplate.
from reflect_sdk import TraceContext, TraceResult, reflect_trace

@reflect_trace(client, task=lambda question: question)
def answer(ctx: TraceContext, question: str) -> TraceResult:
    messages = [{"role": "user", "content": ctx.augmented_task}]
    response = my_llm(messages)
    messages.append({"role": "assistant", "content": response})
    return TraceResult(
        output=response,       # returned to the caller
        trajectory=messages,
        result="pass",
        model="gpt-5.4-mini",
    )

# Calling the function queries memories, runs your code, and submits the trace
result = answer("Parse the CSV and return the top 5 rows")
# result == response (the output from TraceResult)

Decorator parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| client | ReflectClient | required | The client instance |
| task | str \| Callable \| None | None | How to derive the task string - a static string, a callable on the function's args, or None to use the first positional arg |
| limit | int | 10 | Maximum memories to retrieve |
| lambda_ | float | 0.5 | Blend between similarity and utility |
| blocking | bool | False | Wait for memory creation before returning |
| auto_fail_on_exception | bool | True | Auto-submit "fail" on unhandled exceptions |
| inject_context | bool | True | Prepend a TraceContext as the first argument |
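
For example, a decorator configured with a static task string and non-default retrieval settings (the values and the triage function are illustrative):

@reflect_trace(client, task="Triage a support ticket", limit=5, lambda_=0.7)
def triage(ctx: TraceContext, ticket: str) -> TraceResult:
    messages = [{"role": "user", "content": f"{ctx.augmented_task}\n\nTicket: {ticket}"}]
    response = my_llm(messages)
    messages.append({"role": "assistant", "content": response})
    return TraceResult(output=response, trajectory=messages, result="pass")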

Two return types

Return a TraceResult, as in the Pattern 2 example above, when you need to provide the full trajectory, review result, feedback, model, or metadata. Return a plain string for quick prototyping - the string is used as both the trajectory and the final response.
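
A minimal string-return sketch (my_llm as in the earlier example):

@reflect_trace(client, task=lambda question: question)
def quick_answer(ctx: TraceContext, question: str) -> str:
    # The returned string is used as both the trajectory and the final response
    return my_llm([{"role": "user", "content": ctx.augmented_task}])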

Async support

Async functions are detected automatically:
@reflect_trace(client, task=lambda question: question)
async def answer(ctx: TraceContext, question: str) -> TraceResult:
    ...

Pattern 3: Explicit API calls

Call augment_with_memories and create_trace directly. This gives you full control over every step but requires you to pass retrieved_memory_ids manually. Best for: existing codebases where you can’t wrap the agent function, batch pipelines, cases where traces are created far from where memories are retrieved.
augmented = client.augment_with_memories(
    task="Parse the CSV and return the top 5 rows",
)
response = my_agent(augmented.augmented_task)

submission = client.create_trace(
    task="Parse the CSV and return the top 5 rows",
    trajectory=[
        {"role": "user", "content": augmented.augmented_task},
        {"role": "assistant", "content": response},
    ],
    retrieved_memory_ids=[m.id for m in augmented.memories],
    model="gpt-5.4-mini",
    review_result="pass",
)
When using create_trace directly, you must pass retrieved_memory_ids manually. If you forget, the utility learning loop breaks silently - memories won’t be reinforced or penalized based on outcomes. The context manager and decorator handle this automatically.

create_trace vs create_trace_and_wait

| Method | Behavior | Use when |
|---|---|---|
| create_trace | Returns immediately with a TraceSubmission; processing happens in the background | Your app serves real-time traffic and you don't want to block |
| create_trace_and_wait | Blocks until the trace is fully processed and the memory is created | Evaluation loops, tests, and scripts where the next step needs the memory to exist |
# Non-blocking - returns immediately
submission = client.create_trace(task="...", trajectory=[...], review_result="pass")
# submission.ingest_status == "queued"

# Blocking - waits for the memory to be created
trace = client.create_trace_and_wait(
    task="...",
    trajectory=[...],
    review_result="pass",
    poll_interval=0.25,    # seconds between polls
    wait_timeout=60.0,     # max seconds to wait
)
# trace.review_status == "reviewed"
# trace.created_memory_id is set

Reviews

Inline reviews

Include the review when creating the trace. This is the simplest path - one call does everything.
# Context manager
with client.trace("...") as ctx:
    response = my_agent(ctx.augmented_task)
    ctx.set_output(trajectory=messages, result="pass")

# Decorator
@reflect_trace(client, task=lambda q: q)
def answer(ctx, question):
    ...
    return TraceResult(output=response, trajectory=messages, result="pass")

# Explicit
client.create_trace(task="...", trajectory=[...], review_result="pass")
The SDK accepts "success" / "failure" as aliases for "pass" / "fail".

Deferred reviews

Create the trace without a review, then submit one later. This is useful when:
  • A human needs to evaluate the answer
  • You’re running a batch and want to review traces in one go afterwards
  • The review depends on external feedback that isn’t available yet
# Step 1: Create trace without a review
with client.trace("Summarize the quarterly report") as ctx:
    response = my_agent(ctx.augmented_task)
    ctx.set_output(trajectory=messages)  # no result - review deferred

# ctx.trace_id is available after the with block exits
print(f"Trace submitted: {ctx.trace_id}")

# Step 2: Later, after human evaluation
trace = client.review_trace(
    trace_id=ctx.trace_id,
    result="fail",
    feedback_text="Summary missed the revenue decline in Q3",
)
# trace.review_status == "reviewed"
# trace.created_memory_id is now set
Deferred reviews are processed synchronously - the returned Trace includes the review and the created memory ID. You can also review traces from the Reflect Console.

What makes good feedback

When a trace fails, feedback_text is included in the reflection prompt. Specific feedback produces better reflections:
| Feedback | Quality |
|---|---|
| "Wrong" | Too vague - the reflection won't capture the specific mistake |
| "The answer was incorrect" | Slightly better, but still generic |
| "Missed the WHERE clause in the SQL query, returning all rows instead of filtered" | Specific - the reflection will mention the WHERE clause, making it useful for future similar tasks |
For passing traces, feedback is optional. The trajectory itself provides enough context for the reflection.

Listing and fetching traces

# List traces by review status
pending = client.list_traces(review_status="pending")
reviewed = client.list_traces(review_status="reviewed")
all_traces = client.list_traces()

# Fetch a specific trace
trace = client.get_trace(trace_id="abc-123")
Each Trace object includes:
| Field | Description |
|---|---|
| id | Unique trace identifier |
| task | The task that was executed |
| trajectory | List of message dicts |
| review_status | "pending" or "reviewed" |
| ingest_status | "queued", "processing", "completed", or "failed" |
| created_memory_id | ID of the reflection memory (set after review) |
| review | Attached Review object (if reviewed) |
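
These building blocks combine naturally into a batch review pass - a sketch in which judge() is a hypothetical evaluator returning a result and feedback for a trace:

for trace in client.list_traces(review_status="pending"):
    result, feedback = judge(trace)  # hypothetical - e.g., compare against a gold answer
    client.review_trace(trace_id=trace.id, result=result, feedback_text=feedback)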

Choosing a pattern

Context manager

client.trace() — Auto-tracks retrieved_memory_ids. Flexible - inspect output before deciding the review. Supports blocking mode for eval loops. Best for multi-step agents, streaming, conditional review logic.

Decorator

@reflect_trace — Least boilerplate. Wraps a single function. Supports sync and async. Return TraceResult for full control or a string for quick prototyping. Best for single-function agents and clean codebases.

Explicit calls

create_trace / create_trace_and_wait — Full control over every step. Must pass retrieved_memory_ids manually. Best for existing codebases, batch pipelines, and cases where traces are created separately from memory retrieval.