Overview

A trace is the complete record of a single agent run - the task it was given, the full message trajectory (every user message, assistant response, and tool call), which memories were retrieved, and which model was used. Think of it as a structured log entry that Reflect can learn from.

A review is a pass/fail judgment on a trace. When you review a trace, Reflect:
  1. Reads the trajectory, the outcome, and your feedback
  2. Generates a concise reflection (an LLM-produced summary of what worked or went wrong)
  3. Embeds the reflection and stores it as a new memory with an initial utility of 0.5
  4. Updates the utility scores of the memories that were retrieved during that run (up for pass, down for fail)
Without reviews, Reflect is just a trace logger. Reviews are what close the learning loop - they’re the training signal that makes memory retrieval improve over time.
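
For example, one reviewed run through the loop looks like this (a minimal sketch - the task and feedback are illustrative, and the APIs are described in detail below):

with client.trace("Process a return for order 1042") as ctx:
    response = my_agent(ctx.augmented_task)  # runs with retrieved memories injected
    ctx.set_output(
        trajectory=[{"role": "user", "content": ctx.augmented_task},
                    {"role": "assistant", "content": response}],
        result="fail",
        feedback_text="Issued the refund without verifying the order exists",
    )
# Reflect generates a reflection, stores it as a new memory (utility 0.5),
# and lowers the utility of the memories retrieved for this run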

Why traces capture the full trajectory

Reflect stores the entire conversation, not just the final answer, because the reflection LLM needs context to generate useful advice. A reflection like “always verify the order exists before processing a return” can only be generated if the trajectory shows that the agent didn’t verify the order. The final answer alone wouldn’t reveal that. The trajectory also enables the dashboard to show step-by-step replays, which is useful for debugging and manual review.
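
For instance, a trajectory like the following (using the same message-dict shape as the examples below) is what lets the reflection LLM spot the missing verification step:

trajectory = [
    {"role": "user", "content": "Process a return for order 1042"},
    {"role": "assistant", "content": "Refund issued."},  # no order-verification step
]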

Why reviews are separate from traces

Reviews can be submitted inline (at trace creation time) or deferred (later, via the API or dashboard). This separation exists because:
  • Automated pipelines know the answer immediately (e.g., comparing against a gold answer) and can submit inline reviews
  • Human review workflows need to collect the trace first and review asynchronously
  • Batch evaluation collects many traces and reviews them all at once
Both paths produce the same result: a reflection is generated, a memory is created, and utility scores are updated.
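
Both paths use the same machinery at different times - a compact sketch (each call is documented in full below):

# Inline: an automated check knows the outcome immediately
client.create_trace(task=task, trajectory=messages, review_result="pass")

# Deferred: store the trace now, attach the review once it has been judged
with client.trace(task) as ctx:
    ctx.set_output(trajectory=messages)  # no result - review deferred
client.review_trace(trace_id=ctx.trace_id, result="fail", feedback_text="...")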

Three ways to record traces

The SDK provides three patterns for recording traces. They all produce the same result - a trace stored in Reflect - but differ in how much boilerplate they handle for you.

Pattern 1: Context manager

The context manager retrieves memories on entry and auto-submits the trace on exit. It tracks retrieved_memory_ids for you, so the utility learning loop works automatically. Best for: multi-step workflows, streaming, cases where you need to inspect output before deciding the review result.
with client.trace("Parse the CSV and return the top 5 rows") as ctx:
    # ctx.augmented_task - the task with relevant memories appended
    # ctx.memories - the retrieved Memory objects
    response = my_agent(ctx.augmented_task)

    ctx.set_output(
        trajectory=[
            {"role": "user", "content": ctx.augmented_task},
            {"role": "assistant", "content": response},
        ],
        result="pass",
        model="gpt-5.4-mini",
    )
# Trace auto-submitted on exit with correct retrieved_memory_ids
# ctx.trace_id is now available for deferred review or logging

Context manager parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| task | str | required | Task description for memory retrieval and trace logging |
| limit | int | 10 | Maximum memories to retrieve |
| lambda_ | float | 0.5 | Blend between similarity and utility |
| blocking | bool | False | Wait for memory creation before exiting the context |
| auto_fail_on_exception | bool | True | Auto-submit with result="fail" on unhandled exceptions |
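
For example, to cap retrieval at five memories and shift the similarity/utility blend (the parameter values here are illustrative):

with client.trace("Parse the CSV and return the top 5 rows", limit=5, lambda_=0.8) as ctx:
    response = my_agent(ctx.augmented_task)
    ...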

ctx.trace_id

After the with block exits, ctx.trace_id contains the ID of the submitted trace. This is useful for deferred reviews — pass it to client.review_trace() later in your application. Inside the with block (before submission), trace_id is None.

set_output parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| trajectory | list[dict] \| str | required | The conversation messages |
| final_response | str \| None | None | Agent's final answer (extracted from trajectory if omitted) |
| result | str \| None | None | "pass" or "fail" - omit to defer the review |
| feedback_text | str \| None | None | What went wrong (used when result="fail") |
| model | str \| None | None | Model name for dashboard display |
| metadata | dict \| None | None | Arbitrary JSON metadata |
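
For example, a failing run recorded with specific feedback and metadata (a sketch - the metadata keys are arbitrary):

ctx.set_output(
    trajectory=messages,
    result="fail",
    feedback_text="Missed the WHERE clause in the SQL query, returning all rows",
    model="gpt-5.4-mini",
    metadata={"dataset": "eval-v2", "attempt": 1},
)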

Blocking mode

Pass blocking=True to wait for the reflection and memory to be created before the with block exits. Useful in evaluation loops where the next task needs to retrieve the memory from the previous one.
with client.trace("...", blocking=True) as ctx:
    response = my_agent(ctx.augmented_task)
    messages = [{"role": "user", "content": ctx.augmented_task},
                {"role": "assistant", "content": response}]
    ctx.set_output(trajectory=messages, result="pass")
# Memory is guaranteed to exist here - the next task can retrieve it

Exception handling

If an unhandled exception escapes the with block after set_output has been called, the trace is auto-submitted with result="fail" and the exception message as feedback. This prevents losing trace data on crashes. Disable this behavior with auto_fail_on_exception=False.
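
For example, a crash in a step after set_output still records the run (a sketch - process_followup is a hypothetical post-processing step):

with client.trace("Reconcile invoices") as ctx:
    response = my_agent(ctx.augmented_task)
    ctx.set_output(
        trajectory=[{"role": "user", "content": ctx.augmented_task},
                    {"role": "assistant", "content": response}],
        result="pass",
    )
    process_followup(response)  # hypothetical - if this raises, the trace is
                                # auto-submitted with result="fail" and the
                                # exception message as feedback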

Async variant

async with client.trace_async("Debug login timeout") as ctx:
    response = await my_llm(ctx.augmented_task)
    messages = [{"role": "user", "content": ctx.augmented_task},
                {"role": "assistant", "content": response}]
    ctx.set_output(trajectory=messages, result="pass")

Pattern 2: @reflect_trace decorator

The decorator wraps a function so that memory retrieval, trace submission, and retrieved_memory_ids tracking happen automatically. You just write the agent logic. Best for: single-function agents, clean integration with existing function signatures, when you want the least boilerplate.
from reflect_sdk import TraceContext, TraceResult, reflect_trace

@reflect_trace(client, task=lambda question: question)
def answer(ctx: TraceContext, question: str) -> TraceResult:
    messages = [{"role": "user", "content": ctx.augmented_task}]
    response = my_llm(messages)
    messages.append({"role": "assistant", "content": response})
    return TraceResult(
        output=response,       # returned to the caller
        trajectory=messages,
        result="pass",
        model="gpt-5.4-mini",
    )

# Calling the function queries memories, runs your code, and submits the trace
result = answer("Parse the CSV and return the top 5 rows")
# result == response (the output from TraceResult)

Decorator parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| client | ReflectClient | required | The client instance |
| task | str \| Callable \| None | None | How to derive the task string - a static string, a callable on the function's args, or None to use the first positional arg |
| limit | int | 10 | Maximum memories to retrieve |
| lambda_ | float | 0.5 | Blend between similarity and utility |
| blocking | bool | False | Wait for memory creation before returning |
| auto_fail_on_exception | bool | True | Auto-submit "fail" on unhandled exceptions |
| inject_context | bool | True | Prepend a TraceContext as the first argument |
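
For example, a decorator configured with a static task string and non-default retrieval settings (the values and the triage function are illustrative):

@reflect_trace(client, task="Triage a support ticket", limit=5, lambda_=0.7)
def triage(ctx: TraceContext, ticket: str) -> TraceResult:
    messages = [{"role": "user", "content": f"{ctx.augmented_task}\n\nTicket: {ticket}"}]
    response = my_llm(messages)
    messages.append({"role": "assistant", "content": response})
    return TraceResult(output=response, trajectory=messages, result="pass")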

Two return types

Return a TraceResult, as in the Pattern 2 example above, when you need to provide the full trajectory, review result, feedback, model, or metadata. Return a plain string for quick prototyping - the string is used as both the trajectory and the final response.
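
A minimal string-return sketch (my_llm as in the earlier example):

@reflect_trace(client, task=lambda question: question)
def quick_answer(ctx: TraceContext, question: str) -> str:
    # The returned string is used as both the trajectory and the final response
    return my_llm([{"role": "user", "content": ctx.augmented_task}])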

Async support

Async functions are detected automatically:
@reflect_trace(client, task=lambda question: question)
async def answer(ctx: TraceContext, question: str) -> TraceResult:
    ...

Pattern 3: Explicit API calls

Call augment_with_memories and create_trace directly. This gives you full control over every step but requires you to pass retrieved_memory_ids manually. Best for: existing codebases where you can’t wrap the agent function, batch pipelines, cases where traces are created far from where memories are retrieved.
augmented = client.augment_with_memories(
    task="Parse the CSV and return the top 5 rows",
)
response = my_agent(augmented.augmented_task)

submission = client.create_trace(
    task="Parse the CSV and return the top 5 rows",
    trajectory=[
        {"role": "user", "content": augmented.augmented_task},
        {"role": "assistant", "content": response},
    ],
    retrieved_memory_ids=[m.id for m in augmented.memories],
    model="gpt-5.4-mini",
    review_result="pass",
)
When using create_trace directly, you must pass retrieved_memory_ids manually. If you forget, the utility learning loop breaks silently - memories won’t be reinforced or penalized based on outcomes. The context manager and decorator handle this automatically.

create_trace vs create_trace_and_wait

| Method | Behavior | Use when |
|---|---|---|
| create_trace | Returns immediately with a TraceSubmission; processing happens in the background | Your app serves real-time traffic and you don't want to block |
| create_trace_and_wait | Blocks until the trace is fully processed and the memory is created | Evaluation loops, tests, and scripts where the next step needs the memory to exist |
# Non-blocking - returns immediately
submission = client.create_trace(task="...", trajectory=[...], review_result="pass")
# submission.ingest_status == "queued"

# Blocking - waits for the memory to be created
trace = client.create_trace_and_wait(
    task="...",
    trajectory=[...],
    review_result="pass",
    poll_interval=0.25,    # seconds between polls
    wait_timeout=60.0,     # max seconds to wait
)
# trace.review_status == "reviewed"
# trace.created_memory_id is set

Reviews

Inline reviews

Include the review when creating the trace. This is the simplest path - one call does everything.
# Context manager
with client.trace("...") as ctx:
    response = my_agent(ctx.augmented_task)
    ctx.set_output(trajectory=messages, result="pass")

# Decorator
@reflect_trace(client, task=lambda q: q)
def answer(ctx, question):
    ...
    return TraceResult(output=response, trajectory=messages, result="pass")

# Explicit
client.create_trace(task="...", trajectory=[...], review_result="pass")
The SDK accepts "success" / "failure" as aliases for "pass" / "fail".

Deferred reviews

Create the trace without a review, then submit one later. This is useful when:
  • A human needs to evaluate the answer
  • You’re running a batch and want to review traces in one go afterwards
  • The review depends on external feedback that isn’t available yet
# Step 1: Create trace without a review
with client.trace("Summarize the quarterly report") as ctx:
    response = my_agent(ctx.augmented_task)
    ctx.set_output(trajectory=messages)  # no result - review deferred

# ctx.trace_id is available after the with block exits
print(f"Trace submitted: {ctx.trace_id}")

# Step 2: Later, after human evaluation
trace = client.review_trace(
    trace_id=ctx.trace_id,
    result="fail",
    feedback_text="Summary missed the revenue decline in Q3",
)
# trace.review_status == "reviewed"
# trace.created_memory_id is now set
Deferred reviews are processed synchronously - the returned Trace includes the review and the created memory ID. You can also review traces from the Reflect Console.

What makes good feedback

When a trace fails, feedback_text is included in the reflection prompt. Specific feedback produces better reflections:
| Feedback | Quality |
|---|---|
| "Wrong" | Too vague - the reflection won't capture the specific mistake |
| "The answer was incorrect" | Slightly better, but still generic |
| "Missed the WHERE clause in the SQL query, returning all rows instead of filtered" | Specific - the reflection will mention the WHERE clause, making it useful for future similar tasks |
For passing traces, feedback is optional. The trajectory itself provides enough context for the reflection.

Listing and fetching traces

# List traces by review status
pending = client.list_traces(review_status="pending")
reviewed = client.list_traces(review_status="reviewed")
all_traces = client.list_traces()

# Fetch a specific trace
trace = client.get_trace(trace_id="abc-123")
Each Trace object includes:
| Field | Description |
|---|---|
| id | Unique trace identifier |
| task | The task that was executed |
| trajectory | List of message dicts |
| review_status | "pending" or "reviewed" |
| ingest_status | "queued", "processing", "completed", or "failed" |
| created_memory_id | ID of the reflection memory (set after review) |
| review | Attached Review object (if reviewed) |
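
These building blocks combine naturally into a batch review pass - a sketch in which judge() is a hypothetical evaluator returning a result and feedback for a trace:

for trace in client.list_traces(review_status="pending"):
    result, feedback = judge(trace)  # hypothetical - e.g., compare against a gold answer
    client.review_trace(trace_id=trace.id, result=result, feedback_text=feedback)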

Choosing a pattern

Context manager

client.trace() — Auto-tracks retrieved_memory_ids. Flexible - inspect output before deciding the review. Supports blocking mode for eval loops. Best for multi-step agents, streaming, conditional review logic.

Decorator

@reflect_trace — Least boilerplate. Wraps a single function. Supports sync and async. Return TraceResult for full control or a string for quick prototyping. Best for single-function agents and clean codebases.

Explicit calls

create_trace / create_trace_and_wait — Full control over every step. Must pass retrieved_memory_ids manually. Best for existing codebases, batch pipelines, and cases where traces are created separately from memory retrieval.