# Traces and reviews

> What traces and reviews are, how to record them using the three SDK patterns, and how reviews drive the learning loop.

## Overview

A **trace** is the complete record of a single agent run - the task it was given, the full message trajectory (every user message, assistant response, and tool call), which memories were retrieved, and which model was used. Think of it as a structured log entry that Reflect can learn from.

A **review** is a pass/fail judgment on a trace. When you review a trace, Reflect:

1. Reads the trajectory, the outcome, and your feedback
2. Generates a concise **reflection** (an LLM-produced summary of what worked or went wrong)
3. Embeds the reflection and stores it as a new **memory** with an initial utility of 0.5
4. Updates the utility scores of the memories that were retrieved during that run (up for pass, down for fail)

Without reviews, Reflect is just a trace logger. Reviews are what close the learning loop - they're the training signal that makes memory retrieval improve over time.
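
The four review steps can be pictured with a toy in-memory model. This is a sketch, not Reflect's implementation: the fixed `STEP` size, the clamping to `[0, 1]`, and the `Memory` and `apply_review` names are assumptions made for illustration. Only the initial utility of 0.5 and the direction of the update (up for pass, down for fail) come from the description above, and the reflection text is passed in directly rather than generated by an LLM.

```python
from dataclasses import dataclass

STEP = 0.1  # assumed step size; the real update rule is not specified here


@dataclass
class Memory:
    text: str
    utility: float = 0.5  # new memories start at 0.5


def apply_review(store, retrieved_ids, reflection, passed):
    """Store a new reflection memory and nudge the memories used in the run."""
    delta = STEP if passed else -STEP
    for memory_id in retrieved_ids:
        memory = store[memory_id]
        # Clamp to [0, 1] so repeated reviews cannot push utility out of range.
        memory.utility = min(1.0, max(0.0, memory.utility + delta))
    new_id = f"mem-{len(store) + 1}"
    store[new_id] = Memory(text=reflection)
    return new_id


store = {"mem-1": Memory("Check the order exists before processing a return")}
new_id = apply_review(store, ["mem-1"], "Validate CSV headers first", passed=False)
```

After the failed review, the retrieved memory's utility has dropped and the store holds a second memory at the default utility.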

<Note>
  Reviewing a trace is one of two ways to create a memory. If your agent already knows the lesson, it can author the reflection itself and skip the trajectory entirely - see [agent-authored reflections](/guides/memories#how-memories-are-created) and the MCP [`create_memory`](/guides/mcp#create-memory) tool.
</Note>

### Why traces capture the full trajectory

Reflect stores the entire conversation, not just the final answer, because the reflection LLM needs context to generate useful advice. A reflection like "always verify the order exists before processing a return" can only be generated if the trajectory shows that the agent *didn't* verify the order. The final answer alone wouldn't reveal that.

The trajectory also enables the dashboard to show step-by-step replays, which is useful for debugging and manual review.

### Why reviews are separate from traces

Reviews can be submitted inline (at trace creation time) or deferred (later, via the API or dashboard). This separation exists because:

* **Automated pipelines** know the answer immediately (e.g., comparing against a gold answer) and can submit inline reviews
* **Human review workflows** need to collect the trace first and review asynchronously
* **Batch evaluation** collects many traces and reviews them all at once

Both paths produce the same result: a reflection is generated, a memory is created, and utility scores are updated.

## Three ways to record traces

The SDK provides three patterns for recording traces. They all produce the same result - a trace stored in Reflect - but differ in how much boilerplate they handle for you.

### Pattern 1: Context manager

The context manager retrieves memories on entry and auto-submits the trace on exit. It tracks `retrieved_memory_ids` for you, so the utility learning loop works automatically.

**Best for:** multi-step workflows, streaming, cases where you need to inspect output before deciding the review result.

```python
with client.trace("Parse the CSV and return the top 5 rows") as ctx:
    # ctx.augmented_task - the task with relevant memories appended
    # ctx.memories - the retrieved Memory objects
    response = my_agent(ctx.augmented_task)

    ctx.set_output(
        trajectory=[
            {"role": "user", "content": ctx.augmented_task},
            {"role": "assistant", "content": response},
        ],
        result="pass",
        model="gpt-5.4-mini",
    )
# Trace auto-submitted on exit with correct retrieved_memory_ids
# ctx.trace_id is now available for deferred review or logging
```

#### Context manager parameters

| Parameter                | Type    | Default  | Description                                                                                                                                                     |
| ------------------------ | ------- | -------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `task`                   | `str`   | required | Task description for memory retrieval and trace logging                                                                                                         |
| `limit`                  | `int`   | `10`     | Maximum memories to retrieve                                                                                                                                    |
| `lambda_`                | `float` | `0.5`    | Blend between similarity and utility                                                                                                                            |
| `mmr_lambda`             | `float` | `0.7`    | MMR diversity weight applied after the utility blend. `1.0` disables diversity. See the [memories guide](./memories#diversity-aware-retrieval-with-mmr_lambda). |
| `blocking`               | `bool`  | `False`  | Wait for memory creation before exiting the context                                                                                                             |
| `auto_fail_on_exception` | `bool`  | `True`   | Auto-submit with `result="fail"` on unhandled exceptions                                                                                                        |

#### `ctx.trace_id`

After the `with` block exits, `ctx.trace_id` holds the ID of the submitted trace. Pass it to `client.review_trace()` later in your application to submit a deferred review. Inside the `with` block, before submission, `trace_id` is `None`.

#### `set_output` parameters

| Parameter        | Type                | Default  | Description                                                 |
| ---------------- | ------------------- | -------- | ----------------------------------------------------------- |
| `trajectory`     | `list[dict] \| str` | required | The conversation messages                                   |
| `final_response` | `str \| None`       | `None`   | Agent's final answer (extracted from trajectory if omitted) |
| `result`         | `str \| None`       | `None`   | `"pass"` or `"fail"` - omit to defer the review             |
| `feedback_text`  | `str \| None`       | `None`   | What went wrong (used when `result="fail"`)                 |
| `model`          | `str \| None`       | `None`   | Model name for dashboard display                            |
| `metadata`       | `dict \| None`      | `None`   | Arbitrary JSON metadata                                     |

#### Blocking mode

Pass `blocking=True` to wait for the reflection and memory to be created before the `with` block exits. Useful in evaluation loops where the next task needs to retrieve the memory from the previous one.

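```python
with client.trace("...", blocking=True) as ctx:
    response = my_agent(ctx.augmented_task)
    # Build the trajectory that set_output expects
    messages = [{"role": "user", "content": ctx.augmented_task},
                {"role": "assistant", "content": response}]
    ctx.set_output(trajectory=messages, result="pass")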
# Memory is guaranteed to exist here - the next task can retrieve it
```
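
An evaluation loop built on blocking mode might look like the sketch below. It uses only the `client.trace`, `ctx.set_output`, `ctx.augmented_task`, and `ctx.trace_id` names documented on this page; the `run_eval` helper, the `agent` callable, and the substring-match grading rule are hypothetical and stand in for your own code.

```python
def run_eval(client, cases, agent):
    """Run (task, expected) pairs in order, reviewing each trace inline.

    `client` is a Reflect client; `agent` is any callable that maps the
    augmented task string to a response string (hypothetical helper).
    """
    outcomes = []
    for task, expected in cases:
        # blocking=True: the memory from this case exists before the next one starts.
        with client.trace(task, blocking=True) as ctx:
            response = agent(ctx.augmented_task)
            passed = expected in response  # assumed grading rule: substring match
            ctx.set_output(
                trajectory=[
                    {"role": "user", "content": ctx.augmented_task},
                    {"role": "assistant", "content": response},
                ],
                result="pass" if passed else "fail",
                feedback_text=None if passed else f"Expected the answer to contain {expected!r}",
            )
        # trace_id is only populated once the with block has exited.
        outcomes.append((ctx.trace_id, passed))
    return outcomes
```

Because each iteration blocks until its memory exists, later cases can retrieve lessons learned from earlier ones, at the cost of a slower loop.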

#### Exception handling

If an unhandled exception occurs after `set_output` was called, the trace is auto-submitted with `result="fail"` and the exception message as feedback. This prevents losing trace data on crashes. Disable with `auto_fail_on_exception=False`.
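
To make the behavior concrete, here is a toy context manager that mimics the rule described above. It is not the SDK's implementation: `ToyTrace` and its `submitted` list are hypothetical, and letting the exception propagate after submission is an assumption.

```python
class ToyTrace:
    """Toy stand-in for the trace context; records submissions in a list."""

    def __init__(self, submitted, auto_fail_on_exception=True):
        self.submitted = submitted
        self.auto_fail_on_exception = auto_fail_on_exception
        self.output = None

    def __enter__(self):
        return self

    def set_output(self, trajectory, result=None, feedback_text=None):
        self.output = {
            "trajectory": trajectory,
            "result": result,
            "feedback_text": feedback_text,
        }

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            if self.output is not None:
                self.submitted.append(self.output)
        elif self.auto_fail_on_exception and self.output is not None:
            # Override the recorded result: the run did not finish cleanly.
            self.submitted.append(
                {**self.output, "result": "fail", "feedback_text": str(exc)}
            )
        return False  # assumption: the exception still propagates to the caller


submitted = []
try:
    with ToyTrace(submitted) as ctx:
        ctx.set_output(trajectory="partial transcript", result="pass")
        raise RuntimeError("tool call timed out")
except RuntimeError:
    pass
```

In this sketch the trace recorded as `"pass"` is submitted as `"fail"` with the exception message as feedback, so the partial trajectory is not lost.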

#### Async variant

```python
async with client.trace_async("Debug login timeout") as ctx:
    response = await my_llm(ctx.augmented_task)
    messages = [{"role": "user", "content": ctx.augmented_task},
                {"role": "assistant", "content": response}]
    ctx.set_output(trajectory=messages, result="pass")
```

### Pattern 2: `@reflect_trace` decorator

The decorator wraps a function so that memory retrieval, trace submission, and `retrieved_memory_ids` tracking happen automatically. You just write the agent logic.

**Best for:** single-function agents, clean integration with existing function signatures, when you want the least boilerplate.

```python
from reflect_sdk import TraceContext, TraceResult, reflect_trace

@reflect_trace(client, task=lambda question: question)
def answer(ctx: TraceContext, question: str) -> TraceResult:
    messages = [{"role": "user", "content": ctx.augmented_task}]
    response = my_llm(messages)
    messages.append({"role": "assistant", "content": response})
    return TraceResult(
        output=response,       # returned to the caller
        trajectory=messages,
        result="pass",
        model="gpt-5.4-mini",
    )

# Calling the function queries memories, runs your code, and submits the trace
result = answer("Parse the CSV and return the top 5 rows")
# result == response (the output from TraceResult)
```

#### Decorator parameters

| Parameter                | Type                      | Default  | Description                                                                                                                                                     |
| ------------------------ | ------------------------- | -------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `client`                 | `ReflectClient`           | required | The client instance                                                                                                                                             |
| `task`                   | `str \| Callable \| None` | `None`   | How to derive the task string - static string, callable on the function's args, or `None` to use the first positional arg                                       |
| `limit`                  | `int`                     | `10`     | Maximum memories to retrieve                                                                                                                                    |
| `lambda_`                | `float`                   | `0.5`    | Blend between similarity and utility                                                                                                                            |
| `mmr_lambda`             | `float`                   | `0.7`    | MMR diversity weight applied after the utility blend. `1.0` disables diversity. See the [memories guide](./memories#diversity-aware-retrieval-with-mmr_lambda). |
| `blocking`               | `bool`                    | `False`  | Wait for memory creation before returning                                                                                                                       |
| `auto_fail_on_exception` | `bool`                    | `True`   | Auto-submit `"fail"` on unhandled exceptions                                                                                                                    |
| `inject_context`         | `bool`                    | `True`   | Prepend a `TraceContext` as the first argument                                                                                                                  |
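
The three `task` modes can be illustrated with a small resolver. This is only a sketch of the behavior the table describes: `derive_task` is a hypothetical helper, not part of the SDK, and it assumes the callable receives the same arguments as the decorated function (excluding the injected context).

```python
def derive_task(task, args, kwargs):
    """Resolve the task string from the decorator's `task` option."""
    if task is None:
        return str(args[0])            # fall back to the first positional argument
    if callable(task):
        return task(*args, **kwargs)   # e.g. task=lambda question: question
    return task                        # static string, used as-is


static = derive_task("Answer support tickets", ("Where is my order?",), {})
derived = derive_task(lambda question: question.strip(), ("  Where is my order?  ",), {})
fallback = derive_task(None, ("Where is my order?",), {})
```

A static string suits agents that always perform the same job, while a callable lets retrieval key off the actual input.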

#### Two return types

<CodeGroup>
  ```python TraceResult (full control)
  @reflect_trace(client, task=lambda question: question)
  def answer(ctx: TraceContext, question: str) -> TraceResult:
      messages = [{"role": "user", "content": ctx.augmented_task}]
      response = my_llm(messages)
      messages.append({"role": "assistant", "content": response})
      return TraceResult(
          output=response,
          trajectory=messages,
          result="pass",
          model="gpt-5.4-mini",
      )
  ```

  ```python String (minimal)
  @reflect_trace(client, task=lambda question: question)
  def answer(ctx: TraceContext, question: str) -> str:
      response = my_llm(ctx.augmented_task)
      return response  # used as both trajectory and final_response
  ```
</CodeGroup>

Return a `TraceResult` when you need to provide the full trajectory, review result, feedback, model, or metadata. Return a plain string for quick prototyping - the string is used as both the trajectory and the final response.

#### Async support

Async functions are detected automatically:

```python
@reflect_trace(client, task=lambda question: question)
async def answer(ctx: TraceContext, question: str) -> TraceResult:
    ...
```
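
One common way for a decorator to support both kinds of function is to check `inspect.iscoroutinefunction` and return a matching wrapper. The sketch below shows that general technique with a hypothetical `traced` decorator; whether `@reflect_trace` does it this way internally is an assumption.

```python
import asyncio
import functools
import inspect


def traced(func):
    """Toy decorator: picks a sync or async wrapper based on the wrapped function."""
    if inspect.iscoroutinefunction(func):
        @functools.wraps(func)
        async def async_wrapper(*args, **kwargs):
            # Memory retrieval and trace submission would be awaited around this call.
            return await func(*args, **kwargs)
        return async_wrapper

    @functools.wraps(func)
    def sync_wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return sync_wrapper


@traced
def answer_sync(question):
    return f"sync: {question}"


@traced
async def answer_async(question):
    return f"async: {question}"


sync_result = answer_sync("ping")
async_result = asyncio.run(answer_async("ping"))
```

The caller's side stays unchanged: sync functions are called directly and async functions are awaited as usual.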

### Pattern 3: Explicit API calls

Call `augment_with_memories` and `create_trace` directly. This gives you full control over every step but requires you to pass `retrieved_memory_ids` manually.

**Best for:** existing codebases where you can't wrap the agent function, batch pipelines, cases where traces are created far from where memories are retrieved.

```python
augmented = client.augment_with_memories(
    task="Parse the CSV and return the top 5 rows",
)
response = my_agent(augmented.augmented_task)

submission = client.create_trace(
    task="Parse the CSV and return the top 5 rows",
    trajectory=[
        {"role": "user", "content": augmented.augmented_task},
        {"role": "assistant", "content": response},
    ],
    retrieved_memory_ids=[m.id for m in augmented.memories],
    model="gpt-5.4-mini",
    review_result="pass",
)
```

<Warning>
  When using `create_trace` directly, you **must** pass `retrieved_memory_ids` manually. If you forget, the utility learning loop breaks silently - memories won't be reinforced or penalized based on outcomes. The context manager and decorator handle this automatically.
</Warning>
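
One way to reduce that risk is to keep retrieval and submission in a single helper, so the IDs always come from the same retrieval result. The sketch below uses only `augment_with_memories` and `create_trace` as shown above; `run_and_record` and the `agent` callable are hypothetical names, and the trace is submitted without `review_result`, which assumes the review will be deferred.

```python
def run_and_record(client, task, agent):
    """Retrieve memories, run the agent, and submit the trace with memory IDs attached."""
    augmented = client.augment_with_memories(task=task)
    response = agent(augmented.augmented_task)
    submission = client.create_trace(
        task=task,
        trajectory=[
            {"role": "user", "content": augmented.augmented_task},
            {"role": "assistant", "content": response},
        ],
        # Taken straight from the retrieval result, so the two cannot drift apart.
        retrieved_memory_ids=[m.id for m in augmented.memories],
    )
    return response, submission
```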

#### `create_trace` vs `create_trace_and_wait`

| Method                  | Behavior                                                                            | Use when                                                                       |
| ----------------------- | ----------------------------------------------------------------------------------- | ------------------------------------------------------------------------------ |
| `create_trace`          | Returns immediately with a `TraceSubmission`. Processing happens in the background. | Your app serves real-time traffic and you don't want to block                  |
| `create_trace_and_wait` | Blocks until the trace is fully processed and the memory is created.                | Evaluation loops, tests, scripts where the next step needs the memory to exist |

```python
# Non-blocking - returns immediately
submission = client.create_trace(task="...", trajectory=[...], review_result="pass")
# submission.ingest_status == "queued"

# Blocking - waits for the memory to be created
trace = client.create_trace_and_wait(
    task="...",
    trajectory=[...],
    review_result="pass",
    poll_interval=0.25,    # seconds between polls
    wait_timeout=60.0,     # max seconds to wait
)
# trace.review_status == "reviewed"
# trace.created_memory_id is set
```
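
If you submit with `create_trace` and only later need to know that processing has finished, a polling loop over `get_trace` is one option. The sketch below is a hand-rolled illustration, not the SDK's `create_trace_and_wait`: `wait_for_ingest` is a hypothetical helper, it relies on the `ingest_status` values listed in the `Trace` fields table, and it assumes you already have the trace ID.

```python
import time


def wait_for_ingest(client, trace_id, poll_interval=0.25, wait_timeout=60.0):
    """Poll a trace until ingestion completes, fails, or the timeout elapses."""
    deadline = time.monotonic() + wait_timeout
    while True:
        trace = client.get_trace(trace_id=trace_id)
        if trace.ingest_status == "completed":
            return trace
        if trace.ingest_status == "failed":
            raise RuntimeError(f"Ingestion failed for trace {trace_id}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Trace {trace_id} not processed within {wait_timeout}s")
        time.sleep(poll_interval)
```

Using `time.monotonic` for the deadline keeps the timeout correct even if the system clock is adjusted while waiting.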

## Reviews

### Inline reviews

Include the review when creating the trace. This is the simplest path - one call does everything.

```python
# Context manager
with client.trace("...") as ctx:
    response = my_agent(ctx.augmented_task)
    messages = [{"role": "user", "content": ctx.augmented_task},
                {"role": "assistant", "content": response}]
    ctx.set_output(trajectory=messages, result="pass")

# Decorator
@reflect_trace(client, task=lambda q: q)
def answer(ctx, question):
    ...
    return TraceResult(output=response, trajectory=messages, result="pass")

# Explicit
client.create_trace(task="...", trajectory=[...], review_result="pass")
```

The SDK accepts `"success"` / `"failure"` as aliases for `"pass"` / `"fail"`.

### Deferred reviews

Create the trace without a review, then submit one later. This is useful when:

* A human needs to evaluate the answer
* You're running a batch and want to review traces in one go afterwards
* The review depends on external feedback that isn't available yet

```python
# Step 1: Create trace without a review
with client.trace("Summarize the quarterly report") as ctx:
    response = my_agent(ctx.augmented_task)
    messages = [{"role": "user", "content": ctx.augmented_task},
                {"role": "assistant", "content": response}]
    ctx.set_output(trajectory=messages)  # no result - review deferred

# ctx.trace_id is available after the with block exits
print(f"Trace submitted: {ctx.trace_id}")

# Step 2: Later, after human evaluation
trace = client.review_trace(
    trace_id=ctx.trace_id,
    result="fail",
    feedback_text="Summary missed the revenue decline in Q3",
)
# trace.review_status == "reviewed"
# trace.created_memory_id is now set
```

Deferred reviews are processed **synchronously** - the returned `Trace` includes the review and the created memory ID. You can also review traces from the Reflect Console.

### What makes good feedback

When a trace fails, `feedback_text` is included in the reflection prompt. Specific feedback produces better reflections:

| Feedback                                                                             | Quality                                                                                            |
| ------------------------------------------------------------------------------------ | -------------------------------------------------------------------------------------------------- |
| `"Wrong"`                                                                            | Too vague - the reflection won't capture the specific mistake                                      |
| `"The answer was incorrect"`                                                         | Slightly better but still generic                                                                  |
| `"Missed the WHERE clause in the SQL query, returning all rows instead of filtered"` | Specific - the reflection will mention the WHERE clause, making it useful for future similar tasks |

For passing traces, feedback is optional. The trajectory itself provides enough context for the reflection.
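
In an automated pipeline that compares against a gold answer, the feedback can be built from the mismatch itself rather than hard-coded. The helper below is a minimal sketch: `grade` is a hypothetical function, and exact string equality after trimming whitespace is an assumed grading rule that you would replace with your own.

```python
def grade(expected, actual):
    """Compare an answer to a gold answer; return (result, feedback_text)."""
    if actual.strip() == expected.strip():
        return "pass", None  # feedback is optional for passing traces
    feedback = f"Expected {expected.strip()!r} but the agent answered {actual.strip()!r}"
    return "fail", feedback


result, feedback_text = grade("42 rows", "all 1000 rows")
```

The returned pair maps directly onto the `result` and `feedback_text` parameters used for inline and deferred reviews.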

## Listing and fetching traces

```python
# List traces by review status
pending = client.list_traces(review_status="pending")
reviewed = client.list_traces(review_status="reviewed")
all_traces = client.list_traces()

# Fetch a specific trace
trace = client.get_trace(trace_id="abc-123")
```

Each `Trace` object includes:

| Field               | Description                                              |
| ------------------- | -------------------------------------------------------- |
| `id`                | Unique trace identifier                                  |
| `task`              | The task that was executed                               |
| `trajectory`        | List of message dicts                                    |
| `review_status`     | `"pending"` or `"reviewed"`                              |
| `ingest_status`     | `"queued"`, `"processing"`, `"completed"`, or `"failed"` |
| `created_memory_id` | ID of the reflection memory (set after review)           |
| `review`            | Attached `Review` object (if reviewed)                   |
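
Listing and reviewing combine naturally into a batch review pass. The sketch below uses `list_traces` and `review_trace` as shown on this page; `review_pending` and the `judge` callable are hypothetical, and it assumes `list_traces` returns an iterable of `Trace` objects in a single call and that `feedback_text=None` is accepted.

```python
def review_pending(client, judge):
    """Review every pending trace; `judge` maps a trace to (result, feedback_text)."""
    reviewed_ids = []
    for trace in client.list_traces(review_status="pending"):
        result, feedback_text = judge(trace)
        client.review_trace(
            trace_id=trace.id,
            result=result,
            feedback_text=feedback_text,
        )
        reviewed_ids.append(trace.id)
    return reviewed_ids
```

Here `judge` could be a human-in-the-loop prompt, a gold-answer comparison, or any other function that returns `"pass"` or `"fail"` plus optional feedback.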

## Choosing a pattern

<Card horizontal title="Context manager" icon="brackets-curly">
  **`client.trace()`** - Auto-tracks `retrieved_memory_ids`. Flexible - inspect output before deciding the review. Supports blocking mode for eval loops. Best for multi-step agents, streaming, conditional review logic.
</Card>

<Card horizontal title="Decorator" icon="at">
  **`@reflect_trace`** - Least boilerplate. Wraps a single function. Supports sync and async. Return `TraceResult` for full control or a string for quick prototyping. Best for single-function agents and clean codebases.
</Card>

<Card horizontal title="Explicit calls" icon="terminal">
  **`create_trace` / `create_trace_and_wait`** - Full control over every step. Must pass `retrieved_memory_ids` manually. Best for existing codebases, batch pipelines, and cases where traces are created separately from memory retrieval.
</Card>
