# Memories

> What memories are, how they're created, how to retrieve them, and best practices for getting the most out of memory-augmented agents.

## Overview

A **memory** is a concise reflection distilled from a past agent run. It captures what the agent did, what worked, what went wrong, and what to do differently next time - then stores that knowledge so future runs can benefit from it. A reflection is either generated by Reflect from a reviewed trace, or authored by the agent itself (see [How memories are created](#how-memories-are-created)).

Memories exist because LLMs are stateless. Each call starts from scratch with no awareness of what happened last time. Reflect solves this by maintaining a project-level memory bank that accumulates experience across runs, users, and sessions. Before each task, your agent queries this bank and receives the most relevant past reflections ranked by both **semantic similarity** (is this about the same kind of problem?) and **utility** (did this advice actually lead to good outcomes?).

This dual ranking is the core design choice behind Reflect's memory system. Pure semantic search returns relevant results, but it can't distinguish between a reflection that led to a correct answer and one that didn't. Utility scores add a learned quality signal that improves as more traces are reviewed.

### How memories are created

Every memory is tied to an outcome (`"pass"` or `"fail"`) so its quality can be tracked over time. There are two ways to create one:

**From a reviewed trace — Reflect writes the reflection.** Your agent completes a task and you submit the trace with a review. An LLM reads the trace (task, trajectory, outcome, feedback) and generates the reflection for you. Use this when you have the full trajectory and want Reflect to distill the lesson.

**From an agent-authored reflection — you write the reflection.** The agent that ran the task writes the lesson itself - a `summary` and `guidance` - and submits it with a pass/fail result. Reflect stores it directly, with no trajectory and no background model. This is what the MCP [`create_memory`](/guides/mcp#create-memory) tool and the SDK's `create_memory` method do; it's ideal when the agent already knows the lesson and you don't want to ship a full trajectory.

In both cases the task is embedded and the reflection is stored with an initial `q_value` of `0.5`. On future runs, the memory is retrieved when a new task is similar enough to clear the similarity threshold and its combined similarity and utility score ranks it among the top results.

### How utility scores evolve

Utility scores are updated every time a memory is **retrieved and the run that used it is reviewed**.

A memory starts at `q_value = 0.5`. If it's retrieved in a run that passes, its utility nudges upward. If retrieved in a run that fails, it nudges downward. Over many reviews, useful memories converge toward 1.0 and unhelpful ones toward 0.0.

<Note>
  Utility scores only update for memories that were **retrieved and used** in a run. If a memory exists but wasn't retrieved for a particular trace, its utility is unaffected by that trace's review.
</Note>

## Retrieving memories

### `query_memories` - raw retrieval

Returns a ranked list of `Memory` objects without modifying the task text. Use this when you want full control over how memories are injected into your prompt.

```python theme={null}
memories = client.query_memories(
    task="How do I handle rate limits in an API client?",
    limit=10,
    lambda_=0.5,
)

for m in memories:
    print(f"[q={m.q_value:.2f}] {m.task}")
    print(f"  {m.reflection[:100]}...")
```

### `augment_with_memories` - retrieval + formatting

Queries memories and appends them to the task as a structured text block. This is the most common method - it returns a ready-to-use prompt that you pass directly to the LLM.

```python theme={null}
augmented = client.augment_with_memories(
    task="Implement exponential backoff for retries",
    limit=5,
    lambda_=0.6,
)

# Pass this to your LLM - it includes the task + relevant memories
prompt_for_llm = augmented.augmented_task

# The Memory objects are also available if you need their IDs later
retrieved = augmented.memories
```

The formatted output groups memories into up to three sections by review outcome (successful, failed, and other). This example shows the first two:

```text theme={null}
Implement exponential backoff for retries

Relevant memories:

Successful memories:

--- Memory 1 ---
Past task:
Handle transient API failures

Reflection:
Use a base delay with exponential increase and random jitter...

Failed memories:

--- Memory 2 ---
Past task:
Retry failed HTTP requests

Reflection:
Fixed delays without jitter caused thundering herd issues...
```

If no memories are found, `augmented_task` is simply the original task, so you can always pass it to the LLM without checking.
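If you retrieve with `query_memories` and inject memories yourself, a small helper can approximate the layout shown above. This is a hypothetical sketch, not the SDK's formatter: `MemoryLike` and `format_memories` are illustrative names, the `"Other memories:"` heading is an assumption, and empty sections are simply skipped.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MemoryLike:
    """Hypothetical stand-in exposing the Memory fields this helper reads."""
    task: str
    reflection: str
    success: Optional[bool]  # None if the source trace is unreviewed


def format_memories(task: str, memories: list[MemoryLike]) -> str:
    """Append memories to the task, grouped by review outcome (approximation)."""
    if not memories:
        return task  # mirror augmented_task: unchanged when nothing is found
    groups = [
        ("Successful memories:", [m for m in memories if m.success is True]),
        ("Failed memories:", [m for m in memories if m.success is False]),
        ("Other memories:", [m for m in memories if m.success is None]),  # heading text assumed
    ]
    lines = [task, "", "Relevant memories:"]
    n = 0  # memory numbering continues across sections
    for heading, group in groups:
        if not group:
            continue  # skip empty sections
        lines += ["", heading]
        for m in group:
            n += 1
            lines += ["", f"--- Memory {n} ---", "Past task:", m.task, "", "Reflection:", m.reflection]
    return "\n".join(lines)


mems = [
    MemoryLike("Handle transient API failures",
               "Use a base delay with exponential increase and random jitter...", True),
    MemoryLike("Retry failed HTTP requests",
               "Fixed delays without jitter caused thundering herd issues...", False),
]
out = format_memories("Implement exponential backoff for retries", mems)
print(out)
```

Because the helper only reads `task`, `reflection`, and `success`, it should also accept the `Memory` objects returned by `query_memories`, which document those fields.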

### Parameters

| Parameter              | Type            | Default  | Description                                                                                                                                                                                                                         |
| ---------------------- | --------------- | -------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `task`                 | `str`           | required | The task text to search against. The API embeds this and finds semantically similar memories.                                                                                                                                       |
| `limit`                | `int`           | `10`     | Maximum number of memories to return.                                                                                                                                                                                               |
| `lambda_`              | `float`         | `0.5`    | Blend weight between similarity and utility (see below).                                                                                                                                                                            |
| `mmr_lambda`           | `float`         | `0.7`    | Maximal Marginal Relevance weight for diversity-aware selection. `1.0` disables MMR (results are ordered by blended score alone); `0.0` is pure diversity. See [Diversity-aware retrieval with `mmr_lambda`](#diversity-aware-retrieval-with-mmr_lambda). |
| `metadata_filter`      | `dict \| None`  | `None`   | Optional metadata key/value pairs that memories must match. See [Filtering by metadata](#filtering-by-metadata).                                                                                                                    |
| `similarity_threshold` | `float \| None` | `None`   | Minimum cosine similarity a candidate must reach. Overrides the server default. See [Tuning the similarity threshold](#tuning-the-similarity-threshold).                                                                            |

### The `Memory` object

| Field              | Type           | Description                                                                   |
| ------------------ | -------------- | ----------------------------------------------------------------------------- |
| `id`               | `str`          | Unique identifier - pass these as `retrieved_memory_ids` when creating traces |
| `task`             | `str`          | The past task this memory was generated from                                  |
| `reflection`       | `str`          | Reflection text, generated by Reflect from a trace or authored by the agent   |
| `q_value`          | `float`        | Learned quality score (0-1, higher = better track record)                     |
| `similarity`       | `float`        | Cosine similarity to the query task                                           |
| `score`            | `float`        | Blended score: `(1 - lambda_) * similarity + lambda_ * q_value` (MMR may reorder results, see below) |
| `success`          | `bool \| None` | Whether the source trace passed review (`None` if unreviewed)                 |
| `summary`          | `str`          | One-sentence description of the past task                                     |
| `key_mistake`      | `str`          | Specific wrong action or omission (empty for successful memories)             |
| `correct_action`   | `str`          | Specific right action — tool name + argument patterns                         |
| `applicable_tools` | `list[str]`    | Tools this lesson is about (LLM-chosen)                                       |
| `guidance`         | `str`          | One-paragraph general strategy                                                |
| `tools_used`       | `list[str]`    | Tools that actually appeared in the trajectory (server-extracted)             |

<Note>
  The fields `summary`, `key_mistake`, `correct_action`, `applicable_tools`, `guidance`, and `tools_used` are populated for memories reviewed after the structured-reflection upgrade. Older memories will have these as empty strings / empty lists but retain their original `reflection` text.
</Note>

## Tuning retrieval with `lambda_`

The `lambda_` parameter controls the balance between semantic relevance and learned quality when ranking memories:

```
score = (1 - lambda_) * similarity + lambda_ * q_value
```
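To see how `lambda_` can flip an ordering, here is a small local evaluation of the formula. `Candidate`, `blended_score`, and `rank` are hypothetical helpers for illustration, not part of the Reflect SDK, and the similarity and utility numbers are invented.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Stand-in for a retrieved memory; only the two fields the formula uses."""
    name: str
    similarity: float  # cosine similarity to the query task
    q_value: float     # learned utility score in [0, 1]


def blended_score(c: Candidate, lambda_: float) -> float:
    """score = (1 - lambda_) * similarity + lambda_ * q_value"""
    return (1 - lambda_) * c.similarity + lambda_ * c.q_value


def rank(candidates: list[Candidate], lambda_: float) -> list[str]:
    """Return candidate names ordered by blended score, highest first."""
    ordered = sorted(candidates, key=lambda c: blended_score(c, lambda_), reverse=True)
    return [c.name for c in ordered]


# A close topical match with a poor track record vs. a looser match that usually helps.
close_but_unhelpful = Candidate("close_but_unhelpful", similarity=0.90, q_value=0.20)
looser_but_reliable = Candidate("looser_but_reliable", similarity=0.70, q_value=0.85)
pool = [close_but_unhelpful, looser_but_reliable]

print(rank(pool, lambda_=0.0))  # ['close_but_unhelpful', 'looser_but_reliable']
print(rank(pool, lambda_=0.5))  # ['looser_but_reliable', 'close_but_unhelpful']
```

At `lambda_=0.0` the closer match wins on similarity (0.90 vs 0.70); at the default `0.5` the reliable memory scores 0.775 against 0.55 and moves to the top.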

| Value     | Behavior                 | When to use                                                                                        |
| --------- | ------------------------ | -------------------------------------------------------------------------------------------------- |
| `0.0`     | Pure semantic similarity | Early in a project when you have few reviewed traces and utility scores haven't differentiated yet |
| `0.5`     | Equal weight (default)   | General-purpose starting point - works well for most projects                                      |
| `0.7–0.9` | Favor utility            | Mature projects with many reviewed traces - surface memories with the best track records           |
| `1.0`     | Pure utility             | Rank only by track record; semantic match matters only through the similarity threshold           |

### When to adjust

* **Increase `lambda_`** if your agent keeps retrieving relevant-sounding memories that lead to bad outcomes. The memories are topically similar but not actually helpful - utility scores will down-rank them.
* **Decrease `lambda_`** if your agent needs broader context from different past tasks. Strict utility ranking can narrow retrieval too much, especially when the most successful memories are about a different subtopic.
* **Keep `0.5`** if you're unsure. The default works well until you have enough reviewed traces to notice a pattern.

## Diversity-aware retrieval with `mmr_lambda`

`lambda_` decides which memories are worth showing. `mmr_lambda` decides whether the top-k you return are *redundant with each other*.

Without diversity re-ranking, top-k can return five paraphrases of the same lesson — useful exactly once, then wasted prompt tokens. Reflect re-ranks the top candidates using **Maximal Marginal Relevance (MMR)**, which iteratively picks memories that maximize:

```
mmr_lambda * blended_score - (1 - mmr_lambda) * max_similarity_to_already_selected
```

Each pick penalizes new candidates that look too much like memories already chosen, so the final set covers more angles for the same prompt budget.
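The greedy selection loop can be sketched locally as follows. This illustrates textbook MMR using the formula above, not Reflect's server code: `cosine`, `mmr_value`, and `mmr_select` are hypothetical names, candidates are `(memory_id, blended_score, embedding)` tuples, and the two-dimensional embeddings are toy values.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_value(cand, selected, mmr_lambda: float) -> float:
    """mmr_lambda * blended_score - (1 - mmr_lambda) * max_similarity_to_already_selected"""
    _, score, emb = cand
    redundancy = max((cosine(emb, s[2]) for s in selected), default=0.0)
    return mmr_lambda * score - (1 - mmr_lambda) * redundancy


def mmr_select(candidates, k: int, mmr_lambda: float) -> list[str]:
    """Greedily pick up to k candidates, one per round, by MMR value."""
    remaining = list(candidates)
    selected = []
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda c: mmr_value(c, selected, mmr_lambda))
        selected.append(best)
        remaining.remove(best)
    return [c[0] for c in selected]


candidates = [
    ("jitter_a", 0.80, [1.0, 0.0]),
    ("jitter_b", 0.78, [0.99, 0.14]),  # near-paraphrase of jitter_a
    ("circuit_breaker", 0.60, [0.0, 1.0]),
]
print(mmr_select(candidates, k=2, mmr_lambda=1.0))  # ['jitter_a', 'jitter_b']
print(mmr_select(candidates, k=2, mmr_lambda=0.7))  # ['jitter_a', 'circuit_breaker']
```

With `mmr_lambda=1.0` the redundancy term vanishes and both jitter paraphrases are returned. At `0.7`, the second jitter memory is penalized for its near-identical embedding, and the lower-scoring but distinct circuit-breaker memory takes its slot.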

| Value | Behavior                                                     | When to use                                                            |
| ----- | ------------------------------------------------------------ | ---------------------------------------------------------------------- |
| `1.0` | MMR disabled; results ordered by blended score alone         | Reproduce pre-MMR behavior, or when k is very small (1–2)              |
| `0.7` | Default — relevance-leaning with mild redundancy suppression | General use                                                            |
| `0.5` | Balanced relevance and diversity                             | Banks with many near-duplicate reflections                             |
| `0.0` | Pure diversity                                               | Almost never useful in production — diversity without quality is noise |

MMR runs **after** the `lambda_` blend and the similarity threshold, so it only reorders candidates that have already cleared the threshold. If your bank is small or your queries already return diverse results, MMR changes little. The cost is negligible: pairwise cosines on at most `limit × 5` candidates per query.

## Filtering by metadata

Any key/value pairs you pass on `metadata` when creating a trace are stored on the resulting memory and become filterable at retrieval time.

```python theme={null}
# Tag at write time — metadata flows onto the memory created from this trace.
client.create_trace(
    task="Cancel reservation via travel insurance",
    trajectory=messages,
    review_result="pass",
    metadata={"action_type": "cancel", "domain": "airline"},
)

# Filter at read time — only memories whose payload matches ALL keys are returned.
memories = client.query_memories(
    task="Customer wants to cancel a flight",
    metadata_filter={"action_type": "cancel", "domain": "airline"},
)
```

The filter is ANDed with the internal `project_id` / `user_id` / `status` filter, so callers cannot reach across projects. Metadata values can be any JSON-serializable value. Filter values are scalar; when the stored field is a **list**, Qdrant matches if the scalar value is a member of the list, which is useful for tagging a single memory with multiple categories:

```python theme={null}
client.create_trace(
    ...,
    metadata={"action_types": ["cancel", "modify"]},  # list at write time
)

# Scalar filter value finds memories whose list contains it.
client.query_memories(..., metadata_filter={"action_types": "modify"})
```
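The matching rules above can be approximated locally to check how a filter will behave before you query. `matches` is a hypothetical helper that mimics the documented semantics (keys ANDed, scalar equality, list membership); the real matching happens server-side in Qdrant, and the internal project, user, and status filters are not modeled here.

```python
def matches(payload: dict, metadata_filter: dict) -> bool:
    """True if every filter key matches the stored payload (keys are ANDed).

    A scalar filter value matches a scalar field by equality, or a list
    field by membership.
    """
    for key, wanted in metadata_filter.items():
        if key not in payload:
            return False  # e.g. memories created before you started tagging
        stored = payload[key]
        if isinstance(stored, list):
            if wanted not in stored:
                return False
        elif stored != wanted:
            return False
    return True


memories = [
    {"id": "m1", "metadata": {"action_types": ["cancel", "modify"], "domain": "airline"}},
    {"id": "m2", "metadata": {"action_types": "cancel", "domain": "airline"}},
    {"id": "m3", "metadata": {}},  # created before metadata was passed
]
hits = [m["id"] for m in memories if matches(m["metadata"], {"action_types": "modify"})]
print(hits)  # ['m1']
```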

<Note>
  Memories created **before** you start passing metadata won't have any fields to match against. A `metadata_filter` that works on new memories will exclude older ones.
</Note>

## Tuning the similarity threshold

Every retrieved candidate must clear a minimum cosine-similarity floor before it can be re-ranked and returned. The server's default is set in `config.toml` (`[memory].similarity_threshold`, typically `0.5`). Clients can override per-call:

```python theme={null}
memories = client.query_memories(
    task="How do I implement retries with jitter?",
    similarity_threshold=0.3,   # looser than default
)
```

| Value           | Behavior                                                                                                                    |
| --------------- | --------------------------------------------------------------------------------------------------------------------------- |
| `0.0`           | Disable the floor — every candidate is considered. Useful when bootstrapping a small memory bank where all cosines are low. |
| `0.3–0.4`       | Permissive. Lets weak but possibly-relevant memories through. Good for heterogeneous domains where embeddings collide.      |
| `0.5` (default) | Moderate. Filters obviously-unrelated memories.                                                                             |
| `0.7+`          | Strict. Only near-duplicate retrievals pass. Use when the memory bank is large and dense.                                   |

Pair this with `metadata_filter` when the bank spans multiple task types: the metadata filter does coarse partitioning (same category only), and the similarity threshold does fine filtering within that partition.

## Tuning Q-value learning rate with `alpha`

Each time a memory is retrieved in a run and that run's trace is reviewed, Reflect updates the memory's `q_value` via a Bellman-style step:

```
q_new = q_old + alpha * (reward - q_old)
```

`reward` is `1.0` for `"pass"` and `0.0` for `"fail"`. `alpha ∈ [0, 1]` controls how aggressively the Q-value tracks each new review. The server default is `0.3`, matching the configuration used in the [MemRL paper](https://arxiv.org/abs/2601.03192), but you can override it per review through the SDK:

```python theme={null}
client.create_trace(
    task=...,
    trajectory=...,
    review_result="pass",
    alpha=0.5,   # this review weighs heavier than usual
)
```

| `alpha`         | Behavior                                                            | When to use                                                                                                   |
| --------------- | ------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------- |
| `0.05–0.1`      | Slow, smooth Q-value updates                                        | High-volume projects with consistent task distributions; you want stable rankings and many reviews per memory |
| `0.3` (default) | Balanced — Q-values differentiate within \~10–20 reviews per memory | General starting point; matches MemRL's published value across all four of their benchmarks                   |
| `0.5–0.7`       | Aggressive — Q-values shift sharply per review                      | Small memory banks where you have few reviews per memory and want them to count for more                      |
| `1.0`           | Pure overwrite — `q_new = reward`                                   | Rare; effectively disables historical averaging                                                               |
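To get a feel for the table, the update rule can be replayed locally. This sketch only evaluates the formula above; `update` and `replay` are hypothetical helpers, not SDK calls, and the review history is invented.

```python
def update(q_old: float, review_result: str, alpha: float) -> float:
    """One step of q_new = q_old + alpha * (reward - q_old)."""
    reward = 1.0 if review_result == "pass" else 0.0
    return q_old + alpha * (reward - q_old)


def replay(reviews: list[str], alpha: float, q0: float = 0.5) -> float:
    """Apply a sequence of reviews to a memory that starts at q0."""
    q = q0
    for result in reviews:
        q = update(q, result, alpha)
    return q


# A memory whose runs pass four times in a row, then fail once.
history = ["pass"] * 4 + ["fail"]
for alpha in (0.1, 0.3, 0.7, 1.0):
    print(f"alpha={alpha}: q={replay(history, alpha):.3f}")
```

Larger `alpha` values push `q_value` closer to 1.0 during the streak of passes but also let the single failure pull it down harder. At `alpha=1.0` the final fail resets the memory to exactly `0.0`, which is the "pure overwrite" row above.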

### When to adjust per-review

Most callers should leave `alpha` unset and let the server default apply. Per-review override is useful when:

* **Authoritative reviews vs noisy ones.** Pass `alpha=0.5` for reviews from a trusted human expert and `alpha=0.1` for reviews from a less-reliable source like an LLM judge.
* **Bootstrapping a new project.** The first few hundred reviews can use `alpha=0.5+` to differentiate memory quality quickly, then drop to `0.3` once the bank has matured.
* **Penalizing pivotal failures.** A review with strong evidence the memory caused harm can use `alpha=0.5+` to drop its Q-value sharply.

The server-side default lives in `config.toml` at `[q_learning].alpha` and can also be overridden per-deployment via the `Q_LEARNING_ALPHA` environment variable.

## Best practices

<AccordionGroup>
  <Accordion title="Always pass retrieved_memory_ids when creating traces">
    This is what connects the learning loop. When you create a trace, pass the IDs of the memories that were retrieved for that run. Without them, Reflect can't update utility scores when the trace is reviewed - the memory ranking won't improve.

    The context manager (`client.trace()`) and decorator (`@reflect_trace`) handle this automatically. If you use `create_trace` directly, you must pass them yourself.

    ```python theme={null}
    trace = client.create_trace(
        task="...",
        trajectory=[...],
        retrieved_memory_ids=[m.id for m in augmented.memories],  # don't forget this
        review_result="pass",
    )
    ```
  </Accordion>

  <Accordion title="Write descriptive task strings">
    The task string is what Reflect embeds and matches against when retrieving memories. Vague tasks like `"do the thing"` will match poorly. Descriptive tasks like `"Parse the uploaded CSV, validate column types, and return the first 5 rows"` will retrieve more relevant memories.

    The task is also included in the reflection prompt - a clear task helps the LLM generate better reflections.
  </Accordion>

  <Accordion title="Start with a small limit and increase as needed">
    More memories means more context in the prompt, which costs tokens and can dilute the signal. Start with `limit=3` to `limit=5` and increase if the agent seems to be missing relevant context.
  </Accordion>

  <Accordion title="Review traces to make memories useful">
    Memories start with `q_value=0.5` and only differentiate through reviews. An unreviewed project has flat utility scores - every memory is ranked equally. Reviews are what make the system learn. Even a few dozen reviews can significantly improve retrieval quality.
  </Accordion>

  <Accordion title="Don't filter by success in your prompt - Reflect does it for you">
    `augment_with_memories` already groups memories into "Successful", "Failed", and "Other" sections. The LLM sees the distinction naturally. You don't need to filter out failed memories - they contain valuable "what not to do" context.
  </Accordion>
</AccordionGroup>
