Memories are reflections generated from reviewed traces. The SDK retrieves them ranked by a blend of semantic similarity and Q-value, so proven patterns surface first.

Query memories

memories = client.query_memories(
    task="How do I handle rate limits in an API client?",
    limit=10,
    lambda_=0.5,
)
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `task` | `str` | required | Task text to search against. The API embeds this and finds similar memories. |
| `limit` | `int` | `10` | Maximum number of memories to return. |
| `lambda_` | `float` | `0.5` | Blend weight: `0.0` = pure similarity, `1.0` = pure Q-value. |
Each returned Memory object has:
| Field | Type | Description |
| --- | --- | --- |
| `id` | `str` | Memory identifier |
| `task` | `str` | The past task this memory was generated from |
| `reflection` | `str` | LLM-generated reflection text |
| `q_value` | `float` | Learned Q-value (0–1; higher = better track record) |
| `similarity` | `float` | Cosine similarity to the query |
| `score` | `float` | Combined score: (1 - λ) × similarity + λ × q_value |
| `success` | `bool \| None` | Whether the source trace passed review |
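The combined score is a straightforward blend of the two signals. As a sketch (the formula comes from the table above; `blended_score` is an illustrative helper, not part of the SDK):

```python
def blended_score(similarity: float, q_value: float, lambda_: float = 0.5) -> float:
    """Combined retrieval score: (1 - lambda) * similarity + lambda * q_value."""
    return (1 - lambda_) * similarity + lambda_ * q_value

# With the default lambda_=0.5, similarity and Q-value count equally:
blended_score(0.8, 0.6)       # approximately 0.7
blended_score(0.8, 0.6, 0.0)  # 0.8: pure similarity
blended_score(0.8, 0.6, 1.0)  # 0.6: pure Q-value
```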

Augment with memories

augment_with_memories queries memories and formats them into a text block you can prepend or append to your LLM prompt:
augmented = client.augment_with_memories(
    task="Implement exponential backoff for retries",
    limit=5,
    lambda_=0.6,
)

prompt_for_llm = augmented.augmented_task
retrieved = augmented.memories
The augmented_task string contains the original task followed by formatted memory blocks grouped into three sections:
  1. Successful memories — reflections from traces that passed review
  2. Failed memories — reflections from traces that failed review
  3. Other relevant memories — reflections where success is unknown
Each block includes the past task and reflection text. If no memories are found, augmented_task returns the original task unchanged.
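The three-way split keys off the `success` field. A minimal sketch of the grouping logic, using plain dicts in place of the SDK's Memory objects:

```python
def group_by_success(memories):
    """Split memories into the three sections described above:
    success=True -> successful, success=False -> failed, success=None -> other."""
    sections = {"successful": [], "failed": [], "other": []}
    for memory in memories:
        if memory["success"] is True:
            sections["successful"].append(memory)
        elif memory["success"] is False:
            sections["failed"].append(memory)
        else:
            sections["other"].append(memory)
    return sections
```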

Format

The output looks like:
Implement exponential backoff for retries

Relevant memories:

Successful memories:

--- Memory 1 ---
Past task:
Handle transient API failures

Reflection:
Use a base delay with exponential increase and random jitter...

Failed memories:

--- Memory 2 ---
Past task:
Retry failed HTTP requests

Reflection:
Fixed delays without jitter caused thundering herd issues...
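A single memory block in this layout could be rendered roughly as follows (illustrative only; the SDK produces this formatting for you):

```python
def render_memory_block(index: int, past_task: str, reflection: str) -> str:
    """Format one memory in the '--- Memory N ---' layout shown above."""
    return (
        f"--- Memory {index} ---\n"
        f"Past task:\n{past_task}\n\n"
        f"Reflection:\n{reflection}\n"
    )

print(render_memory_block(
    1,
    "Handle transient API failures",
    "Use a base delay with exponential increase and random jitter...",
))
```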

Tuning retrieval

The lambda_ parameter controls the similarity vs. Q-value balance:
| Value | Behavior |
| --- | --- |
| `0.0` | Pure semantic similarity: retrieves the most textually relevant memories regardless of outcome |
| `0.5` | Equal weight (default): balances relevance with track record |
| `1.0` | Pure Q-value: retrieves memories with the best success history |
Start with the default 0.5. Increase lambda_ if your agent keeps retrieving relevant-but-unhelpful memories. Decrease it if the agent needs broader context from different past tasks.
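To see how `lambda_` shifts the ranking, consider two hypothetical memories: one textually similar to the query, one with a stronger track record. The scoring formula is the one documented on this page; the candidate data is made up for illustration.

```python
def rank(memories, lambda_):
    """Order memories by the blended score (1 - lambda) * similarity + lambda * q_value."""
    return sorted(
        memories,
        key=lambda m: (1 - lambda_) * m["similarity"] + lambda_ * m["q_value"],
        reverse=True,
    )

candidates = [
    {"id": "similar", "similarity": 0.9, "q_value": 0.2},
    {"id": "proven", "similarity": 0.5, "q_value": 0.9},
]

rank(candidates, lambda_=0.0)[0]["id"]  # "similar": pure similarity wins
rank(candidates, lambda_=1.0)[0]["id"]  # "proven": pure Q-value wins
rank(candidates, lambda_=0.5)[0]["id"]  # "proven": 0.70 beats 0.55 at equal weight
```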