# Data types

Source: https://docs.starlight-search.com/api-reference/data-types

Reference for all data types returned by `ReflectClient` methods. This reference is auto-generated from SDK docstrings. Run `python scripts/generate_api_docs.py` to regenerate. Types are represented as Python models (Pydantic for API contracts, plus SDK dataclasses where applicable).

## MemoryResponse

Returned by `query_memories` and contained in `AugmentedTask.memories`.

```python theme={null}
class MemoryResponse(BaseModel):
    id: str
    task: str
    reflection: str
    q_value: float = 0.5
    similarity: float = 0.0
    score: float = 0.0
    success: bool | None = None
    summary: str = ''
    key_mistake: str = ''
    correct_action: str = ''
    applicable_tools: list[str] = Field(default_factory=list)
    guidance: str = ''
    tools_used: list[str] = Field(default_factory=list)
```

| Field | Type | Description |
| --- | --- | --- |
| `id` | `str` | |
| `task` | `str` | |
| `reflection` | `str` | |
| `q_value` | `float` | |
| `similarity` | `float` | |
| `score` | `float` | |
| `success` | `bool \| None` | |
| `summary` | `str` | |
| `key_mistake` | `str` | |
| `correct_action` | `str` | |
| `applicable_tools` | `list[str]` | |
| `guidance` | `str` | |
| `tools_used` | `list[str]` | |

## AugmentedTask

Returned by `augment_with_memories`.

```python theme={null}
@dataclass(slots=True)
class AugmentedTask:
    augmented_task: str
    memories: list[MemoryResponse]
```

| Field | Type | Description |
| --- | --- | --- |
| `augmented_task` | `str` | The original task with formatted memory blocks appended. |
| `memories` | `list[MemoryResponse]` | List of `MemoryResponse` objects that were used. |

## TraceCreateResponse

Returned by `create_trace` and `create_trace_async`.

```python theme={null}
class TraceCreateResponse(BaseModel):
    trace_id: str
    ingest_status: Literal["queued", "completed"]
```

| Field | Type | Description |
| --- | --- | --- |
| `trace_id` | `str` | |
| `ingest_status` | `Literal["queued", "completed"]` | |

## TraceResponse

Returned by `get_trace`, `list_traces`, `review_trace`, `wait_for_trace`, and `create_trace_and_wait`.

```python theme={null}
class TraceResponse(BaseModel):
    id: str
    task: str
    trajectory: list[TrajectoryMessage]
    final_response: str
    retrieved_memory_ids: list[str]
    model: str | None = None
    metadata: dict[str, object] = Field(default_factory=dict)
    review_status: Literal["pending", "reviewed"]
    ingest_status: Literal["queued", "processing", "completed", "failed"]
    ingest_attempts: int = 0
    last_ingest_error: str | None = None
    created_memory_id: str | None = None
    review: ReviewResponse | None = None
```

| Field | Type | Description |
| --- | --- | --- |
| `id` | `str` | |
| `task` | `str` | |
| `trajectory` | `list[TrajectoryMessage]` | |
| `final_response` | `str` | |
| `retrieved_memory_ids` | `list[str]` | |
| `model` | `str \| None` | |
| `metadata` | `dict[str, object]` | |
| `review_status` | `Literal["pending", "reviewed"]` | |
| `ingest_status` | `Literal["queued", "processing", "completed", "failed"]` | |
| `ingest_attempts` | `int` | |
| `last_ingest_error` | `str \| None` | |
| `created_memory_id` | `str \| None` | |
| `review` | `ReviewResponse \| None` | |

## ReviewResponse

Nested in `TraceResponse.review`.

```python theme={null}
class ReviewResponse(BaseModel):
    id: str
    trace_id: str
    result: Literal["pass", "fail"]
    feedback_text: str | None = None
    mode: Literal["inline", "deferred"]
```

| Field | Type | Description |
| --- | --- | --- |
| `id` | `str` | |
| `trace_id` | `str` | |
| `result` | `Literal["pass", "fail"]` | |
| `feedback_text` | `str \| None` | |
| `mode` | `Literal["inline", "deferred"]` | |

## ApiKeyResponse

Returned by `list_api_keys`, `revoke_api_key`, and nested in `ApiKeyCreateResponse.key`.

```python theme={null}
class ApiKeyResponse(BaseModel):
    id: str
    public_id: str
    label: str
    scopes: list[str]
    is_master: bool = False
    environment: Literal["live", "test"]
    status: Literal["active", "revoked"]
    created_at: str | None = None
    last_used_at: str | None = None
    revoked_at: str | None = None
```

| Field | Type | Description |
| --- | --- | --- |
| `id` | `str` | |
| `public_id` | `str` | |
| `label` | `str` | |
| `scopes` | `list[str]` | |
| `is_master` | `bool` | |
| `environment` | `Literal["live", "test"]` | |
| `status` | `Literal["active", "revoked"]` | |
| `created_at` | `str \| None` | |
| `last_used_at` | `str \| None` | |
| `revoked_at` | `str \| None` | |

## ApiKeyCreateResponse

Returned by `create_api_key`.

```python theme={null}
class ApiKeyCreateResponse(BaseModel):
    api_key: str
    key: ApiKeyResponse
```

| Field | Type | Description |
| --- | --- | --- |
| `api_key` | `str` | |
| `key` | `ApiKeyResponse` | |

## BootstrapResponse

Returned by `ReflectClient.bootstrap`.

```python theme={null}
class BootstrapResponse(BaseModel):
    user_id: str
    project_id: str
    api_key: str
```

| Field | Type | Description |
| --- | --- | --- |
| `user_id` | `str` | |
| `project_id` | `str` | |
| `api_key` | `str` | |

# Reference overview

Source: https://docs.starlight-search.com/api-reference/introduction

SDK exports and how the reference is organized.

The Reflect SDK exports one client class and eight data types from `reflect_sdk`:

```python theme={null}
from reflect_sdk import (
    ReflectClient,
    Memory,
    AugmentedTask,
    Trace,
    TraceSubmission,
    Review,
    ApiKeyInfo,
    CreatedApiKey,
    BootstrapInfo,
)
```

## Client

| Class | Description |
| --- | --- |
| [`ReflectClient`](/api-reference/reflect-client) | Authenticated client for a single project. All SDK operations go through this class. |

## Data types

All data types are returned by `ReflectClient` methods. See the [data types reference](/api-reference/data-types) for field-level documentation.

| Type | Returned by |
| --- | --- |
| `Memory` | `query_memories`, `augment_with_memories` |
| `AugmentedTask` | `augment_with_memories` |
| `TraceSubmission` | `create_trace`, `create_trace_async` |
| `Trace` | `get_trace`, `list_traces`, `review_trace`, `wait_for_trace`, `create_trace_and_wait` |
| `Review` | Nested in `Trace.review` |
| `ApiKeyInfo` | `list_api_keys`, `revoke_api_key` |
| `CreatedApiKey` | `create_api_key` |
| `BootstrapInfo` | `ReflectClient.bootstrap` |

# ReflectClient

Source: https://docs.starlight-search.com/api-reference/reflect-client

Complete method reference for the Reflect Python client. This reference is auto-generated from SDK docstrings. Run `python scripts/generate_api_docs.py` to regenerate.
## Constructor

```python theme={null}
def __init__(
    *,
    base_url: str = "https://api.starlight-search.com",
    api_key: str,
    project_id: str | None = None,
    timeout: float | httpx.Timeout = 60.0,
) -> None
```

Create a Reflect client for a project.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `base_url` | `str` | `"https://api.starlight-search.com"` | Reflect API base URL (e.g. `http://localhost:8000`). |
| `api_key` | `str` | required | Plaintext API key (e.g. `rf_live_...`). |
| `project_id` | `str \| None` | `None` | Project identifier. If the project does not exist yet it will be created automatically (master keys only). Defaults to `"default"` when omitted. |
| `timeout` | `float \| httpx.Timeout` | `60.0` | Request timeout in seconds. |

```python theme={null}
from reflect_sdk import ReflectClient

client = ReflectClient(
    base_url="http://localhost:8000",
    api_key="rf_live_abc123_secret456",
    project_id="my-project",
)
```

***

## Class methods

### bootstrap

```python theme={null}
@classmethod
def bootstrap(
    *,
    base_url: str,
    bootstrap_token: str,
    user_id: str,
    project_id: str,
    key_label: str = "Default Admin Key",
    environment: str = "live",
    timeout: float | httpx.Timeout = 60.0,
) -> BootstrapResponse
```

```python theme={null}
info = ReflectClient.bootstrap(
    base_url="http://localhost:8000",
    bootstrap_token="your-admin-token",
    user_id="user-1",
    project_id="new-project",
)
# info.api_key contains the plaintext key
```

***

## Instance methods

### health

```python theme={null}
def health() -> dict[str, str]
```

Return the API health status. No authentication required.
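The retrieval methods that follow all take a `lambda_` parameter that blends semantic similarity against learned Q-value. The exact server-side formula isn't documented here, but a convex combination is the natural reading of "blend between similarity (1.0) and Q-value (0.0)" and makes a useful mental model:

```python theme={null}
def blended_score(similarity: float, q_value: float, lambda_: float = 0.5) -> float:
    """Hypothetical ranking score, for illustration only (not the SDK's code):
    lambda_=1.0 ranks purely by similarity, lambda_=0.0 purely by Q-value."""
    return lambda_ * similarity + (1.0 - lambda_) * q_value

# A highly similar memory with a poor track record...
print(blended_score(similarity=0.9, q_value=0.1))  # 0.5
# ...ranks the same as a moderately similar, reliable one.
print(blended_score(similarity=0.5, q_value=0.5))  # 0.5
```

Raising `lambda_` favors memories that match the current task; lowering it favors memories that have historically led to passing reviews.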
***

### query\_memories\_async

```python theme={null}
async def query_memories_async(
    *,
    task: str,
    limit: int = 10,
    lambda_: float = 0.5,
    metadata_filter: dict[str, Any] | None = None,
    similarity_threshold: float | None = None,
) -> list[MemoryResponse]
```

Async version of `query_memories`.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `task` | `str` | required | |
| `limit` | `int` | `10` | |
| `lambda_` | `float` | `0.5` | |
| `metadata_filter` | `dict[str, Any] \| None` | `None` | |
| `similarity_threshold` | `float \| None` | `None` | |

***

### augment\_with\_memories\_async

```python theme={null}
async def augment_with_memories_async(
    *,
    task: str,
    limit: int = 10,
    lambda_: float = 0.5,
    metadata_filter: dict[str, Any] | None = None,
    similarity_threshold: float | None = None,
) -> AugmentedTask
```

Async version of `augment_with_memories`.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `task` | `str` | required | |
| `limit` | `int` | `10` | |
| `lambda_` | `float` | `0.5` | |
| `metadata_filter` | `dict[str, Any] \| None` | `None` | |
| `similarity_threshold` | `float \| None` | `None` | |

***

### query\_memories

```python theme={null}
def query_memories(
    *,
    task: str,
    limit: int = 10,
    lambda_: float = 0.5,
    metadata_filter: dict[str, Any] | None = None,
    similarity_threshold: float | None = None,
) -> list[MemoryResponse]
```

Retrieve memories by semantic similarity and Q-value ranking.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `task` | `str` | required | Task description to search against. |
| `limit` | `int` | `10` | Maximum number of memories to return. |
| `lambda_` | `float` | `0.5` | Blend between similarity (1.0) and Q-value (0.0). Default 0.5. |
| `metadata_filter` | `dict[str, Any] \| None` | `None` | Optional additional metadata key/value pairs that memories must match (ANDed with internal filters). |
| `similarity_threshold` | `float \| None` | `None` | Optional minimum cosine similarity override. When omitted, the server's configured default is used. |

**Returns:** List of `MemoryResponse` objects, ranked by blended score.

***

### get\_skill

```python theme={null}
def get_skill() -> SkillResponse | None
```

Return the project skill, or `None` if no skill has been created yet.

***

### get\_skill\_async

```python theme={null}
async def get_skill_async() -> SkillResponse | None
```

Async variant of `get_skill`.

***

### create\_skill

```python theme={null}
def create_skill(
    *,
    n_passed: int | None = None,
    n_failed: int | None = None,
) -> SkillResponse
```

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `n_passed` | `int \| None` | `None` | |
| `n_failed` | `int \| None` | `None` | |

***

### augment\_with\_memories

```python theme={null}
def augment_with_memories(
    *,
    task: str,
    limit: int = 10,
    lambda_: float = 0.5,
    metadata_filter: dict[str, Any] | None = None,
    similarity_threshold: float | None = None,
) -> AugmentedTask
```

Query memories and format them into the task text for prompt augmentation.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `task` | `str` | required | Task to augment. |
| `limit` | `int` | `10` | Maximum memories to retrieve. |
| `lambda_` | `float` | `0.5` | Blend between similarity and Q-value. |
| `metadata_filter` | `dict[str, Any] \| None` | `None` | Optional additional metadata key/value pairs that memories must match (ANDed with internal filters). |
| `similarity_threshold` | `float \| None` | `None` | Optional minimum cosine similarity override. When omitted, the server's configured default is used. |

**Returns:** `AugmentedTask` with `augmented_task` (task + memory blocks) and `memories`.

***

### trace

```python theme={null}
def trace(
    *,
    task: str,
    limit: int = 10,
    lambda_: float = 0.5,
    metadata_filter: dict[str, Any] | None = None,
    similarity_threshold: float | None = None,
    blocking: bool = False,
    auto_fail_on_exception: bool = True,
    reference_context: str | None = None,
    alpha: float | None = None,
) -> Generator[TraceContext, None, None]
```

Context manager that retrieves memories and auto-submits the trace.

On entry, memories are queried and made available via `ctx.augmented_task` and `ctx.memories`. Call `ctx.set_output(...)` inside the block to record your agent's result. On exit, the trace is submitted with the correct `retrieved_memory_ids` automatically.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `task` | `str` | required | Task description for memory retrieval and trace logging. |
| `limit` | `int` | `10` | Maximum memories to retrieve. |
| `lambda_` | `float` | `0.5` | Blend between similarity and Q-value. |
| `metadata_filter` | `dict[str, Any] \| None` | `None` | |
| `similarity_threshold` | `float \| None` | `None` | |
| `blocking` | `bool` | `False` | If `True`, wait for memory creation before returning from the context (uses `create_trace_and_wait`). |
| `auto_fail_on_exception` | `bool` | `True` | If `True` and an unhandled exception occurs after `set_output` was called, the trace is submitted with `result="fail"` and the exception message as `feedback_text`. |
| `reference_context` | `str \| None` | `None` | |
| `alpha` | `float \| None` | `None` | |

***

### trace\_async

```python theme={null}
def trace_async(
    *,
    task: str,
    limit: int = 10,
    lambda_: float = 0.5,
    metadata_filter: dict[str, Any] | None = None,
    similarity_threshold: float | None = None,
    blocking: bool = False,
    auto_fail_on_exception: bool = True,
    reference_context: str | None = None,
    alpha: float | None = None,
) -> AsyncGenerator[TraceContext, None]
```

Async version of `trace`.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `task` | `str` | required | |
| `limit` | `int` | `10` | |
| `lambda_` | `float` | `0.5` | |
| `metadata_filter` | `dict[str, Any] \| None` | `None` | |
| `similarity_threshold` | `float \| None` | `None` | |
| `blocking` | `bool` | `False` | |
| `auto_fail_on_exception` | `bool` | `True` | |
| `reference_context` | `str \| None` | `None` | |
| `alpha` | `float \| None` | `None` | |

***

### create\_trace

```python theme={null}
def create_trace(
    *,
    task: str,
    trajectory: TrajectoryInput,
    final_response: str | None = None,
    retrieved_memory_ids: Sequence[str] = (),
    model: str | None = None,
    metadata: dict[str, Any] | None = None,
    review_result: str | None = None,
    feedback_text: str | None = None,
    reference_context: str | None = None,
    alpha: float | None = None,
) -> TraceCreateResponse
```

Record your agent's run without blocking your application.

Call this after your agent finishes a task to send the full conversation to Reflect for storage. The call returns immediately with a `TraceCreateResponse` — the trace is ingested in the background, so your agent can move on to the next request without waiting.
If you already know whether the response was correct (e.g. you compared it to an expected answer), pass `review_result` to include an **inline review**. Reflect will then generate a reflection and store it as a new memory in the background, so future runs of your agent can learn from this outcome. If you don't know the result yet, omit `review_result` and review later via `review_trace` or the web dashboard.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `task` | `str` | required | What your agent was asked to do — e.g. the user's question or the job description. Reflect uses this to match memories on future runs, so be descriptive. |
| `trajectory` | `TrajectoryInput` | required | The conversation between the user and your agent. Pass a list of `{"role": ..., "content": ...}` message dicts (the same format most LLM APIs return), or a JSON string that deserializes to such a list. |
| `final_response` | `str \| None` | `None` | Your agent's final answer. When `None`, Reflect extracts it from the last `"assistant"` message in the trajectory automatically. |
| `retrieved_memory_ids` | `Sequence[str]` | `()` | IDs of the memories your agent used during this run (from `query_memories` or `augment_with_memories`). Passing these lets Reflect update their Q-values when a review comes in, reinforcing helpful memories and down-ranking unhelpful ones. |
| `model` | `str \| None` | `None` | The model your agent used (e.g. `"gpt-5.4-mini"`). Shown in the dashboard for filtering and analysis. |
| `metadata` | `dict[str, Any] \| None` | `None` | Any extra context you want to attach — e.g. `{"customer_id": "c42", "environment": "staging"}`. Visible in the dashboard and useful for filtering. |
| `review_result` | `str \| None` | `None` | Judge whether the response was correct: `"pass"` or `"success"` if it was, `"fail"` or `"failure"` if not. When provided, Reflect generates a reflection from the conversation and stores it as a memory so your agent improves over time. |
| `feedback_text` | `str \| None` | `None` | When the response failed, explain **what went wrong** — e.g. `"Missed the WHERE clause"` or `"Gave an answer about the wrong product"`. This feedback is included in the generated reflection so your agent learns the specific mistake. Ignored when `review_result` is `None`. |
| `reference_context` | `str \| None` | `None` | |
| `alpha` | `float \| None` | `None` | |

**Returns:** A `TraceCreateResponse` with the trace `id` and its `ingest_status` (typically `"queued"`).

**Example — log your agent's run for later review**

```python theme={null}
# After your agent responds to a user...
submission = client.create_trace(
    task="Summarize this article about climate change",
    trajectory=[
        {"role": "user", "content": "Summarize this article: ..."},
        {"role": "assistant", "content": "Here is a summary: ..."},
    ],
    model="gpt-5.4-mini",
    metadata={"user_id": "u123"},
)
# Returns immediately — your app continues serving requests.
# Review this trace later in the dashboard or via review_trace().
```

**Example — log and review in one call (auto-graded)**

```python theme={null}
# Compare the agent's answer to the expected answer...
is_correct = agent_answer.strip() == expected_answer.strip()

submission = client.create_trace(
    task=problem_description,
    trajectory=messages,
    retrieved_memory_ids=[m.id for m in memories],
    model="gpt-5.4-mini",
    review_result="pass" if is_correct else "fail",
    feedback_text=None if is_correct else f"Expected {expected_answer}",
)
# Reflect generates a reflection in the background.
# Next time your agent sees a similar task, it can retrieve
# this memory to avoid repeating the same mistake.
```

`"success"` and `"failure"` are aliases for the API's `"pass"` and `"fail"`. The SDK maps them automatically.

***

### create\_trace\_async

```python theme={null}
async def create_trace_async(
    *,
    task: str,
    trajectory: TrajectoryInput,
    final_response: str | None = None,
    retrieved_memory_ids: Sequence[str] = (),
    model: str | None = None,
    metadata: dict[str, Any] | None = None,
    review_result: str | None = None,
    feedback_text: str | None = None,
    reference_context: str | None = None,
    alpha: float | None = None,
) -> TraceCreateResponse
```

Async variant of [`create_trace`](#create_trace). Same parameters and return type.

***

### wait\_for\_trace

```python theme={null}
def wait_for_trace(
    *,
    trace_id: str,
    require_reviewed: bool = False,
    poll_interval: float = 0.25,
    wait_timeout: float = 60.0,
) -> TraceResponse
```

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `trace_id` | `str` | required | |
| `require_reviewed` | `bool` | `False` | |
| `poll_interval` | `float` | `0.25` | |
| `wait_timeout` | `float` | `60.0` | |

***

### wait\_for\_trace\_async

```python theme={null}
async def wait_for_trace_async(
    *,
    trace_id: str,
    require_reviewed: bool = False,
    poll_interval: float = 0.25,
    wait_timeout: float = 60.0,
) -> TraceResponse
```

Async variant of [`wait_for_trace`](#wait_for_trace). Same parameters and return type. Uses `asyncio.sleep` between polls.
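`wait_for_trace` is, in essence, a poll loop over `get_trace`. A minimal sketch of the same pattern — with a stubbed `fetch` standing in for the real API call, since that needs a running server — shows how `poll_interval` and `wait_timeout` interact:

```python theme={null}
import time

def wait_until(fetch, is_done, poll_interval: float = 0.25, wait_timeout: float = 60.0):
    """Poll fetch() until is_done(result) is true or the timeout elapses.
    Mirrors the wait_for_trace parameters; not the SDK's actual implementation."""
    deadline = time.monotonic() + wait_timeout
    while True:
        result = fetch()
        if is_done(result):
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still processing after {wait_timeout}s")
        time.sleep(poll_interval)

# Stub: a trace whose ingest completes on the third poll.
statuses = iter(["queued", "processing", "completed"])
trace = wait_until(
    fetch=lambda: {"id": "t1", "ingest_status": next(statuses)},
    is_done=lambda t: t["ingest_status"] == "completed",
    poll_interval=0.01,
)
print(trace["ingest_status"])  # completed
```

With `require_reviewed=True`, the done-condition would additionally check `review_status == "reviewed"`.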
***

### create\_trace\_and\_wait

```python theme={null}
def create_trace_and_wait(
    *,
    task: str,
    trajectory: TrajectoryInput,
    final_response: str | None = None,
    retrieved_memory_ids: Sequence[str] = (),
    model: str | None = None,
    metadata: dict[str, Any] | None = None,
    review_result: str | None = None,
    feedback_text: str | None = None,
    reference_context: str | None = None,
    alpha: float | None = None,
    poll_interval: float = 0.25,
    wait_timeout: float = 60.0,
) -> TraceResponse
```

Record your agent's run and wait until the memory is created.

Use this when your next step depends on the trace (and its memory) being fully processed — for example:

* **Evaluation loops** where you run multiple tasks in sequence and need each memory to exist before the next task starts, so your agent can learn from earlier mistakes within the same run.
* **Tests** where you want to assert on the created memory or the final review status.
* **Scripts and pipelines** where you need confirmation that the reflection was stored before moving on.

This method submits the trace, then polls until it's done:

* **Without** `review_result`: waits until the trace is stored.
* **With** `review_result`: waits until the review is processed, the reflection is generated, and the memory is saved. The returned `TraceResponse` will have `review_status == "reviewed"` and a populated `created_memory_id`.

If your application serves real-time traffic and you don't want to block, use `create_trace` instead — it returns immediately while processing happens in the background.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `task` | `str` | required | What your agent was asked to do. Reflect uses this to match memories on future runs, so be descriptive. |
| `trajectory` | `TrajectoryInput` | required | The conversation between the user and your agent — a list of `{"role": ..., "content": ...}` message dicts, or a JSON string. |
| `final_response` | `str \| None` | `None` | Your agent's final answer. When `None`, Reflect extracts it from the last assistant message in the trajectory. |
| `retrieved_memory_ids` | `Sequence[str]` | `()` | IDs of the memories your agent used (from `query_memories` or `augment_with_memories`). Passing these lets Reflect update their Q-values based on the review. |
| `model` | `str \| None` | `None` | The model your agent used (e.g. `"gpt-5.4-mini"`). Shown in the dashboard for filtering. |
| `metadata` | `dict[str, Any] \| None` | `None` | Extra context to attach — e.g. `{"source": "eval_pipeline", "run_id": "r42"}`. |
| `review_result` | `str \| None` | `None` | Judge whether the response was correct: `"pass"` / `"success"` or `"fail"` / `"failure"`. When provided, this method waits for the reflection to be generated and stored as a memory before returning. |
| `feedback_text` | `str \| None` | `None` | When the response failed, explain what went wrong so the reflection captures the specific mistake. Ignored when `review_result` is `None`. |
| `reference_context` | `str \| None` | `None` | |
| `alpha` | `float \| None` | `None` | |
| `poll_interval` | `float` | `0.25` | How often (in seconds) to check whether processing is done. Default `0.25`. |
| `wait_timeout` | `float` | `60.0` | Maximum seconds to wait. Raises `TimeoutError` if the trace is still processing. Default `60.0`. Increase this if your reflections use a slow model. |

**Returns:** The fully processed `TraceResponse` with the attached `ReviewResponse` and `created_memory_id` (when reviewed).

**Raises:**

* `RuntimeError` — If processing fails — e.g. the LLM errored while generating the reflection. Check `trace.last_ingest_error` for details.
* `TimeoutError` — If processing doesn't finish within `wait_timeout` seconds.

**Example — evaluation loop that learns across tasks**

```python theme={null}
for problem in problems:
    # Retrieve memories from previous tasks in this run
    augmented = client.augment_with_memories(task=problem.question)
    answer = my_agent.solve(augmented.augmented_task)

    is_correct = answer.strip() == problem.expected.strip()
    trace = client.create_trace_and_wait(
        task=problem.question,
        trajectory=augmented_messages,
        final_response=answer,
        retrieved_memory_ids=[m.id for m in augmented.memories],
        model="gpt-5.4-mini",
        review_result="pass" if is_correct else "fail",
        feedback_text=None if is_correct else f"Expected {problem.expected}",
    )
    # The memory is now stored — the next iteration can retrieve it.
```

**Example — interactive CLI with human review**

```python theme={null}
answer = my_agent.solve(task)
result = input("Was this correct? [y/n]: ")

trace = client.create_trace_and_wait(
    task=task,
    trajectory=messages,
    final_response=answer,
    retrieved_memory_ids=[m.id for m in memories],
    review_result="pass" if result == "y" else "fail",
    feedback_text=input("Feedback: ") if result != "y" else None,
)
print(f"Memory created: {trace.created_memory_id}")
```

`"success"` and `"failure"` are aliases for the API's `"pass"` and `"fail"`. The SDK maps them automatically.

***

### create\_trace\_and\_wait\_async

```python theme={null}
async def create_trace_and_wait_async(
    *,
    task: str,
    trajectory: TrajectoryInput,
    final_response: str | None = None,
    retrieved_memory_ids: Sequence[str] = (),
    model: str | None = None,
    metadata: dict[str, Any] | None = None,
    review_result: str | None = None,
    feedback_text: str | None = None,
    reference_context: str | None = None,
    alpha: float | None = None,
    poll_interval: float = 0.25,
    wait_timeout: float = 60.0,
) -> TraceResponse
```

Async variant of [`create_trace_and_wait`](#create_trace_and_wait). Same parameters and return type. Uses `asyncio.sleep` between polls.

***

### list\_traces

```python theme={null}
def list_traces(
    *,
    review_status: str | None = None,
) -> list[TraceResponse]
```

List traces for the project.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `review_status` | `str \| None` | `None` | Filter by `"pending"`, `"reviewed"`, or `None` for all. |

**Returns:** List of `TraceResponse` objects.

***

### get\_trace

```python theme={null}
def get_trace(
    trace_id: str,
) -> TraceResponse
```

Fetch a single trace by ID.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `trace_id` | `str` | required | |

***

### get\_trace\_async

```python theme={null}
async def get_trace_async(
    trace_id: str,
) -> TraceResponse
```

Async variant of [`get_trace`](#get_trace). Same parameters and return type.

***

### review\_trace

```python theme={null}
def review_trace(
    *,
    trace_id: str,
    result: str,
    feedback_text: str | None = None,
    alpha: float | None = None,
) -> TraceResponse
```

Submit a deferred review for a trace.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `trace_id` | `str` | required | ID of the trace to review. |
| `result` | `str` | required | `"pass"` / `"fail"`, or `"success"` / `"failure"` (aliases). |
| `feedback_text` | `str \| None` | `None` | Optional human feedback. |
| `alpha` | `float \| None` | `None` | Optional Q-learning step size override for this review. When omitted, the server's configured default is used. Must be in `[0, 1]`. |

**Returns:** The updated `TraceResponse` with the review attached.
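`alpha` is described above as a Q-learning step size, which suggests the familiar incremental update toward a reward signal. The server's actual update rule isn't documented here; this sketch is only an illustration of what a step size in `[0, 1]` controls:

```python theme={null}
def q_update(q: float, reward: float, alpha: float) -> float:
    """Hypothetical illustration (not the server's code): the Q-value moves
    a fraction `alpha` of the way toward the observed reward."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return q + alpha * (reward - q)

q = 0.5
q = q_update(q, reward=1.0, alpha=0.2)  # a "pass" review nudges the value up
print(round(q, 2))  # 0.6
q = q_update(q, reward=0.0, alpha=0.2)  # a "fail" review nudges it back down
print(round(q, 2))  # 0.48
```

A larger `alpha` makes each review move the memory's Q-value more aggressively; `alpha=0` would leave it unchanged.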
***

### delete\_traces

```python theme={null}
def delete_traces(
    trace_ids: Sequence[str],
) -> DeleteTracesResponse
```

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `trace_ids` | `Sequence[str]` | required | |

***

### list\_api\_keys

```python theme={null}
def list_api_keys() -> list[ApiKeyResponse]
```

***

### create\_api\_key

```python theme={null}
def create_api_key(
    *,
    label: str,
    scopes: Sequence[str],
    environment: str = "live",
) -> ApiKeyCreateResponse
```

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `label` | `str` | required | |
| `scopes` | `Sequence[str]` | required | |
| `environment` | `str` | `"live"` | |

***

### revoke\_api\_key

```python theme={null}
def revoke_api_key(
    key_id: str,
) -> ApiKeyResponse
```

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `key_id` | `str` | required | |

***

### delete\_project

```python theme={null}
def delete_project() -> DeleteProjectResponse
```

***

# For AI Agents

Source: https://docs.starlight-search.com/for-agents

How AI agents consume Reflect: plain-text docs, installable skills, and the MCP server.

Reflect ships three integration surfaces optimized for AI agents and the IDEs that host them:

* **Plain-text docs** — the entire docs site as plain text. Fetch with `curl` or `urllib`, drop into Cursor `@docs` or any RAG indexer.
* **Skills** — agent skills installed via `npx skills`. The `integrate-reflect` skill walks a coding agent through adding Reflect to a project end-to-end.
* **MCP server** — `retrieve_memories` and `create_memory` tools over the Model Context Protocol. Compatible with Cursor, Claude Code, Cline, Continue, Windsurf, Zed, and more.
*** ## llms.txt — docs as plain text The docs site auto-generates two plain-text bundles for ingestion by AI agents: | URL | Purpose | Size | | ------------------------------------------------------------------- | ----------------------------------------------- | ------- | | [`/llms.txt`](https://docs.starlight-search.com/llms.txt) | Curated index with markdown links to every page | \~2 KB | | [`/llms-full.txt`](https://docs.starlight-search.com/llms-full.txt) | Full docs site concatenated into one file | \~50 KB | Both serve as `text/plain`. **Fetch with `curl` or `urllib`, not a JS-rendered HTML fetcher.** The HTML console at `reflect.starlight-search.com` is a single-page app and won't yield content to plain HTTP fetchers — the text bundles above are the right entry point for any agent. ```bash theme={null} curl -sS https://docs.starlight-search.com/llms-full.txt -o reflect-docs.txt ``` ### Wiring into IDEs Add the URL via Cursor's docs feature: ``` @docs add reflect https://docs.starlight-search.com/llms-full.txt ``` Then reference it in any chat with `@reflect`. Mention the URL in your project's `CLAUDE.md` or in a chat — Claude Code will fetch it on demand: ```markdown theme={null} Reflect docs as plain text: https://docs.starlight-search.com/llms-full.txt ``` Fetch the file at startup and inject into the agent's context: ```python theme={null} import urllib.request docs = urllib.request.urlopen( "https://docs.starlight-search.com/llms-full.txt" ).read().decode("utf-8") # then add `docs` to your system prompt ``` *** ## Skills — installable workflow guides Reflect publishes agent skills under [`StarlightSearch/reflect-skills`](https://github.com/StarlightSearch/reflect-skills) on GitHub. Skills are installed with the [`skills` CLI](https://skills.sh) and work in Claude Code, Cursor, Codex, Gemini CLI, Antigravity, Deep Agents, Pi, Qwen Code, and other agent hosts. 
### `integrate-reflect` Walks a coding agent through adding Reflect to a Python agent project end-to-end: SDK install, framework-specific loop placement, parameter tuning, LLM-as-judge wiring, and a mandatory smoke test that proves the loop closes. Covers OpenAI Agents SDK, Claude Agent SDK, LangGraph, Pydantic AI, and a generic-loop fallback. Handles both fresh projects (scaffolds a starter agent) and existing codebases (overlays Reflect onto existing loops). **Install globally:** ```bash theme={null} npx skills add StarlightSearch/reflect-skills@integrate-reflect -g -y ``` **Triggers automatically when the user says:** * "add Reflect to my agent" * "give my agent memory" * "build an agent with Reflect" * "wire up `client.trace`" * "`ctx.memories` is empty" / "q-values aren't moving" Browse the source and contribute on [GitHub](https://github.com/StarlightSearch/reflect-skills). Reflect *also* has a project-level [Skill API](/guides/skills) that distills your reviewed traces into a unified guide for *your specific agent*. That's a different feature from the installable workflow skills here. The installable skills teach a coding agent how to *integrate* Reflect; the Skill API is what Reflect generates *from* your traces once you're integrated. *** ## MCP server The Reflect MCP server exposes `retrieve_memories` and `create_memory` over the [Model Context Protocol](https://modelcontextprotocol.io). Connect it to any MCP-capable client and your AI assistant will automatically query past lessons before hard tasks and record new ones after each run. **Hosted endpoint:** `https://api.starlight-search.com/mcp` **Auth:** `Authorization: Bearer rf_live_...` **Transport:** Streamable HTTP Get your API key from the [Reflect console](https://reflect.starlight-search.com). ### Quick connect — JSON-based clients Most MCP-capable IDEs use the same JSON config block. 
Drop this into the right file for your client: ```json theme={null} { "mcpServers": { "reflect": { "url": "https://api.starlight-search.com/mcp", "headers": { "Authorization": "Bearer rf_live_..." } } } } ``` | Client | Config file | | -------------------------------- | --------------------------------------------------------------------------------- | | **Cursor** | `.cursor/mcp.json` (project) or `~/.cursor/mcp.json` (global) | | **Cline** (VS Code) | `cline_mcp_settings.json` (Cline → Settings → MCP Servers → Configure) | | **Continue** (VS Code/JetBrains) | `~/.continue/config.json` under `experimental.modelContextProtocolServers` | | **Windsurf** | `~/.codeium/windsurf/mcp_config.json` | | **Zed** | `~/.config/zed/settings.json` under `"context_servers"` (no `mcpServers` wrapper) | ### Quick connect — Claude Code ```bash theme={null} claude mcp add --transport http reflect https://api.starlight-search.com/mcp \ --header "Authorization: Bearer rf_live_..." ``` ### Stdio transport (alternative) If your client doesn't support HTTP MCP servers, run the local stdio version with `uvx`: ```json theme={null} { "mcpServers": { "reflect": { "command": "uvx", "args": ["--from", "reflect-mcp-server", "reflect-mcp-server", "--transport", "stdio"], "env": { "REFLECT_API_KEY": "rf_live_...", "REFLECT_PROJECT_ID": "your-project-id", "REFLECT_API_URL": "https://api.starlight-search.com" } } } } ``` For self-hosting, advanced options, and the full env-var reference, see the [MCP server guide](/guides/mcp). 
*** ## Quick reference | What you want | Where to go | | ------------------------------------- | ----------------------------------------------------------------------------------- | | Read all the docs in one fetch | `https://docs.starlight-search.com/llms-full.txt` | | Browse the docs index | `https://docs.starlight-search.com/llms.txt` | | Add Reflect to a Python agent project | `npx skills add StarlightSearch/reflect-skills@integrate-reflect` | | Connect any MCP client to Reflect | `https://api.starlight-search.com/mcp` (Bearer auth) | | Browse all skills | [StarlightSearch/reflect-skills](https://github.com/StarlightSearch/reflect-skills) | | Get a project + API key | [Reflect console](https://reflect.starlight-search.com) | # Deep Agents Source: https://docs.starlight-search.com/guides/examples/deepagents-exa The context manager pattern: use reflect.trace() as a with-block to instrument multi-step tool-using agents. ## Pattern: Context Manager The context manager pattern (`with reflect.trace(...)`) gives you full control over when the trace is submitted and lets you decide the review result after inspecting the agent's output. It fits naturally into multi-step workflows where a decorator would be too rigid. ``` with reflect.trace(task) → ctx.augmented_task → agent runs tools → ctx.set_output → trace auto-submitted on exit ``` This example pairs [Deep Agents](https://github.com/andysingal/deep_agents) with [Exa](https://exa.ai) for real-time web search and uses Reflect to build memory across research runs. ## Prerequisites ```bash theme={null} export REFLECT_API_KEY=rf_live_... export REFLECT_PROJECT_ID=your-project-id export EXA_API_KEY=... export OPENAI_API_KEY=sk-... 
# or another provider key ``` Install dependencies: ```bash theme={null} pip install reflect-sdk deepagents exa-py ``` Optional env vars: ```bash theme={null} export REFLECT_API_URL=https://api.starlight-search.com # defaults to localhost:8000 export DEEPAGENT_MODEL=openai:gpt-5.4-mini export RESEARCH_QUERY="Latest breakthroughs in fusion energy" ``` ## Full example ```python deepagents_exa_quickstart.py theme={null} """Deep Agents quickstart using Exa Search + Reflect memory loop. Prerequisites: - pip install deepagents exa-py - Set EXA_API_KEY - Set one model provider API key (for example OPENAI_API_KEY) - Set REFLECT_PROJECT_ID and REFLECT_API_KEY Optional env vars: - REFLECT_API_URL (default: http://localhost:8000) - DEEPAGENT_MODEL (default: openai:gpt-5.4-mini) - RESEARCH_QUERY (default: What is the stock price of Apple and Nvidia?) """ from __future__ import annotations import os from deepagents import create_deep_agent from exa_py import Exa from reflect_sdk import ReflectClient from reflect_sdk.converters import from_deepagents BASE_URL = os.getenv("REFLECT_API_URL", "http://localhost:8000") PROJECT_ID = os.getenv("REFLECT_PROJECT_ID") or "example" REFLECT_API_KEY = os.getenv("REFLECT_API_KEY") MODEL = os.getenv("DEEPAGENT_MODEL", "openai:gpt-5.4-mini") TASK = os.getenv("RESEARCH_QUERY", "What is the stock price of Apple and Nvidia?") MEMORY_LIMIT = 3 MEMORY_LAMBDA = 0.5 API_TIMEOUT = 180.0 research_instructions = """You are an expert researcher. Your job is to conduct thorough research and then write a polished report. You have access to an internet search tool as your primary means of gathering information. ## `internet_search` Use this to run an internet search for a given query. You can specify the max number of results to return and the topic. Prefer high-quality primary sources and cite concrete facts from search results. 
""" def internet_search(query: str, max_results: int = 5): """Run an internet search using Exa.""" exa = Exa(api_key=os.environ["EXA_API_KEY"]) return exa.search( query, num_results=max_results, contents={"highlights": {"max_characters": 2000}}, ) def main() -> None: if not os.getenv("EXA_API_KEY"): raise RuntimeError("Set EXA_API_KEY.") if not PROJECT_ID or not REFLECT_API_KEY: raise RuntimeError("Set REFLECT_PROJECT_ID and REFLECT_API_KEY.") reflect = ReflectClient( base_url=BASE_URL, api_key=REFLECT_API_KEY, project_id=PROJECT_ID, timeout=API_TIMEOUT, ) with reflect.trace(TASK, limit=MEMORY_LIMIT, lambda_=MEMORY_LAMBDA) as ctx: agent = create_deep_agent( model=MODEL, tools=[internet_search], system_prompt=research_instructions, ) result = agent.invoke({"messages": [{"role": "user", "content": ctx.augmented_task}]}) print(result) trajectory = from_deepagents(result["messages"]) ctx.set_output( trajectory=trajectory, model=MODEL, metadata={"source": "deepagents_exa_quickstart"}, ) if __name__ == "__main__": main() ``` ## Run it ```bash theme={null} python deepagents_exa_quickstart.py # or with a custom query: RESEARCH_QUERY="Who founded Anthropic?" python deepagents_exa_quickstart.py ``` ## How it works `reflect.trace(task)` retrieves relevant memories and opens a trace that will be submitted when the `with` block exits. ```python theme={null} with reflect.trace(TASK, limit=3, lambda_=0.5) as ctx: ... # trace is submitted here, on __exit__ ``` `limit` controls how many memories are retrieved. `lambda_` balances semantic similarity vs. utility when ranking them (0 = pure similarity, 1 = pure utility). `ctx.augmented_task` is the original task with relevant memories from past runs prepended. Passing this to the agent means it can draw on what worked before. ```python theme={null} result = agent.invoke({ "messages": [{"role": "user", "content": ctx.augmented_task}] }) ``` The agent can call `internet_search` to retrieve live data. 
Exa returns highlights from matching pages, which the agent cites in its report. ```python theme={null} def internet_search(query: str, max_results: int = 5): exa = Exa(api_key=os.environ["EXA_API_KEY"]) return exa.search(query, num_results=max_results, contents={"highlights": {"max_characters": 2000}}) ``` `from_deepagents` maps the agent's message list into Reflect's trajectory format. Call `ctx.set_output` before the `with` block exits to attach the trajectory and metadata to the trace. ```python theme={null} from reflect_sdk.converters import from_deepagents trajectory = from_deepagents(result["messages"]) ctx.set_output( trajectory=trajectory, model=MODEL, metadata={"source": "deepagents_exa_quickstart"}, ) ``` ## Adding a review Pass `result` to `ctx.set_output` to close the learning loop at submission time: ```python theme={null} ctx.set_output( trajectory=trajectory, model=MODEL, result="pass", # or "fail" feedback_text="...", # only needed on fail ) ``` Without `result`, the trace is stored with a pending review and can be reviewed later from the dashboard. ## The `lambda_` parameter When retrieving memories, Reflect ranks candidates by a blended score: ``` score = (1 - lambda_) × similarity + lambda_ × q_value ``` * **`lambda_ = 0`** - pure semantic similarity (most relevant to this query) * **`lambda_ = 0.5`** - balanced (default) * **`lambda_ = 1`** - pure utility (memories from the most successful past runs) As more traces are reviewed, utility scores improve and memory retrieval becomes increasingly useful. # Interactive Feedback CLI Source: https://docs.starlight-search.com/guides/examples/interactive-feedback-cli The building-block pattern: call augment_with_memories and create_trace_and_wait directly, then review interactively. ## Pattern: Direct API This example uses the Reflect SDK at its most explicit level - no decorators, no context managers. 
You call each step yourself, which makes it the best starting point for understanding how Reflect works. ``` augment_with_memories → LLM call → interactive review → create_trace_and_wait ``` Reviews don't have to happen immediately. You can **defer** a trace and review it later in bulk via the dashboard or the API. ## Prerequisites ```bash theme={null} export REFLECT_API_KEY=rf_live_... export REFLECT_PROJECT_ID=your-project-id export OPENAI_API_KEY=sk-... ``` Install dependencies: ```bash theme={null} pip install reflect-sdk openai ``` ## Full example ```python interactive_feedback_cli.py theme={null} """Interactive CLI example: solve a task with memory augmentation, then review it. This example shows the core Reflect SDK loop: 1. Augment a task with past memories 2. Solve it with an LLM 3. Review the answer (pass / fail / defer) 4. Store the trace so Reflect can learn from it Prerequisites: - Reflect API running locally (or set --base-url) - REFLECT_API_KEY and REFLECT_PROJECT_ID set in the environment - OPENAI_API_KEY set in the environment """ import os import argparse from openai import OpenAI from reflect_sdk import ReflectClient DEFAULT_TASK = "Who is Sonam Pankaj?" 
def parse_args() -> argparse.Namespace: parser = argparse.ArgumentParser(description="Interactive Reflect SDK demo.") parser.add_argument("--base-url", default="http://localhost:8000", help="Reflect API base URL.") parser.add_argument("--project-id", default=os.getenv("REFLECT_PROJECT_ID"), help="Reflect project id.") parser.add_argument("--reflect-api-key", default=os.getenv("REFLECT_API_KEY"), help="Reflect API key.") parser.add_argument("--model", default="gpt-5.4-mini", help="OpenAI model to use.") parser.add_argument("--task", default=DEFAULT_TASK, help="The task for the model to solve.") parser.add_argument("--limit", type=int, default=3, help="Max number of memories to retrieve.") args = parser.parse_args() if not args.project_id: raise RuntimeError("Set REFLECT_PROJECT_ID or pass --project-id.") if not args.reflect_api_key: raise RuntimeError("Set REFLECT_API_KEY or pass --reflect-api-key.") return args def ask_for_review() -> tuple[str, str | None]: """Prompt the user to pass, fail, or defer the review. Deferring stores the trace without a review - useful when you want to review in bulk later via the dashboard or the API. """ while True: choice = input("Was this answer correct? [y/n/d (defer)]: ").strip().lower() if choice in {"y", "yes"}: return "pass", None if choice in {"n", "no"}: feedback = input("What was wrong? (used as learning feedback): ").strip() return "fail", feedback or "The answer was incorrect." if choice in {"d", "defer"}: return "defer", None print("Please enter 'y', 'n', or 'd'.") def main() -> None: args = parse_args() openai_api_key = os.getenv("OPENAI_API_KEY") if not openai_api_key: raise RuntimeError("OPENAI_API_KEY must be set.") # --- Step 1: Connect to Reflect --- reflect = ReflectClient( base_url=args.base_url, api_key=args.reflect_api_key, project_id=args.project_id, ) # --- Step 2: Augment the task with relevant memories --- # Reflect retrieves past traces and injects their insights into the prompt. 
augmented = reflect.augment_with_memories(task=args.task, limit=args.limit) print(f"Task: {args.task}") print(f"Retrieved {len(augmented.memories)} relevant memories.\n") # --- Step 3: Solve with an LLM --- messages = [ { "role": "system", "content": ( "Solve the user's task. Use any relevant memories included in the prompt. " "Respond concisely." ), }, {"role": "user", "content": augmented.augmented_task}, ] openai = OpenAI(api_key=openai_api_key) response = openai.chat.completions.create(model=args.model, messages=messages) answer = (response.choices[0].message.content or "").strip() print("Model answer:") print(answer) print() # --- Step 4: Review the answer --- review_result, feedback = ask_for_review() # --- Step 5: Store the trace so Reflect can learn from it --- # When review_result is "pass" or "fail", Reflect immediately generates a # reflection and updates utility scores for memory ranking. # When deferred (review_result=None), the trace is stored without a review # and can be reviewed later via the dashboard or the API. trajectory = messages + [{"role": "assistant", "content": answer}] trace = reflect.create_trace_and_wait( task=args.task, trajectory=trajectory, retrieved_memory_ids=[m.id for m in augmented.memories], model=args.model, review_result=None if review_result == "defer" else review_result, feedback_text=feedback, ) print(f"\nResult: {review_result}") if feedback: print(f"Feedback: {feedback}") print(f"Trace id: {trace.id}") print(f"Review: {trace.review_status}") if trace.created_memory_id: print(f"Memory created: {trace.created_memory_id}") if __name__ == "__main__": main() ``` ## Run it ```bash theme={null} python interactive_feedback_cli.py --task "Explain transformer attention in one sentence" ``` ## How it works `ReflectClient` authenticates with your API key and ties all traces to a project. 
```python theme={null} reflect = ReflectClient( base_url="http://localhost:8000", api_key=os.getenv("REFLECT_API_KEY"), project_id=os.getenv("REFLECT_PROJECT_ID"), ) ``` Before calling the LLM, ask Reflect for relevant past experiences. It returns the original task **plus** a memory-augmented version you can pass directly to the model. ```python theme={null} augmented = reflect.augment_with_memories(task=args.task, limit=3) # augmented.augmented_task - task text with memories injected # augmented.memories - list of Memory objects (ids needed later) ``` Pass `augmented.augmented_task` as the user message so the model sees relevant context from past runs. ```python theme={null} messages = [ {"role": "system", "content": "Solve the user's task. Use any relevant memories."}, {"role": "user", "content": augmented.augmented_task}, ] response = openai.chat.completions.create(model="gpt-5.4-mini", messages=messages) ``` You decide whether the answer was correct. Three options: | Input | Meaning | | ----- | ----------------------------------------- | | `y` | Pass - answer was correct | | `n` | Fail - answer was wrong, provide feedback | | `d` | Defer - store now, review later | `create_trace_and_wait` submits the trace and blocks until Reflect has processed it. When `review_result` is `"pass"` or `"fail"`, Reflect immediately generates a reflection and updates utility scores so better memories surface in future runs. When `review_result` is `None` (deferred), the trace is stored as-is and can be reviewed later from the dashboard. ```python theme={null} trace = reflect.create_trace_and_wait( task=args.task, trajectory=trajectory, retrieved_memory_ids=[m.id for m in augmented.memories], model="gpt-5.4-mini", review_result="pass", # or "fail", or None to defer feedback_text=feedback, # only meaningful on fail ) ``` ## Key concept: deferred reviews Passing `review_result=None` stores the trace without triggering the learning loop. 
This is useful when: * You're running a batch and want to review results in one go * A human reviewer needs to approve the answer asynchronously * You want to collect traces first and label them later Deferred traces appear in the dashboard with a **pending review** status. You can review them there or via the API. # OpenAI Agents Source: https://docs.starlight-search.com/guides/examples/openai-agents The decorator pattern: wrap an async agent function with @reflect_trace for automatic memory retrieval and trace submission. ## Pattern: `@reflect_trace` Decorator The decorator pattern is the lowest-friction way to add Reflect to an existing agent function. Annotate your function with `@reflect_trace` and Reflect handles memory retrieval, trace submission, and `retrieved_memory_ids` tracking - you just write the agent logic. ``` @reflect_trace → ctx.augmented_task → agent runs → TraceResult returned → trace auto-submitted ``` This example uses the [OpenAI Agents SDK](https://openai.github.io/openai-agents-python/) with a `WebSearchTool`. ## Prerequisites ```bash theme={null} export REFLECT_API_KEY=rf_live_... export REFLECT_PROJECT_ID=your-project-id export OPENAI_API_KEY=sk-... ``` Install dependencies: ```bash theme={null} pip install reflect-sdk openai-agents ``` ## Full example ```python openai_agents_reflect_simple.py theme={null} """Simple OpenAI Agents SDK + Reflect SDK example (no CLI). 
Prerequisites: - Reflect API running (default: http://localhost:8000) - REFLECT_PROJECT_ID and REFLECT_API_KEY set - OPENAI_API_KEY set """ from __future__ import annotations import asyncio import os from agents import Agent, Runner, WebSearchTool from reflect_sdk import ReflectClient, TraceContext, TraceResult, reflect_trace from reflect_sdk.converters import from_openai_agents REFLECT_API_URL = os.getenv("REFLECT_API_URL", "http://localhost:8000") REFLECT_API_KEY = os.getenv("REFLECT_API_KEY") MODEL = os.getenv("OPENAI_MODEL", "gpt-5.4-mini") TASK = "What are the stock prices for Apple and Nvidia?" async def main() -> None: if not REFLECT_API_KEY: raise RuntimeError("Set REFLECT_API_KEY.") if not os.getenv("OPENAI_API_KEY"): raise RuntimeError("Set OPENAI_API_KEY.") reflect = ReflectClient( base_url=REFLECT_API_URL, api_key=REFLECT_API_KEY, project_id="example", timeout=120.0, ) @reflect_trace(reflect, task=lambda question: question) async def answer(ctx: TraceContext, question: str) -> TraceResult: agent = Agent( name="Research assistant", model=MODEL, tools=[WebSearchTool()], instructions=( "You are a concise research assistant. Use relevant memories from the prompt if present. " "Return a short, factual answer." ), ) result = await Runner.run(agent, input=ctx.augmented_task) final_response = str(result.final_output).strip() trajectory = from_openai_agents(result) return TraceResult( output=final_response, # returned to the caller trajectory=trajectory, model=MODEL, metadata={"source": "openai_agents_reflect_simple"}, ) answer_result = await answer(question=TASK) print(answer_result) if __name__ == "__main__": asyncio.run(main()) ``` ## Run it ```bash theme={null} python openai_agents_reflect_simple.py ``` ## How it works One `ReflectClient` instance is shared across all decorated functions in your app. 
```python theme={null} reflect = ReflectClient( base_url=os.getenv("REFLECT_API_URL", "http://localhost:8000"), api_key=os.getenv("REFLECT_API_KEY"), project_id="example", ) ``` The decorator intercepts the call, retrieves relevant memories, and injects a `TraceContext` as the first argument. The `task` parameter tells Reflect how to extract the task string from your function's arguments. ```python theme={null} @reflect_trace(reflect, task=lambda question: question) async def answer(ctx: TraceContext, question: str) -> TraceResult: ... ``` `ctx.augmented_task` contains the original question with past memories appended. Pass this to the agent instead of the raw question. The agent receives the memory-augmented prompt, so it can draw on what worked (or didn't) in previous runs. ```python theme={null} result = await Runner.run(agent, input=ctx.augmented_task) ``` `from_openai_agents` converts the agent's message history into the format Reflect expects. ```python theme={null} from reflect_sdk.converters import from_openai_agents trajectory = from_openai_agents(result) ``` Reflect ships converters for popular agent frameworks so you don't have to map message formats manually. Return a `TraceResult` from the decorated function. The decorator uses it to submit the trace and passes `output` back to the original caller. ```python theme={null} return TraceResult( output=final_response, # what answer() returns to the caller trajectory=trajectory, model=MODEL, metadata={"source": "openai_agents_reflect_simple"}, ) ``` To add a review at submission time, include `result="pass"` or `result="fail"` and optionally `feedback_text`. 
## Adding a review To close the learning loop immediately, include `result` in `TraceResult`: ```python theme={null} return TraceResult( output=final_response, trajectory=trajectory, model=MODEL, result="pass", # or "fail" feedback_text="The answer was incomplete.", # only needed on fail ) ``` Without `result`, the trace is stored with a pending review status and can be reviewed later from the dashboard. # MCP Server Source: https://docs.starlight-search.com/guides/mcp Connect Reflect memory tools to Cursor, Claude Code, Windsurf, and any MCP-compatible client. Just want to wire up an MCP client quickly? See [For AI Agents](/for-agents#mcp-server) for a 2-minute quickstart with all the major IDE configs in one place. This page is the deep-dive reference. The Reflect MCP server exposes two tools — `retrieve_memories` and `create_memory` — over the [Model Context Protocol](https://modelcontextprotocol.io). Connect it to any MCP-capable agent or IDE and your AI assistant will automatically query past lessons before hard tasks and record new ones after each run. **MCP endpoint:** `https://api.starlight-search.com/mcp`\ **Auth:** `Authorization: Bearer rf_live_...`\ **Transport:** Streamable HTTP Get your API key from the [Reflect console](https://reflect.starlight-search.com). *** ## Claude Code Add the server to your project or global config: ```bash theme={null} claude mcp add --transport http reflect https://api.starlight-search.com/mcp \ --header "Authorization: Bearer rf_live_..." ``` Or add it manually to `.claude/settings.json` (project) or `~/.claude/settings.json` (global): ```json theme={null} { "mcpServers": { "reflect": { "type": "http", "url": "https://api.starlight-search.com/mcp", "headers": { "Authorization": "Bearer rf_live_..." 
} } } } ``` Verify it loaded: ```bash theme={null} claude mcp list ``` *** ## Cursor Open **Settings → MCP** (or `~/.cursor/mcp.json`) and add: ```json theme={null} { "mcpServers": { "reflect": { "type": "http", "url": "https://api.starlight-search.com/mcp", "headers": { "Authorization": "Bearer rf_live_..." } } } } ``` Restart Cursor. The tools appear in Agent mode automatically. *** ## Windsurf Open **Settings → Cascade → MCP Servers** and add a new server: ```json theme={null} { "reflect": { "serverUrl": "https://api.starlight-search.com/mcp", "headers": { "Authorization": "Bearer rf_live_..." } } } ``` *** ## Cline / Continue / other MCP clients Any client that supports streamable HTTP transport uses the same config pattern: ```json theme={null} { "mcpServers": { "reflect": { "type": "http", "url": "https://api.starlight-search.com/mcp", "headers": { "Authorization": "Bearer rf_live_..." } } } } ``` *** ## Per-project scoping with X-Project-Id By default the server uses the project ID configured on your API key. To scope memories to a specific project per-request, pass the `X-Project-Id` header: ```json theme={null} { "headers": { "Authorization": "Bearer rf_live_...", "X-Project-Id": "my-project" } } ``` *** ## Available tools ### `retrieve_memories` Search the memory store for lessons from prior tasks. Call this **before** starting non-trivial work. | Parameter | Type | Default | Description | | --------- | ------ | -------- | ------------------------------------------------------------------------------- | | `query` | string | required | Natural-language description of the task you are about to do | | `limit` | int | 5 | Max memories to return (up to 20) | | `lambda_` | float | 0.5 | Blend between semantic similarity (`0.0`) and Q-value / learned utility (`1.0`) | Returns a list of memories with `id`, `task`, `reflection`, `q_value`, `similarity`, `score`, and `success`. Save the `memory_ids` — you'll pass them to `create_memory`. 
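Under the hood, invoking `retrieve_memories` is an ordinary MCP `tools/call` request. As a sketch — after the standard MCP initialize handshake, and with illustrative argument values — the JSON-RPC message a client POSTs to the endpoint looks roughly like:

```json theme={null}
{
  "jsonrpc": "2.0",
  "id": 2,
  "method": "tools/call",
  "params": {
    "name": "retrieve_memories",
    "arguments": {
      "query": "handle API rate limits in a Python client",
      "limit": 5,
      "lambda_": 0.5
    }
  }
}
```

Most users never touch this layer — the IDE or MCP client issues these calls automatically.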
### `create_memory` Persist a completed run as a memory so the agent can learn from it. Call this **after** the user confirms success or gives corrective feedback. | Parameter | Type | Default | Description | | ---------------------- | -------------------- | -------- | -------------------------------------------------------- | | `task` | string | required | The task the agent was executing | | `final_response` | string | required | The agent's final answer or deliverable | | `trajectory` | string or list | required | Tool calls, decisions, and errors that led to the answer | | `result` | `"pass"` or `"fail"` | required | Outcome — drives the reward signal | | `feedback_text` | string | optional | Verbatim or summarized user feedback | | `retrieved_memory_ids` | list\[string] | optional | IDs from `retrieve_memories` — enables Q-value updates | *** ## Verify the connection The health endpoint requires no auth: ```bash theme={null} curl https://api.starlight-search.com/mcp/health # {"status": "ok"} ``` # Memories Source: https://docs.starlight-search.com/guides/memories What memories are, how they're created, how to retrieve them, and best practices for getting the most out of memory-augmented agents. ## Overview A **memory** is a concise, LLM-generated reflection distilled from a past agent run. It captures what the agent did, what worked, what went wrong, and what to do differently next time - then stores that knowledge so future runs can benefit from it. Memories exist because LLMs are stateless. Each call starts from scratch with no awareness of what happened last time. Reflect solves this by maintaining a project-level memory bank that accumulates experience across runs, users, and sessions. Before each task, your agent queries this bank and receives the most relevant past reflections ranked by both **semantic similarity** (is this about the same kind of problem?) and **utility** (did this advice actually lead to good outcomes?). 
This dual ranking is the core design choice behind Reflect's memory system. Pure semantic search returns relevant results, but it can't distinguish between a reflection that led to a correct answer and one that didn't. Utility scores add a learned quality signal that improves over time as more traces are reviewed. ### How memories are created Memories are never written directly. They are always generated from a reviewed trace: 1. Your agent completes a task and you submit the trace with a review (`"pass"` or `"fail"`) 2. An LLM reads the trace (task, trajectory, outcome, feedback) and generates a reflection 3. The task is embedded and stored in the memory bank along with the reflection, with an initial `q_value` of `0.5` 4. On future runs, the memory is retrieved when the query is semantically similar and the utility score is high enough This means you can't manually insert arbitrary text into memory - every memory has a traceable origin, and its quality is tracked over time. ### How utility scores evolve Utility scores are updated every time a memory is **retrieved and the run that used it is reviewed**. A memory starts at `q_value = 0.5`. If it's retrieved in a run that passes, its utility nudges upward. If retrieved in a run that fails, it nudges downward. Over many reviews, useful memories converge toward 1.0 and unhelpful ones toward 0.0. Utility scores only update for memories that were **retrieved and used** in a run. If a memory exists but wasn't retrieved for a particular trace, its utility is unaffected by that trace's review. ## Retrieving memories ### `query_memories` - raw retrieval Returns a ranked list of `Memory` objects without modifying the task text. Use this when you want full control over how memories are injected into your prompt. 
```python theme={null} memories = client.query_memories( task="How do I handle rate limits in an API client?", limit=10, lambda_=0.5, ) for m in memories: print(f"[q={m.q_value:.2f}] {m.task}") print(f" {m.reflection[:100]}...") ``` ### `augment_with_memories` - retrieval + formatting Queries memories and appends them to the task as a structured text block. This is the most common method - it returns a ready-to-use prompt that you pass directly to the LLM. ```python theme={null} augmented = client.augment_with_memories( task="Implement exponential backoff for retries", limit=5, lambda_=0.6, ) # Pass this to your LLM - it includes the task + relevant memories prompt_for_llm = augmented.augmented_task # The Memory objects are also available if you need their IDs later retrieved = augmented.memories ``` The formatted output groups memories into three sections based on their review outcomes: ```text theme={null} Implement exponential backoff for retries Relevant memories: Successful memories: --- Memory 1 --- Past task: Handle transient API failures Reflection: Use a base delay with exponential increase and random jitter... Failed memories: --- Memory 2 --- Past task: Retry failed HTTP requests Reflection: Fixed delays without jitter caused thundering herd issues... ``` If no memories are found, `augmented_task` returns the original task unchanged - so you can always use it safely without checking. ### Parameters | Parameter | Type | Default | Description | | ---------------------- | --------------- | -------- | -------------------------------------------------------------------------------------------------------------------------------------------------------- | | `task` | `str` | required | The task text to search against. The API embeds this and finds semantically similar memories. | | `limit` | `int` | `10` | Maximum number of memories to return. | | `lambda_` | `float` | `0.5` | Blend weight between similarity and utility (see below). 
| | `metadata_filter` | `dict \| None` | `None` | Optional metadata key/value pairs that memories must match. See [Filtering by metadata](#filtering-by-metadata). | | `similarity_threshold` | `float \| None` | `None` | Minimum cosine similarity a candidate must reach. Overrides the server default. See [Tuning the similarity threshold](#tuning-the-similarity-threshold). | ### The `Memory` object | Field | Type | Description | | ------------------ | -------------- | ----------------------------------------------------------------------------- | | `id` | `str` | Unique identifier - pass these as `retrieved_memory_ids` when creating traces | | `task` | `str` | The past task this memory was generated from | | `reflection` | `str` | LLM-generated reflection text | | `q_value` | `float` | Learned quality score (0-1, higher = better track record) | | `similarity` | `float` | Cosine similarity to the query task | | `score` | `float` | Final ranking score: `(1 - lambda_) * similarity + lambda_ * q_value` | | `success` | `bool \| None` | Whether the source trace passed review (`None` if unreviewed) | | `summary` | `str` | One-sentence description of the past task | | `key_mistake` | `str` | Specific wrong action or omission (empty for successful memories) | | `correct_action` | `str` | Specific right action — tool name + argument patterns | | `applicable_tools` | `list[str]` | Tools this lesson is about (LLM-chosen) | | `guidance` | `str` | One-paragraph general strategy | | `tools_used` | `list[str]` | Tools that actually appeared in the trajectory (server-extracted) | The fields `summary`, `key_mistake`, `correct_action`, `applicable_tools`, `guidance`, and `tools_used` are populated for memories reviewed after the structured-reflection upgrade. Older memories will have these as empty strings / empty lists but retain their original `reflection` text. 
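The `score` field above is the blend Reflect uses to re-rank candidates. A minimal sketch of that re-ranking with made-up candidates - illustrative only; the real computation happens server-side, and `Candidate` / `rank` are not SDK names:

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    similarity: float  # cosine similarity to the query task
    q_value: float     # learned utility score (0-1)


def rank(candidates: list[Candidate], lambda_: float = 0.5) -> list[Candidate]:
    """Order candidates by the blended score from the table above."""
    def score(c: Candidate) -> float:
        return (1 - lambda_) * c.similarity + lambda_ * c.q_value
    return sorted(candidates, key=score, reverse=True)


# A topically similar memory with a poor track record...
a = Candidate(similarity=0.9, q_value=0.2)
# ...versus a slightly less similar memory that keeps leading to passes.
b = Candidate(similarity=0.7, q_value=0.9)

print(rank([a, b], lambda_=0.0)[0] is a)  # pure similarity favors a -> True
print(rank([a, b], lambda_=0.7)[0] is b)  # utility-weighted favors b -> True
```

The example shows why `lambda_` matters: under pure similarity the poorly performing memory wins, while a utility-weighted blend surfaces the one with the better track record.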
## Tuning retrieval with `lambda_` The `lambda_` parameter controls the balance between semantic relevance and learned quality when ranking memories: ``` score = (1 - lambda_) * similarity + lambda_ * q_value ``` | Value | Behavior | When to use | | --------- | ------------------------ | -------------------------------------------------------------------------------------------------- | | `0.0` | Pure semantic similarity | Early in a project when you have few reviewed traces and utility scores haven't differentiated yet | | `0.5` | Equal weight (default) | General-purpose starting point - works well for most projects | | `0.7–0.9` | Favor utility | Mature projects with many reviewed traces - surface memories with the best track records | | `1.0` | Pure utility | Only retrieve the historically most successful memories, regardless of semantic match | ### When to adjust * **Increase `lambda_`** if your agent keeps retrieving relevant-sounding memories that lead to bad outcomes. The memories are topically similar but not actually helpful - utility scores will down-rank them. * **Decrease `lambda_`** if your agent needs broader context from different past tasks. Strict utility ranking can narrow retrieval too much, especially when the most successful memories are about a different subtopic. * **Keep `0.5`** if you're unsure. The default works well until you have enough reviewed traces to notice a pattern. ## Filtering by metadata Any key/value pairs you pass on `metadata` when creating a trace are stored on the resulting memory and become filterable at retrieval time. ```python theme={null} # Tag at write time — metadata flows onto the memory created from this trace. client.create_trace( task="Cancel reservation via travel insurance", trajectory=messages, review_result="pass", metadata={"action_type": "cancel", "domain": "airline"}, ) # Filter at read time — only memories whose payload matches ALL keys are returned. 
memories = client.query_memories( task="Customer wants to cancel a flight", metadata_filter={"action_type": "cancel", "domain": "airline"}, ) ``` The filter is ANDed with the internal `project_id` / `user_id` / `status` filter, so callers cannot reach across projects. Stored metadata values can be any JSON-serializable type. Filter values are scalar; when the stored field is a **list**, Qdrant matches if the scalar value is a member of the list — useful for tagging a single memory with multiple categories: ```python theme={null} client.create_trace( ..., metadata={"action_types": ["cancel", "modify"]}, # list at write time ) # Scalar filter value finds memories whose list contains it. client.query_memories(..., metadata_filter={"action_types": "modify"}) ``` Memories created **before** you start passing metadata won't have any fields to match against. A `metadata_filter` that works on new memories will exclude older ones. ## Tuning the similarity threshold Every retrieved candidate must clear a minimum cosine-similarity floor before it can be re-ranked and returned. The server's default is set in `config.toml` (`[memory].similarity_threshold`, typically `0.5`). Clients can override per-call: ```python theme={null} memories = client.query_memories( task="How do I implement retries with jitter?", similarity_threshold=0.3, # looser than default ) ``` | Value | Behavior | | --------------- | --------------------------------------------------------------------------------------------------------------------------- | | `0.0` | Disable the floor — every candidate is considered. Useful when bootstrapping a small memory bank where all cosines are low. | | `0.3–0.4` | Permissive. Lets weak but possibly-relevant memories through. Good for heterogeneous domains where embeddings collide. | | `0.5` (default) | Moderate. Filters obviously-unrelated memories. | | `0.7+` | Strict. Only near-duplicate retrievals pass. Use when the memory bank is large and dense. 
| Pair this with `metadata_filter` when the bank spans multiple task types: the metadata filter does coarse partitioning (same category only), and the similarity threshold does fine filtering within that partition. ## Tuning Q-value learning rate with `alpha` Each time a memory is retrieved and its source trace is reviewed, Reflect updates the memory's `q_value` via a Bellman-style step: ``` q_new = q_old + alpha * (reward - q_old) ``` `reward` is `1.0` for `"pass"`, `0.0` for `"fail"`. `alpha ∈ [0, 1]` controls how aggressively the Q-value tracks each new review. The server default is `0.3` (matches the [MemRL paper's](https://arxiv.org/abs/2601.03192) configs), but you can override per-review through the SDK: ```python theme={null} client.create_trace( task=..., trajectory=..., review_result="pass", alpha=0.5, # this review weighs heavier than usual ) ``` | `alpha` | Behavior | When to use | | --------------- | ------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------- | | `0.05–0.1` | Slow, smooth Q-value updates | High-volume projects with consistent task distributions; you want stable rankings and many reviews per memory | | `0.3` (default) | Balanced — Q-values differentiate within \~10–20 reviews per memory | General starting point; matches MemRL's published value across all four of their benchmarks | | `0.5–0.7` | Aggressive — Q-values shift sharply per review | Small memory banks where you have few reviews per memory and want them to count for more | | `1.0` | Pure overwrite — `q_new = reward` | Rare; effectively disables historical averaging | ### When to adjust per-review Most callers should leave `alpha` unset and let the server default apply. 
Per-review override is useful when: * **Authoritative reviews vs noisy ones.** Pass `alpha=0.5` for reviews from a trusted human expert and `alpha=0.1` for reviews from a less-reliable source like an LLM judge. * **Bootstrapping a new project.** The first few hundred reviews can use `alpha=0.5+` to differentiate memory quality fast, then drop to `0.3` once the bank has matured. * **Penalizing pivotal failures.** A review with strong evidence that the memory caused harm can use `alpha=0.5+` to drop its Q-value sharply. The server-side default lives in `config.toml` at `[q_learning].alpha` and can also be overridden per-deployment via the `Q_LEARNING_ALPHA` environment variable. ## Best practices Passing `retrieved_memory_ids` is what closes the learning loop. When you create a trace, pass the IDs of the memories that were retrieved for that run. Without them, Reflect can't update utility scores when the trace is reviewed - the memory ranking won't improve. The context manager (`client.trace()`) and decorator (`@reflect_trace`) handle this automatically. If you use `create_trace` directly, you must pass them yourself. ```python theme={null} trace = client.create_trace( task="...", trajectory=[...], retrieved_memory_ids=[m.id for m in augmented.memories], # don't forget this review_result="pass", ) ``` The task string is what Reflect embeds and matches against when retrieving memories. Vague tasks like `"do the thing"` will match poorly. Descriptive tasks like `"Parse the uploaded CSV, validate column types, and return the first 5 rows"` will retrieve more relevant memories. The task is also included in the reflection prompt - a clear task helps the LLM generate better reflections. More memories means more context in the prompt, which costs tokens and can dilute the signal. Start with `limit=3` to `limit=5` and increase if the agent seems to be missing relevant context. Memories start with `q_value=0.5` and only differentiate through reviews. 
An unreviewed project has flat utility scores - every memory is ranked equally. Reviews are what make the system learn. Even a few dozen reviews can significantly improve retrieval quality. `augment_with_memories` already groups memories into "Successful", "Failed", and "Other" sections. The LLM sees the distinction naturally. You don't need to filter out failed memories - they contain valuable "what not to do" context. # Roadmap Source: https://docs.starlight-search.com/guides/roadmap Upcoming priorities for the StarlightSearch Reflect SDK. This roadmap tracks near-term priorities for SDK and ecosystem integration. ## Planned * **MCP support for easy integrations**\ First-class MCP support so Reflect can plug into any agent stack, harness, or runtime system with minimal setup. * **Broader agent framework adapters**\ Expand example and helper coverage for common agent orchestration frameworks. * **TypeScript SDK**\ Support for OpenClaw and other TypeScript agent systems. ## Notes * Priorities can change based on user feedback and production usage. * If you need a specific integration, open an issue with your runtime/harness details. # Skills Source: https://docs.starlight-search.com/guides/skills Project-level skills auto-generated from your reviewed Reflect traces. (Looking to integrate Reflect? See For AI Agents.) **This page is about skills *generated by* Reflect — not skills for integrating Reflect.** Reflect can distill your reviewed traces into a single project-level guide, retrievable via `client.get_skill()`. That's what this page covers. If you're looking for the installable workflow skill that walks a coding agent through *adding Reflect to your project* (e.g. `npx skills add StarlightSearch/reflect-skills@integrate-reflect`), go to [**For AI Agents**](/for-agents#skills) instead. ## Overview A Reflect **skill** is a project-level guide distilled *from your reviewed traces*. 
While [memories](/guides/memories) are per-task reflections retrieved by semantic similarity, a skill is a single consolidated document that captures the proven strategies and pitfalls for an entire project — auto-generated by Reflect once you have enough reviewed runs. Skills are built using **hierarchical consolidation** inspired by the [Trace2Skill](https://arxiv.org/abs/2603.25158) paper: 1. **Level 1** (automatic): each reviewed trace generates a per-task reflection (a memory) 2. **Level 2**: the top passed and failed reflections are consolidated separately into "proven strategies" and "pitfalls to avoid" 3. **Level 3**: both summaries are synthesized into a unified skill guide The output follows the [Anthropic skill standard](https://docs.anthropic.com/en/docs/claude-code/skills) with YAML frontmatter and structured markdown. ### Skills vs Memories | | Memories | Skills | | ------------- | --------------------------------------------------------------- | ------------------------------------------------------------------------- | | **Scope** | Per-task: retrieved by similarity to the current task | Per-project: same guide for every task | | **Count** | Many per project (one per reviewed trace) | One per project | | **Retrieval** | Automatic via `query_memories` or `augment_with_memories` | Explicit via `get_skill()` | | **Creation** | Automatic on trace review | On-demand (dashboard button or API call) | | **Best for** | Task-specific context ("last time I tried this exact thing...") | Project-wide patterns ("on this project, always check for pagination...") | Use memories when your tasks are diverse and need targeted context. Use skills when you want a consistent baseline of project knowledge injected into every run. You can also use both together. ## Creating a skill ### From the dashboard 1. Navigate to the **Memories** tab for your project 2. Once you have at least 5 reviewed memories, a **Create Skill** button appears 3. 
Click **Settings** to configure how many passed/failed reflections to sample (default: 5 each) 4. Click **Create Skill** The skill appears with its frontmatter metadata rendered as fields. You can: * **Refine Skill**: regenerate with the latest memories (version increments) * **Download .md**: save as a markdown file to use in your agent's prompt or as a Claude Code skill ### From the API ```bash theme={null} # Create or refine the skill curl -X POST "https://api.starlight-search.com/v1/projects/my-project/skill/create" \ -H "Authorization: Bearer rf_live_..." \ -H "Content-Type: application/json" \ -d '{"n_passed": 5, "n_failed": 5}' # Retrieve the current skill curl "https://api.starlight-search.com/v1/projects/my-project/skill" \ -H "Authorization: Bearer rf_live_..." ``` ### From the SDK ```python theme={null} from reflect_sdk import ReflectClient client = ReflectClient( api_key="rf_live_...", project_id="my-project", ) # Retrieve the skill (returns None if not created yet) skill = client.get_skill() if skill: print(skill) ``` ## Using a skill ### Inject into your agent's prompt The simplest approach: prepend the skill to your system prompt or task: ```python theme={null} skill = client.get_skill() system_prompt = "You are a helpful assistant." if skill: system_prompt += f"\n\n\n{skill}\n" response = my_llm(system_prompt=system_prompt, task=user_task) ``` ### Use with the trace context manager When using skills, you typically skip per-task memory retrieval since the skill already provides project-wide context: ```python theme={null} skill = client.get_skill() with client.trace(task, limit=1) as ctx: if skill: prompt = task + "\n\n\n" + skill + "\n" else: prompt = ctx.augmented_task # fallback to memories response = my_agent(prompt) ctx.set_output( trajectory=[...], result="pass", ) ``` ### Download and use as a Claude Code skill Click **Download .md** in the dashboard to save the skill file. 
Place it in your Claude Code skills directory: ```bash theme={null} # Project-scoped skill mkdir -p .claude/skills/my-project-skill cp my-project-skill.md .claude/skills/my-project-skill/SKILL.md # Or personal skill (available across projects) mkdir -p ~/.claude/skills/my-project-skill cp my-project-skill.md ~/.claude/skills/my-project-skill/SKILL.md ``` Claude Code will automatically load the skill when it matches the task context. ## Skill format Skills follow the Anthropic skill standard with YAML frontmatter: ```markdown theme={null} --- name: my-project description: Project-specific skill for my-project distilled from 10 reviewed traces. project: my-project source_memories: 10 passed_sampled: 5 failed_sampled: 5 generated_at: 2025-01-15T10:30:00+00:00 --- ## Proven Strategies 1. Always check for pagination tokens when querying the search API... 2. Use structured entity chains (Label -> Artist -> Album) and validate each link... ## Pitfalls to Avoid 1. Never piece together partial data from multiple sources when a single ranked list exists... 2. Verify ordinal positions against primary sources, not summaries... ## General Procedure 1. Identify the core question type and map it to the relevant strategy 2. Start with primary databases and structured lists 3. Cross-check each component against at least two sources 4. 
Verify the final answer against the original query's requirements ``` ### Frontmatter fields | Field | Description | | ----------------- | ----------------------------------------------------------- | | `name` | Project identifier | | `description` | What the skill covers and how many traces it was built from | | `project` | Project ID | | `source_memories` | Total number of reflections sampled | | `passed_sampled` | Number of successful reflections used | | `failed_sampled` | Number of failed reflections used | | `generated_at` | ISO 8601 timestamp of when the skill was generated | ## Configuration Skill generation can be configured in `config.toml`: ```toml theme={null} [skill] min_memories_for_skill = 5 # minimum reviewed memories before skill can be created default_n_passed = 5 # default number of top passed reflections to sample default_n_failed = 5 # default number of top failed reflections to sample ``` These defaults are used when the API request doesn't specify `n_passed` or `n_failed`. The dashboard Settings panel lets you override these per-request. ## Best practices A skill built from 3 very similar traces will be narrow. Wait until you have at least 5-10 reviewed traces covering different task types within the project. The more diverse the traces, the more generalizable the skill. Skills are a snapshot. After reviewing more traces (especially failures that reveal new pitfalls), click **Refine Skill** to regenerate with the latest data. The version number increments so you can track changes. The default of 5 passed + 5 failed works for most projects. For mature projects with 50+ traces, increase to 10-15 per category via the Settings panel to capture more patterns. Skills provide a broad baseline ("on this project, always do X"). Memories provide task-specific context ("last time I tried this exact query..."). 
For complex projects, inject the skill into the system prompt and use memory augmentation for the task: ```python theme={null} skill = client.get_skill() with client.trace(task) as ctx: system = f"You are a research assistant.\n\n\n{skill}\n" response = my_agent(system_prompt=system, task=ctx.augmented_task) ctx.set_output(trajectory=[...], result="pass") ``` The downloaded `.md` file works as a Claude Code skill, a system prompt snippet, or documentation for your team. The frontmatter is valid YAML and can be parsed by any tool that understands the Anthropic skill format. # Traces and reviews Source: https://docs.starlight-search.com/guides/traces-and-reviews What traces and reviews are, how to record them using the three SDK patterns, and how reviews drive the learning loop. ## Overview A **trace** is the complete record of a single agent run - the task it was given, the full message trajectory (every user message, assistant response, and tool call), which memories were retrieved, and which model was used. Think of it as a structured log entry that Reflect can learn from. A **review** is a pass/fail judgment on a trace. When you review a trace, Reflect: 1. Reads the trajectory, the outcome, and your feedback 2. Generates a concise **reflection** (an LLM-produced summary of what worked or went wrong) 3. Embeds the reflection and stores it as a new **memory** with an initial utility of 0.5 4. Updates the utility scores of the memories that were retrieved during that run (up for pass, down for fail) Without reviews, Reflect is just a trace logger. Reviews are what close the learning loop - they're the training signal that makes memory retrieval improve over time. ### Why traces capture the full trajectory Reflect stores the entire conversation, not just the final answer, because the reflection LLM needs context to generate useful advice. 
A reflection like "always verify the order exists before processing a return" can only be generated if the trajectory shows that the agent *didn't* verify the order. The final answer alone wouldn't reveal that. The trajectory also enables the dashboard to show step-by-step replays, which is useful for debugging and manual review. ### Why reviews are separate from traces Reviews can be submitted inline (at trace creation time) or deferred (later, via the API or dashboard). This separation exists because: * **Automated pipelines** know the answer immediately (e.g., comparing against a gold answer) and can submit inline reviews * **Human review workflows** need to collect the trace first and review asynchronously * **Batch evaluation** collects many traces and reviews them all at once Both paths produce the same result: a reflection is generated, a memory is created, and utility scores are updated. ## Three ways to record traces The SDK provides three patterns for recording traces. They all produce the same result - a trace stored in Reflect - but differ in how much boilerplate they handle for you. ### Pattern 1: Context manager The context manager retrieves memories on entry and auto-submits the trace on exit. It tracks `retrieved_memory_ids` for you, so the utility learning loop works automatically. **Best for:** multi-step workflows, streaming, cases where you need to inspect output before deciding the review result. 
```python theme={null} with client.trace("Parse the CSV and return the top 5 rows") as ctx: # ctx.augmented_task - the task with relevant memories appended # ctx.memories - the retrieved Memory objects response = my_agent(ctx.augmented_task) ctx.set_output( trajectory=[ {"role": "user", "content": ctx.augmented_task}, {"role": "assistant", "content": response}, ], result="pass", model="gpt-5.4-mini", ) # Trace auto-submitted on exit with correct retrieved_memory_ids # ctx.trace_id is now available for deferred review or logging ``` #### Context manager parameters | Parameter | Type | Default | Description | | ------------------------ | ------- | -------- | -------------------------------------------------------- | | `task` | `str` | required | Task description for memory retrieval and trace logging | | `limit` | `int` | `10` | Maximum memories to retrieve | | `lambda_` | `float` | `0.5` | Blend between similarity and utility | | `blocking` | `bool` | `False` | Wait for memory creation before exiting the context | | `auto_fail_on_exception` | `bool` | `True` | Auto-submit with `result="fail"` on unhandled exceptions | #### `ctx.trace_id` After the `with` block exits, `ctx.trace_id` contains the ID of the submitted trace. This is useful for deferred reviews — pass it to `client.review_trace()` later in your application. Inside the `with` block (before submission), `trace_id` is `None`. 
#### `set_output` parameters | Parameter | Type | Default | Description | | ---------------- | ------------------- | -------- | ----------------------------------------------------------- | | `trajectory` | `list[dict] \| str` | required | The conversation messages | | `final_response` | `str \| None` | `None` | Agent's final answer (extracted from trajectory if omitted) | | `result` | `str \| None` | `None` | `"pass"` or `"fail"` - omit to defer the review | | `feedback_text` | `str \| None` | `None` | What went wrong (used when `result="fail"`) | | `model` | `str \| None` | `None` | Model name for dashboard display | | `metadata` | `dict \| None` | `None` | Arbitrary JSON metadata | #### Blocking mode Pass `blocking=True` to wait for the reflection and memory to be created before the `with` block exits. Useful in evaluation loops where the next task needs to retrieve the memory from the previous one. ```python theme={null} with client.trace("...", blocking=True) as ctx: response = my_agent(ctx.augmented_task) ctx.set_output(trajectory=messages, result="pass") # Memory is guaranteed to exist here - the next task can retrieve it ``` #### Exception handling If an unhandled exception occurs after `set_output` was called, the trace is auto-submitted with `result="fail"` and the exception message as feedback. This prevents losing trace data on crashes. Disable with `auto_fail_on_exception=False`. #### Async variant ```python theme={null} async with client.trace_async("Debug login timeout") as ctx: response = await my_llm(ctx.augmented_task) ctx.set_output(trajectory=messages, result="pass") ``` ### Pattern 2: `@reflect_trace` decorator The decorator wraps a function so that memory retrieval, trace submission, and `retrieved_memory_ids` tracking happen automatically. You just write the agent logic. **Best for:** single-function agents, clean integration with existing function signatures, when you want the least boilerplate. 
```python theme={null} from reflect_sdk import TraceContext, TraceResult, reflect_trace @reflect_trace(client, task=lambda question: question) def answer(ctx: TraceContext, question: str) -> TraceResult: messages = [{"role": "user", "content": ctx.augmented_task}] response = my_llm(messages) messages.append({"role": "assistant", "content": response}) return TraceResult( output=response, # returned to the caller trajectory=messages, result="pass", model="gpt-5.4-mini", ) # Calling the function queries memories, runs your code, and submits the trace result = answer("Parse the CSV and return the top 5 rows") # result == response (the output from TraceResult) ``` #### Decorator parameters | Parameter | Type | Default | Description | | ------------------------ | ------------------------- | -------- | ------------------------------------------------------------------------------------------------------------------------- | | `client` | `ReflectClient` | required | The client instance | | `task` | `str \| Callable \| None` | `None` | How to derive the task string - static string, callable on the function's args, or `None` to use the first positional arg | | `limit` | `int` | `10` | Maximum memories to retrieve | | `lambda_` | `float` | `0.5` | Blend between similarity and utility | | `blocking` | `bool` | `False` | Wait for memory creation before returning | | `auto_fail_on_exception` | `bool` | `True` | Auto-submit `"fail"` on unhandled exceptions | | `inject_context` | `bool` | `True` | Prepend a `TraceContext` as the first argument | #### Two return types ```python TraceResult (full control) theme={null} @reflect_trace(client, task=lambda question: question) def answer(ctx: TraceContext, question: str) -> TraceResult: messages = [{"role": "user", "content": ctx.augmented_task}] response = my_llm(messages) messages.append({"role": "assistant", "content": response}) return TraceResult( output=response, trajectory=messages, result="pass", model="gpt-5.4-mini", ) ``` 
```python String (minimal) theme={null} @reflect_trace(client, task=lambda question: question) def answer(ctx: TraceContext, question: str) -> str: response = my_llm(ctx.augmented_task) return response # used as both trajectory and final_response ``` Return a `TraceResult` when you need to provide the full trajectory, review result, feedback, model, or metadata. Return a plain string for quick prototyping - the string is used as both the trajectory and the final response. #### Async support Async functions are detected automatically: ```python theme={null} @reflect_trace(client, task=lambda question: question) async def answer(ctx: TraceContext, question: str) -> TraceResult: ... ``` ### Pattern 3: Explicit API calls Call `augment_with_memories` and `create_trace` directly. This gives you full control over every step but requires you to pass `retrieved_memory_ids` manually. **Best for:** existing codebases where you can't wrap the agent function, batch pipelines, cases where traces are created far from where memories are retrieved. ```python theme={null} augmented = client.augment_with_memories( task="Parse the CSV and return the top 5 rows", ) response = my_agent(augmented.augmented_task) submission = client.create_trace( task="Parse the CSV and return the top 5 rows", trajectory=[ {"role": "user", "content": augmented.augmented_task}, {"role": "assistant", "content": response}, ], retrieved_memory_ids=[m.id for m in augmented.memories], model="gpt-5.4-mini", review_result="pass", ) ``` When using `create_trace` directly, you **must** pass `retrieved_memory_ids` manually. If you forget, the utility learning loop breaks silently - memories won't be reinforced or penalized based on outcomes. The context manager and decorator handle this automatically. 
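What the learning loop does with those IDs: when the trace is later reviewed, each retrieved memory's `q_value` is nudged toward the review outcome using the update formula from the memories guide, `q_new = q_old + alpha * (reward - q_old)`. A rough sketch of that step - illustrative only, the real update runs server-side and `update_q` is not an SDK function:

```python
def update_q(q_old: float, passed: bool, alpha: float = 0.3) -> float:
    """One Bellman-style step toward the review outcome."""
    reward = 1.0 if passed else 0.0
    return q_old + alpha * (reward - q_old)

# A memory repeatedly retrieved in passing runs converges toward 1.0.
q = 0.5
for _ in range(10):
    q = update_q(q, passed=True)
print(round(q, 3))  # 0.986

# If retrieved_memory_ids are omitted from the trace, this update never
# happens and every memory stays stuck at its initial 0.5.
```

This is why forgetting `retrieved_memory_ids` breaks the loop silently: traces and reviews still flow through, but no memory ever moves off `0.5`.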
#### `create_trace` vs `create_trace_and_wait` | Method | Behavior | Use when | | ----------------------- | ----------------------------------------------------------------------------------- | ------------------------------------------------------------------------------ | | `create_trace` | Returns immediately with a `TraceSubmission`. Processing happens in the background. | Your app serves real-time traffic and you don't want to block | | `create_trace_and_wait` | Blocks until the trace is fully processed and the memory is created. | Evaluation loops, tests, scripts where the next step needs the memory to exist | ```python theme={null} # Non-blocking - returns immediately submission = client.create_trace(task="...", trajectory=[...], review_result="pass") # submission.ingest_status == "queued" # Blocking - waits for the memory to be created trace = client.create_trace_and_wait( task="...", trajectory=[...], review_result="pass", poll_interval=0.25, # seconds between polls wait_timeout=60.0, # max seconds to wait ) # trace.review_status == "reviewed" # trace.created_memory_id is set ``` ## Reviews ### Inline reviews Include the review when creating the trace. This is the simplest path - one call does everything. ```python theme={null} # Context manager with client.trace("...") as ctx: response = my_agent(ctx.augmented_task) ctx.set_output(trajectory=messages, result="pass") # Decorator @reflect_trace(client, task=lambda q: q) def answer(ctx, question): ... return TraceResult(output=response, trajectory=messages, result="pass") # Explicit client.create_trace(task="...", trajectory=[...], review_result="pass") ``` The SDK accepts `"success"` / `"failure"` as aliases for `"pass"` / `"fail"`. ### Deferred reviews Create the trace without a review, then submit one later. 
This is useful when:

* A human needs to evaluate the answer
* You're running a batch and want to review traces in one go afterwards
* The review depends on external feedback that isn't available yet

```python theme={null}
# Step 1: Create trace without a review
with client.trace("Summarize the quarterly report") as ctx:
    response = my_agent(ctx.augmented_task)
    ctx.set_output(trajectory=messages)  # no result - review deferred

# ctx.trace_id is available after the with block exits
print(f"Trace submitted: {ctx.trace_id}")

# Step 2: Later, after human evaluation
trace = client.review_trace(
    trace_id=ctx.trace_id,
    result="fail",
    feedback_text="Summary missed the revenue decline in Q3",
)
# trace.review_status == "reviewed"
# trace.created_memory_id is now set
```

Deferred reviews are processed **synchronously** - the returned `TraceResponse` includes the review and the created memory ID. You can also review traces from the Reflect Console.

### What makes good feedback

When a trace fails, `feedback_text` is included in the reflection prompt. Specific feedback produces better reflections:

| Feedback | Quality |
| ------------------------------------------------------------------------------------ | -------------------------------------------------------------------------------------------------- |
| `"Wrong"` | Too vague - the reflection won't capture the specific mistake |
| `"The answer was incorrect"` | Slightly better but still generic |
| `"Missed the WHERE clause in the SQL query, returning all rows instead of filtered"` | Specific - the reflection will mention the WHERE clause, making it useful for future similar tasks |

For passing traces, feedback is optional. The trajectory itself provides enough context for the reflection.
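For the batch case, deferred reviews can be submitted in one pass once the evaluations are in. A sketch built on the `review_trace` call shown above; `review_batch` is a hypothetical helper, not part of the SDK:

```python theme={null}
def review_batch(client, reviews):
    """Submit deferred reviews for a batch of traces. (Illustrative helper.)

    `reviews` maps trace_id -> (result, feedback_text); feedback may be
    None for passing traces, where it is optional.
    """
    reviewed = []
    for trace_id, (result, feedback) in reviews.items():
        trace = client.review_trace(
            trace_id=trace_id,
            result=result,
            feedback_text=feedback,
        )
        reviewed.append(trace)
    return reviewed
```

Because deferred reviews are processed synchronously, each returned trace already carries its review and created memory ID when the loop finishes.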
## Listing and fetching traces

```python theme={null}
# List traces by review status
pending = client.list_traces(review_status="pending")
reviewed = client.list_traces(review_status="reviewed")
all_traces = client.list_traces()

# Fetch a specific trace
trace = client.get_trace(trace_id="abc-123")
```

Each `TraceResponse` object includes:

| Field | Description |
| ------------------- | -------------------------------------------------------- |
| `id` | Unique trace identifier |
| `task` | The task that was executed |
| `trajectory` | List of message dicts |
| `review_status` | `"pending"` or `"reviewed"` |
| `ingest_status` | `"queued"`, `"processing"`, `"completed"`, or `"failed"` |
| `created_memory_id` | ID of the reflection memory (set after review) |
| `review` | Attached `ReviewResponse` object (if reviewed) |

## Choosing a pattern

**`client.trace()`** — Auto-tracks `retrieved_memory_ids`. Flexible - inspect output before deciding the review. Supports blocking mode for eval loops. Best for multi-step agents, streaming, conditional review logic.

**`@reflect_trace`** — Least boilerplate. Wraps a single function. Supports sync and async. Return `TraceResult` for full control or a string for quick prototyping. Best for single-function agents and clean codebases.

**`create_trace` / `create_trace_and_wait`** — Full control over every step. Must pass `retrieved_memory_ids` manually. Best for existing codebases, batch pipelines, and cases where traces are created separately from memory retrieval.

# Introduction

Source: https://docs.starlight-search.com/index

Python client for the Reflect learning API.

Most AI agents have no memory of what worked and what didn't. Every run starts from scratch - the same mistakes get made, the same dead ends get explored, and hard-won knowledge from previous tasks disappears the moment a session ends.
**Reflect gives your agents a long-term memory that gets smarter over time.**

When a task completes, Reflect records the full trajectory - every tool call, every decision, the final response, and whether the outcome was a success or failure. From that record it generates a concise reflection: what worked, what went wrong, and what to do differently next time. That reflection is stored as a memory with an initial utility score.

The next time a similar task arrives, Reflect retrieves the most relevant memories and ranks them not just by semantic similarity, but by how useful they have proven in practice. Memories that consistently led to good outcomes rise to the top. Memories associated with failures are deprioritised until they are rehabilitated by a successful run.

Those ranked memories are injected into the agent's prompt before it runs - so the agent starts each task with the distilled experience of every previous run, not a blank slate.

## Works across workflows

Use Reflect with any agent that can call the API. It is not tied to a single model, framework, or agent runtime. Plug Reflect into your existing setup, whether you run custom scripts, evaluators, agent loops, or lightweight MCP-based tooling.

Store and retrieve memories for debugging, implementation, documentation, testing, refactoring, and other kinds of work. Memories created in one workflow can help with another. A reflection from a coding task can still be useful later in testing, docs, or review work when the task is relevant.

## The learning loop

*Reflect learning loop diagram*

Before executing a task, retrieve relevant reflections from past runs. Memories are ranked by learned utility.

Append retrieved memories to the task text. The SDK formats successful and failed reflections into sections your LLM can use as context.

Your agent generates a response using the memory-augmented prompt.

Store the full trajectory - task, steps, final response, and which memories were used.
Mark the outcome as pass or fail. The API generates a new reflection memory and adjusts utility scores of the memories that were retrieved. Future queries automatically favor reflections that led to success.

## Worked example

The following walks through a complete cycle. The agent is a customer support bot that handles refund requests.

### Step 1 - Query and rerank memories

A customer writes in: *"I ordered the wrong size and want to return it. Order #8821."*

Before the agent responds, Reflect fetches candidate memories from past support interactions and reranks them by blended score:

```
score = (1 - λ) × similarity + λ × q_value
```

With `λ = 0.5`, utility and semantic relevance contribute equally. A memory that consistently led to `pass` outcomes floats to the top even if it is not the closest semantic match.

| Memory | Similarity | Utility | Score |
| ----------------------------------------------------------------- | :--------: | :-----: | :---: |
| Always confirm the order number exists before processing a return | 0.91 | 0.82 | 0.865 |
| Ask the customer to select a reason code before issuing a refund | 0.76 | 0.90 | 0.830 |
| Offer an exchange first - customers often prefer it over a refund | 0.88 | 0.74 | 0.810 |

Reranked order: \[1] confirm order \[2] ask reason code \[3] offer exchange

### Step 2 - Augment the task prompt

Reflect injects the top memories into the task before it reaches the agent:

```
Task (augmented):
Handle a return request for order #8821 - customer ordered the wrong size.

--- Reflect Memories ---
[PASS] Always look up the order number in the system before taking any action.
Proceeding without verifying led to refunds on non-existent orders.

[PASS] Ask the customer to confirm a return reason before issuing a refund.
Missing reason codes caused accounting reconciliation failures.

[PASS] Offer an exchange as the first option before escalating to a full refund.
Many customers prefer a replacement, which reduces refund costs.
--- End Memories --- ``` ### Step 3 - Agent runs and produces a trajectory The agent receives the augmented prompt, calls tools, and replies to the customer. A trajectory is the full record of that execution: "I ordered the wrong size and want to return it. Order #8821." Agent calls `lookup_order("8821")` to verify the order before taking any action. **Result:** `{ item: "Running Shoes", size: "US 9", status: "delivered" }` Agent calls `send_message` to contact the customer. **Message sent:** "Hi Jamie - I can see order #8821 for Running Shoes (US 9). Would you prefer an exchange for a different size, or a full refund? Could you also confirm the reason for the return?" **Result:** `{ status: "sent" }` "I've verified your order and sent you an email to confirm your preference and return reason." ### Step 4a - Pass: reflection stored as a new memory The agent verified the order, collected a reason, and offered an exchange first. The support team marks the outcome `pass`. Reflect generates a reflection from the trajectory and stores it as a new memory: ``` [NEW MEMORY - q_value: 0.5 (initial)] When handling a return or refund request: 1. Always look up the order in the system first before taking any action or making promises - confirms the order is real and still eligible. 2. Ask the customer for a return reason before initiating anything - required for accounting and policy compliance. 3. Offer an exchange before a refund - many customers accept it and it reduces net refund volume. 4. Confirm the preferred resolution (exchange vs. refund) in the same message to avoid an unnecessary back-and-forth round trip. 
``` The utility scores of the three retrieved memories are updated upward (reward = 1.0): ``` q_new = q_old + α × (reward − q_old) where α = 0.1, reward = 1.0 mem_a1b2 "Always confirm the order number first" 0.82 → 0.838 mem_c3d4 "Ask for reason code before refund" 0.90 → 0.910 mem_e5f6 "Offer exchange before refund" 0.74 → 0.766 ``` ### Step 4b - Fail: reflection stored as a new memory Now consider an earlier run, before these memories existed. The agent skipped the lookup and replied immediately: > *"No problem! I've gone ahead and issued a full refund for order #8821. You'll see it in 3–5 business days."* The order did not exist in the system - it had already been cancelled and refunded. The support team marks this `fail` with feedback: `"Refund issued on a cancelled order - no order lookup was performed"`. Reflect generates a reflection from the failed trajectory and stores it as a new memory: ``` [NEW MEMORY - q_value: 0.5 (initial)] ## Incorrect assumptions - Assumed the order number provided by the customer is valid without verifying it in the system first. - Assumed a refund was the right resolution without asking the customer for their preference or a return reason. ## Steps to improve - Always call lookup_order before taking any action on a return request. - Collect the return reason and confirm the customer's preferred resolution (exchange or refund) before proceeding. ## What to avoid next time - Never issue a refund or exchange in the same turn as the initial request without first verifying the order exists and is eligible. - Avoid assuming a full refund is desired - an exchange is often preferred and avoids unnecessary financial transactions. ``` The utility scores of any memories retrieved during that failed run are updated downward (reward = 0.0), so they surface less often until a successful run rehabilitates them. *** Future support requests about returns will now retrieve these reflections, and the agent avoids the same mistakes automatically. 
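The blend and update rules in the worked example are plain arithmetic, so the numbers above can be verified directly. A sketch using `λ = 0.5` and `α = 0.1` as in the example; the names `blended_score` and `updated_q` are illustrative, not SDK API:

```python theme={null}
LAM = 0.5    # blend weight λ between similarity and utility
ALPHA = 0.1  # learning rate α for utility updates

def blended_score(similarity, q_value, lam=LAM):
    # score = (1 - λ) × similarity + λ × q_value
    return (1 - lam) * similarity + lam * q_value

def updated_q(q_old, reward, alpha=ALPHA):
    # q_new = q_old + α × (reward − q_old)
    return q_old + alpha * (reward - q_old)

# Step 1 reranking: "ask reason code" (similarity 0.76) outranks
# "offer exchange" (similarity 0.88) because its utility is higher.
for sim, q in [(0.91, 0.82), (0.76, 0.90), (0.88, 0.74)]:
    print(round(blended_score(sim, q), 3))  # 0.865, 0.83, 0.81

# Step 4a: a pass outcome (reward = 1.0) nudges each retrieved memory upward
for q in (0.82, 0.90, 0.74):
    print(round(updated_q(q, 1.0), 3))  # 0.838, 0.91, 0.766
```

A fail outcome uses `reward = 0.0`, which moves each retrieved memory's utility toward zero by the same rule.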
## Use cases by industry

**Software engineering** - Agents that write, review, or debug code learn which patterns led to passing tests and which caused regressions. A reflection from a failed code review surfaces automatically the next time a similar change is proposed.

**Customer support** - Support agents learn from resolved tickets - what tone worked, which escalation paths succeeded, and which assumptions caused incorrect refunds or missed SLAs. Each outcome improves the next interaction.

**Manufacturing** - Agents diagnosing equipment faults or generating maintenance plans retrieve memories from past incidents. A root-cause finding from a previous machine failure informs the response to a new one with similar symptoms.

**Healthcare** - Clinical decision-support agents retrieve prior case reflections when evaluating similar presentations. Memories from cases where a recommendation was later revised carry lower utility scores and are deprioritised automatically.

**Legal** - Contract review or compliance agents learn which clause interpretations were accepted by counsel and which were flagged. Accepted patterns are reinforced; rejected ones are down-ranked over time.

**Financial services** - Agents generating investment summaries or risk assessments learn which analyses were signed off and which were sent back for revision. Approved reasoning patterns resurface on similar instruments.

## Next steps

* Install the SDK with pip.
* Create a client, query memories, and record a trace.
* Query and augment tasks with past reflections.
* Full `ReflectClient` method reference.

# Installation

Source: https://docs.starlight-search.com/installation

Install the Reflect SDK for Python.

```bash theme={null}
pip install reflect-sdk
```

## Verify installation

```python theme={null}
from reflect_sdk import ReflectClient

client = ReflectClient(
    api_key="rf_live_...",
    project_id="my-project",
)

print(client.health())  # {"status": "ok"}
```

# Quickstart

Source: https://docs.starlight-search.com/quickstart

Create a client, query memories, record a trace, and submit a review.
This guide walks through the full learning loop: query memories, augment a prompt, record a trace, and submit a review. You need an API key and project ID from the [Reflect console](https://reflect.starlight-search.com). ```python theme={null} from reflect_sdk import ReflectClient client = ReflectClient( base_url="https://api.starlight-search.com", api_key="rf_live_...", project_id="my-project", ) ``` The `client.trace()` context manager handles the full loop in one block - it queries memories on entry, and auto-submits the trace with the correct `retrieved_memory_ids` on exit: ```python theme={null} with client.trace("How do I implement retry logic with exponential backoff?") as ctx: # ctx.augmented_task contains the task + any relevant memory blocks # ctx.memories contains the retrieved Memory objects response = my_agent(ctx.augmented_task) ctx.set_output( trajectory=[ {"role": "user", "content": ctx.augmented_task}, {"role": "assistant", "content": response}, ], result="pass", ) # Trace auto-submitted with retrieved_memory_ids tracked for you ``` If no memories exist yet, `ctx.augmented_task` returns the original task unchanged. The generated reflection now appears in future queries: ```python theme={null} memories = client.query_memories( task="What is the best approach for retrying failed requests?", limit=5, ) for m in memories: print(f"{m.reflection} (score: {m.score:.2f})") ``` Feedback text is typically attached by judge workflows or through the platform review UI. SDK/API calls usually submit only the review result (`pass` or `fail`). See [Traces and reviews](/guides/traces-and-reviews) for decorator and explicit call patterns.