- `review_result="pass"` or `review_result="fail"`
- Optional `feedback_text` explaining why the run passed or failed
Option 1: LLM as judge
Use this pattern when you already have a rubric the model can apply consistently, such as exact-answer checks, formatting checks, or lightweight factual grading.

Flow
- Retrieve memories and run your agent.
- Ask a second model to judge the result.
- Save the trace with the judge reflection inline.
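The three steps above can be sketched as a small loop. Every helper name here (`run_agent`, `judge_with_llm`, `store_trace`) is a hypothetical stand-in for your own framework's calls, and the rubric is faked for illustration; this is a sketch of the shape of the flow, not a real API.

```python
# Sketch of an LLM-as-judge loop. run_agent, judge_with_llm, and
# store_trace are hypothetical placeholders, not real API calls.

def run_agent(task: str) -> str:
    """Placeholder: retrieve memories and run your agent."""
    return f"answer for: {task}"

def judge_with_llm(task: str, output: str) -> dict:
    """Placeholder: a second model applies your rubric.
    Faked here: pass if the output mentions the task."""
    passed = task in output
    return {
        "review_result": "pass" if passed else "fail",
        "feedback_text": "output addresses the task"
                         if passed else "output ignores the task",
    }

def store_trace(task: str, output: str, reflection: dict) -> dict:
    """Placeholder: save the trace with the judge reflection inline."""
    return {"task": task, "output": output, "reflection": reflection}

task = "summarize the release notes"
output = run_agent(task)
trace = store_trace(task, output, judge_with_llm(task, output))
print(trace["reflection"]["review_result"])  # pass
```

Because the judge runs inline, the reflection is attached to the trace in the same pass that produced it, which is what makes this pattern suitable for automated regression loops.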
Option 2: Human review
Use this pattern when correctness depends on domain expertise, the task is subjective, or you want reviewers to inspect the answer before creating a memory.

Flow
- Store the trace immediately after the run finishes.
- Show the result in your own review queue or internal tool.
- Submit the human reflection later with `review_trace(...)`.
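The deferred flow above might look like the sketch below. `store_trace` and the in-memory `REVIEW_QUEUE` are hypothetical stand-ins for your storage and internal review tool; `review_trace` mirrors the call named in the text, with the `review_result` and `feedback_text` fields described earlier, but its real signature comes from your framework.

```python
# Sketch of a human-review flow: store the trace first, attach the
# reflection later. store_trace and REVIEW_QUEUE are hypothetical;
# review_trace mirrors the call named in the docs, not a real import.

REVIEW_QUEUE = []

def store_trace(task: str, output: str) -> str:
    """Placeholder: persist the trace immediately, with no verdict yet."""
    trace_id = f"trace-{len(REVIEW_QUEUE)}"
    REVIEW_QUEUE.append({"id": trace_id, "task": task,
                         "output": output, "reflection": None})
    return trace_id

def review_trace(trace_id: str, review_result: str, feedback_text: str = ""):
    """Submit the human reflection for a previously stored trace."""
    for trace in REVIEW_QUEUE:
        if trace["id"] == trace_id:
            trace["reflection"] = {"review_result": review_result,
                                   "feedback_text": feedback_text}

# 1. The run finishes; store the trace right away.
tid = store_trace("draft contract clause", "agent output here")

# 2. Later, an expert inspects it in your review queue and submits a verdict.
review_trace(tid, review_result="fail",
             feedback_text="clause omits the termination terms")

print(REVIEW_QUEUE[0]["reflection"]["review_result"])  # fail
```

Storing the trace before the verdict is what decouples the agent run from the review: the expert can come back hours or days later without blocking the pipeline.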
Choosing a workflow
| Workflow | Best for |
|---|---|
| LLM judge | Fast automated eval loops, regression checks, rubric-based scoring |
| Human review | Subjective tasks, high-stakes outputs, expert verification |