Zep Memory
One-pager on Zep — why it beat Mem0/Cognee, integration shape, open questions.

Temporal Memory — Facts Accumulating Over Time
Mark, you picked Zep over Mem0 / Cognee on Apr 28. This page captures the integration crisply: what Zep is, why it beat the alternatives, the API shapes we actually call, where it slots in our stack, and what is verified end-to-end as of Apr 28 PM.
What Zep is
Zep is managed temporal user memory. You give it user messages over time; it auto-extracts facts and entities into a per-user knowledge graph and exposes a single retrieval call that returns a pre-formatted "Context Block" you inject into the next prompt. Facts have invalidation timestamps — when a newer fact contradicts an older one, the old one stops surfacing.
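To make the invalidation behaviour concrete, here is a toy model of it. This is not Zep's or Graphiti's implementation: the `Fact` class, the "same subject and predicate, different value" contradiction rule, and the dates are all assumptions made up for illustration. It only shows the observable effect described above, where an older fact stops surfacing once a newer one contradicts it.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Fact:
    """Hypothetical fact record; field names are illustrative, not Zep's schema."""
    subject: str
    predicate: str
    value: str
    valid_at: datetime
    invalid_at: Optional[datetime] = None  # None means "still current"


def add_fact(facts: list[Fact], new: Fact) -> None:
    """Append `new`, invalidating any current fact it contradicts.

    Toy contradiction rule: same subject and predicate, different value.
    """
    for old in facts:
        if (
            old.invalid_at is None
            and old.subject == new.subject
            and old.predicate == new.predicate
            and old.value != new.value
        ):
            # The old fact stops being valid when the new one starts.
            old.invalid_at = new.valid_at
    facts.append(new)


def current_facts(facts: list[Fact]) -> list[Fact]:
    """Facts that would still surface: those never invalidated."""
    return [f for f in facts if f.invalid_at is None]


# Illustrative dates only.
facts: list[Fact] = []
add_fact(facts, Fact("Caitlin", "works_at", "BrainSpace", datetime(2025, 1, 15)))
add_fact(facts, Fact("Caitlin", "works_at", "TVM Capital", datetime(2025, 4, 15)))
print([f.value for f in current_facts(facts)])  # ['TVM Capital']
```

The older fact is kept with an `invalid_at` timestamp rather than deleted, which is what separates a temporal model from a plain overwrite.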
Two retrieval modes:
| Mode | Endpoint | When to use |
|---|---|---|
| Context block (high-level) | GET /api/v2/threads/{id}/context | 90% of the time — drop the returned string into your system prompt |
| Graph search (low-level) | POST /api/v2/graph/search | When you need entity-specific retrieval, not general thread context |
We are using only the context-block mode in dispatch right now. Graph search is an escape hatch we have not needed.
Why it beat the alternatives
Per your Apr 28 evaluation:
- Plug-and-play — does not touch the FalkorDB graph or the future AlloyDB tables. Lives off to the side.
- SOC2 compliant from day one — non-negotiable for production.
- Temporal model — facts have valid-from / invalid-from timestamps. Matters when "Caitlin works at BrainSpace" becomes "Caitlin works at TVM Capital" three months later.
- Open-source escape hatch — Graphiti is the underlying engine. If we outgrow Zep Cloud, we self-host Graphiti and keep most of the integration.
Mem0 was second choice — also plug-and-play but weaker on temporal invalidation. Cognee felt research-y, not production.
Auth shape (gotcha)
This burned an hour on Apr 28 — flagging here so nobody re-burns it:
Authorization: Api-Key <key>
NOT Bearer. NOT raw. The literal word Api-Key with a space.
Key shape: z_eyJ..., JWT-style, ~152 chars. The eyJ prefix matters. A copy-paste lost it once and produced 401 across every other auth scheme we tried.
Verified Apr 28: GET /api/v2/projects/info with the correct header returns 200.
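A small guard can catch the truncated-key failure before any request goes out. The sketch below is an assumption about how we might wrap this, not part of any Zep SDK; `zep_auth_headers` is a hypothetical helper name. It relies only on the two facts above: the header scheme is the literal word Api-Key followed by a space, and a valid key starts with z_eyJ.

```python
def zep_auth_headers(api_key: str) -> dict[str, str]:
    """Build the Authorization header, rejecting keys with a damaged prefix.

    Hypothetical helper; the prefix check mirrors the key shape noted above.
    """
    key = api_key.strip()  # copy-paste often drags in whitespace or newlines
    if not key.startswith("z_eyJ"):
        raise ValueError(
            "Zep key should start with 'z_eyJ'; check for a truncated copy-paste"
        )
    # Scheme is the literal word "Api-Key", a space, then the key. Not Bearer.
    return {"Authorization": f"Api-Key {key}"}
```

Failing fast here turns an opaque 401 into an error message that names the likely cause.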
The API surface we use
Five endpoints, all called server-side from Xano:
# 1. Auth check / smoke test
curl -H "Authorization: Api-Key $ZEP" \
https://api.getzep.com/api/v2/projects/info
# 2. Create user (idempotent if user_id already exists)
curl -X POST -H "Authorization: Api-Key $ZEP" \
-H "Content-Type: application/json" \
-d '{"user_id":"workos-user-abc","email":"x@y.com"}' \
https://api.getzep.com/api/v2/users
# 3. Create thread
curl -X POST -H "Authorization: Api-Key $ZEP" \
-H "Content-Type: application/json" \
-d '{"thread_id":"demo-anything-engine","user_id":"workos-user-abc"}' \
https://api.getzep.com/api/v2/threads
# 4. Fetch context block (the core retrieval call)
curl -H "Authorization: Api-Key $ZEP" \
https://api.getzep.com/api/v2/threads/demo-anything-engine/context
# → { "context": "User is raising Series A in medtech. Recently met Caitlin Morse..." }
# 5. Ingest user + assistant messages (post-dispatch)
curl -X POST -H "Authorization: Api-Key $ZEP" \
-H "Content-Type: application/json" \
-d '{"messages":[
{"role":"user","content":"find me investors"},
{"role":"assistant","content":"Returned 12 Series A medtech investors..."}
]}' \
https://api.getzep.com/api/v2/threads/demo-anything-engine/messages
Empty thread on /context returns {}. Populated thread returns { context: "...", ... }. We check $response.result.context != "" before injecting.
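Because an empty thread returns {} rather than an empty string, a check that only compares against "" depends on how the runtime treats a missing key. A minimal sketch of a stricter check, written in Python for illustration (the real check lives in Xano); `extract_context`, `build_system_prompt`, and the "Known user context:" label are hypothetical:

```python
from typing import Any, Optional


def extract_context(payload: dict[str, Any]) -> Optional[str]:
    """Return the context block, or None when it is missing, empty, or not a string."""
    context = payload.get("context")  # {} from an empty thread yields None here
    if isinstance(context, str) and context.strip():
        return context
    return None


def build_system_prompt(base_prompt: str, payload: dict[str, Any]) -> str:
    """Append the context block to the system prompt only when there is one."""
    context = extract_context(payload)
    if context is None:
        return base_prompt
    return f"{base_prompt}\n\nKnown user context:\n{context}"
```

This treats the missing key, an empty string, and a whitespace-only string the same way, so the classifier prompt never gains an empty memory section.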
Where it lives in our stack
┌─ Browser (Next.js sandbox) ──────────────────────────┐
│  Posts to /api/find-investors with WorkOS bearer     │
└──────────────────┬───────────────────────────────────┘
                   │ (browser never touches Zep directly)
┌──────────────────▼───────────────────────────────────┐
│  Xano dispatch (endpoint 8399)                       │
│                                                      │
│  STEP 1: fetch context                               │
│    GET /api/v2/threads/{id}/context                  │
│    → inject into classifier system prompt            │
│                                                      │
│  STEP 2: classify (8400)                             │
│    OpenRouter Llama 3.3 70B + Fireworks fallback     │
│                                                      │
│  STEP 3: branch to tool (e.g. 8401 find_investors)   │
│                                                      │
│  STEP 4: ingest                                      │
│    POST /api/v2/threads/{id}/messages                │
│    body: user query + classification result          │
│    (async, non-blocking on the response stream)      │
└──────────────────────────────────────────────────────┘
Every Zep call originates server-side. The sandbox + WorkOS-issued bearer never touches the Zep API. The Zep API key lives in the Xano workspace env as zep (per the convention in ~/.claude/projects/.../memory/zep_integration.md).
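The four dispatch steps can be sketched as plain control flow. This is an illustration in Python, not the Xano function stack itself: the `dispatch` signature and the four injected callables are hypothetical stand-ins for the Zep context call, the classifier, the tool endpoints, and the Zep messages call. The `mem_used` key mirrors the flag of that name in the dispatch response.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Optional

# Background pool so ingest never blocks the response path.
_ingest_pool = ThreadPoolExecutor(max_workers=2)


def dispatch(
    query: str,
    thread_id: str,
    fetch_context: Callable[[str], Optional[str]],
    classify: Callable[[str, Optional[str]], str],
    run_tool: Callable[[str, str], str],
    ingest: Callable[[str, str, str], None],
) -> tuple[dict, Future]:
    """Run the four steps; all collaborators are injected (hypothetical shapes)."""
    # STEP 1: fetch the context block (None when the thread is empty).
    context = fetch_context(thread_id)
    # STEP 2: classify, with memory injected when there is any.
    tool_name = classify(query, context)
    # STEP 3: branch to the chosen tool.
    result = run_tool(tool_name, query)
    # STEP 4: schedule ingest in the background and return immediately.
    ingest_future = _ingest_pool.submit(ingest, thread_id, query, tool_name)
    response = {"tool": tool_name, "result": result, "mem_used": context is not None}
    return response, ingest_future
```

Returning the future alongside the response lets a test harness wait for ingest to finish, while the live path can ignore it.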
Verified end-to-end (Apr 28 PM)
- Free-tier signup against project "Demo Project" (7d56a43e-336f-4a75-a587-0ca31de3e787).
- Demo thread pre-seeded: demo-anything-engine under user demo-robert. Loaded with Series A medtech context + AI infra CTO context.
- 3-turn memory loop verified: turn 1 "find me investors" → find_investors. Turn 2 "who else should I talk to" with prior thread context → classifier returned find_warm_intros, a different class. In turn 3, the same literal query against an empty thread classified back to find_investors.
- Dispatch response includes mem_used and mem_ingested flags so we can see on the wire whether memory steered a given run.
Cost / latency observations
- Free tier handles the demo and the harness comfortably. Production cost TBD — Zep prices on monthly active users + storage.
- Context fetch latency: ~150-300ms p50. Acceptable for an in-line dispatch step.
- Indexing latency: ~4 seconds from messages POST to facts appearing in /context. So if a user hammers two queries inside 4s, the second one will not see the first in context. Probably fine; we will measure.
- Ingest is async: we do not block the SSE stream waiting for it.
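If we want to measure how often the indexing gap bites, one option is to record the last ingest time per thread and flag any context fetch that lands inside the window. This is a sketch under assumptions: `IngestClock` is a hypothetical helper, and the 4-second figure is our Apr 28 observation, not a documented Zep guarantee.

```python
import time
from typing import Callable

INDEXING_LAG_SECONDS = 4.0  # observed on Apr 28; an assumption, not a Zep guarantee


class IngestClock:
    """Tracks the last ingest per thread to flag possibly stale context."""

    def __init__(
        self,
        lag_seconds: float = INDEXING_LAG_SECONDS,
        now: Callable[[], float] = time.monotonic,
    ) -> None:
        self._lag = lag_seconds
        self._now = now  # injectable so tests can control time
        self._last_ingest: dict[str, float] = {}

    def mark_ingested(self, thread_id: str) -> None:
        """Record that messages were just posted for this thread."""
        self._last_ingest[thread_id] = self._now()

    def context_may_be_stale(self, thread_id: str) -> bool:
        """True when the last ingest is recent enough that facts may not be indexed."""
        last = self._last_ingest.get(thread_id)
        return last is not None and (self._now() - last) < self._lag
```

Logging this flag next to each dispatch would show how many real queries fall inside the window before we decide whether it needs a fix.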
What is left
- Pre-seeded demo thread for the Apr 29 call screen-share: done (demo-anything-engine).
- Production thread strategy: open question (see below).
- Templates feature — Zep supports custom context-block templates. Not using it yet; defaults are fine.
- Graph search — escape hatch, untouched.
- Bidirectional sync to AlloyDB — Mark's "redundant data is an advantage" idea. Not yet wired. Open question whether we ever wire it.
Open questions for Apr 29
- Thread granularity. One thread per WorkOS user (lifelong), one per session, or one per outcome? My vote: one per user, lifelong. Sessions are a UI concept; memory should follow the user.
- Production thread_id format. Deterministic from WorkOS user_id (thread_${workos_user_id}) or random per session? Deterministic is simpler; matches "one thread per user."
- AlloyDB dual-write. Your Apr 28 worldview was "redundant data is an advantage: same data, two query paths." My pushback was "exactly until you know which one you want." Is this still a parking-lot item, or do we resolve it Apr 29?
- Fact invalidation sensitivity. Zep auto-invalidates contradicted facts. Do we ever want to override that — e.g., user explicitly says "ignore everything before today"?
- Ingest scope. Do we ingest just user queries + classifications, or full WHY paragraphs too? Trade-off is signal vs cost.
- Banned-relationship list (edge case #14). Does this live as Zep facts ("user has banned firm X") or as a separate Xano user-prefs table?
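For the thread_id question, the deterministic option is small enough to show in full. A minimal sketch, assuming we adopt the thread_${workos_user_id} format proposed above; `thread_id_for_user` is a hypothetical helper name and nothing here is decided yet.

```python
def thread_id_for_user(workos_user_id: str) -> str:
    """Derive the lifelong thread id from the WorkOS user id (proposed format)."""
    user_id = workos_user_id.strip()
    if not user_id:
        raise ValueError("workos_user_id must be non-empty")
    # Same input always yields the same thread, so no lookup table is needed.
    return f"thread_{user_id}"
```

With a deterministic id, dispatch can compute the thread from the authenticated user on every request instead of storing a user-to-thread mapping.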
References
- Verified integration shape with all auth gotchas: ~/.claude/projects/-Users-robertboulos-Projects-web-apps-orbiter-frontend/memory/zep_integration.md
- Architecture: architecture.md
- Apr 28 sync notes: ~/.claude/projects/.../memory/april-28-mark-sync.md, section 6
- Zep concepts: https://help.getzep.com/concepts
- Zep API ref: https://help.getzep.com/api-reference
- Docs MCP server: https://docs-mcp.getzep.com/mcp (HTTP transport, tool mcp__zep-docs__search_documentation)