Zep Memory

One-pager on Zep — why it beat Mem0/Cognee, integration shape, open questions.

Temporal Memory — Facts Accumulating Over Time

Mark, you picked Zep over Mem0 / Cognee on Apr 28. This page captures the integration crisply: what Zep is, why it beat the alternatives, the API shapes we actually call, where it slots in our stack, and what is verified end-to-end as of Apr 28 PM.

What Zep is

Zep is managed temporal user memory. You give it user messages over time; it auto-extracts facts and entities into a per-user knowledge graph and exposes a single retrieval call that returns a pre-formatted "Context Block" you inject into the next prompt. Facts have invalidation timestamps — when a newer fact contradicts an older one, the old one stops surfacing.

Two retrieval modes:

Mode                       | Endpoint                         | When to use
Context block (high-level) | GET /api/v2/threads/{id}/context | 90% of the time: drop the returned string into your system prompt
Graph search (low-level)   | POST /api/v2/graph/search        | When you need entity-specific retrieval, not general thread context

We are using only the context-block mode in dispatch right now. Graph search is an escape hatch we have not needed.

Why it beat the alternatives

Per your Apr 28 evaluation:

  • Plug-and-play — does not touch the FalkorDB graph or the future AlloyDB tables. Lives off to the side.
  • SOC2 compliant from day one — non-negotiable for production.
  • Temporal model — facts have valid-from / invalid-from timestamps. Matters when "Caitlin works at BrainSpace" becomes "Caitlin works at TVM Capital" three months later.
  • Open-source escape hatch — Graphiti is the underlying engine. If we outgrow Zep Cloud, we self-host Graphiti and keep most of the integration.

Mem0 was second choice — also plug-and-play but weaker on temporal invalidation. Cognee felt research-y, not production.

Auth shape (gotcha)

This burned an hour on Apr 28 — flagging here so nobody re-burns it:

Authorization: Api-Key <key>

NOT Bearer. NOT raw. The literal word Api-Key with a space.

Key shape: z_eyJ..., JWT-style, ~152 chars. The eyJ prefix matters. A copy-paste dropped it once, and after that every auth scheme we tried returned 401.

Verified Apr 28: GET /api/v2/projects/info with the correct header returns 200.
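A minimal sketch of a guard against the two failure modes above (wrong scheme, truncated key). The function names are hypothetical helpers, not part of Zep or Xano; the only facts they encode are the Api-Key scheme and the z_eyJ prefix noted in this section.

```shell
# Hypothetical helpers; the names are illustrative only.

# Succeeds only if the key starts with the z_eyJ prefix described above.
zep_key_looks_valid() {
  case "$1" in
    z_eyJ?*) return 0 ;;
    *)       return 1 ;;
  esac
}

# Prints the header in the one form that worked for us: "Api-Key <key>".
zep_auth_header() {
  zep_key_looks_valid "$1" || { echo "key does not start with z_eyJ" >&2; return 1; }
  printf 'Authorization: Api-Key %s' "$1"
}
```

Usage would look like curl -H "$(zep_auth_header "$ZEP")" https://api.getzep.com/api/v2/projects/info. A prefix check does not prove the key is valid; it only catches the copy-paste truncation early, before the 401s start.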

The API surface we use

Five endpoints, all called server-side from Xano:

# 1. Auth check / smoke test
curl -H "Authorization: Api-Key $ZEP" \
  https://api.getzep.com/api/v2/projects/info

# 2. Create user (idempotent if user_id already exists)
curl -X POST -H "Authorization: Api-Key $ZEP" \
  -H "Content-Type: application/json" \
  -d '{"user_id":"workos-user-abc","email":"x@y.com"}' \
  https://api.getzep.com/api/v2/users

# 3. Create thread
curl -X POST -H "Authorization: Api-Key $ZEP" \
  -H "Content-Type: application/json" \
  -d '{"thread_id":"demo-anything-engine","user_id":"workos-user-abc"}' \
  https://api.getzep.com/api/v2/threads

# 4. Fetch context block (the core retrieval call)
curl -H "Authorization: Api-Key $ZEP" \
  https://api.getzep.com/api/v2/threads/demo-anything-engine/context
# → { "context": "User is raising Series A in medtech. Recently met Caitlin Morse..." }

# 5. Ingest user + assistant messages (post-dispatch)
curl -X POST -H "Authorization: Api-Key $ZEP" \
  -H "Content-Type: application/json" \
  -d '{"messages":[
        {"role":"user","content":"find me investors"},
        {"role":"assistant","content":"Returned 12 Series A medtech investors..."}
      ]}' \
  https://api.getzep.com/api/v2/threads/demo-anything-engine/messages

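An empty thread returns {} on /context; a populated thread returns { context: "...", ... }. We check $response.result.context != "" before injecting. Because the empty case omits the key entirely, that guard also needs to treat a missing context the same as an empty one.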
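The real guard lives in Xano, so the following is only a shell sketch of the same idea: inject the context when it is non-empty, and treat a missing value exactly like an empty one. build_system_prompt and the "Known user context:" label are hypothetical, chosen for illustration.

```shell
# Hypothetical helper: appends the Zep context to a base system prompt only
# when the context is non-empty. A missing second argument counts as empty.
build_system_prompt() {
  base_prompt=$1
  zep_context=${2:-}
  if [ -n "$zep_context" ]; then
    printf '%s\n\nKnown user context:\n%s' "$base_prompt" "$zep_context"
  else
    printf '%s' "$base_prompt"
  fi
}
```

If jq is available (an assumption), piping the /context response through jq -r '.context // ""' yields an empty string for the {} case, which this helper then skips.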

Where it lives in our stack

┌─ Browser (Next.js sandbox) ──────────────────────────┐
│  Posts to /api/find-investors with WorkOS bearer     │
└──────────────────┬───────────────────────────────────┘
                   │ (browser never touches Zep directly)
┌──────────────────▼───────────────────────────────────┐
│  Xano dispatch (endpoint 8399)                       │
│                                                      │
│  STEP 1 — fetch context                              │
│    GET /api/v2/threads/{id}/context                  │
│    → inject into classifier system prompt            │
│                                                      │
│  STEP 2 — classify (8400)                            │
│    OpenRouter Llama 3.3 70B + Fireworks fallback     │
│                                                      │
│  STEP 3 — branch to tool (e.g. 8401 find_investors)  │
│                                                      │
│  STEP 4 — ingest                                     │
│    POST /api/v2/threads/{id}/messages                │
│    body: user query + classification result          │
│    (async — non-blocking on the response stream)     │
└──────────────────────────────────────────────────────┘

Every Zep call originates server-side. The sandbox and its WorkOS-issued bearer token never touch the Zep API. The Zep API key lives in the Xano workspace env as zep (per the convention in ~/.claude/projects/.../memory/zep_integration.md).
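The ordering of the four steps can be sketched in shell. Everything here is a stand-in: the real dispatch is a Xano endpoint, and fetch_context, classify, run_tool, ingest, and STUB_CONTEXT are hypothetical stubs. The stub classifier switches class purely on whether context is present so the flow is observable; it says nothing about how the real classifier decides.

```shell
# Stubs standing in for the real calls; every name here is hypothetical.
fetch_context() { printf '%s' "${STUB_CONTEXT:-}"; }   # STEP 1: GET /threads/{id}/context
classify() {                                           # STEP 2: stub, not the real model
  if [ -n "$2" ]; then echo find_warm_intros; else echo find_investors; fi
}
run_tool() { echo "ran $1"; }                          # STEP 3: branch to the tool
ingest() { :; }                                        # STEP 4: POST /threads/{id}/messages

dispatch() {
  thread_id=$1
  query=$2
  context=$(fetch_context "$thread_id")
  class=$(classify "$query" "$context")
  run_tool "$class" "$query"
  # Backgrounded so the response is not held up by the ingest round-trip.
  ingest "$thread_id" "$query" "$class" >/dev/null 2>&1 &
}
```

The point of the sketch is the shape: context is fetched before classification, and ingest is fired after the tool output is already on its way to the caller.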

Verified end-to-end (Apr 28 PM)

  • Free-tier signup against project "Demo Project" (7d56a43e-336f-4a75-a587-0ca31de3e787).
  • Demo thread pre-seeded: demo-anything-engine under user demo-robert. Loaded with Series A medtech context + AI infra CTO context.
  • 3-turn memory loop verified. Turn 1: "find me investors" → find_investors. Turn 2: "who else should I talk to" with prior thread context → find_warm_intros. Turn 3: the same literal query as turn 2 against an empty thread → find_investors, so the thread context is what changed the class.
  • Dispatch response includes mem_used and mem_ingested flags so we can see on the wire whether memory steered a given run.

Cost / latency observations

  • Free tier handles the demo and the harness comfortably. Production cost TBD — Zep prices on monthly active users + storage.
  • Context fetch latency: ~150-300ms p50. Acceptable for an in-line dispatch step.
  • Indexing latency: ~4 seconds from messages POST to facts appearing in /context. So if a user hammers two queries inside 4s, the second one will not see the first in context. Probably fine; we will measure.
  • Ingest is async — we do not block the SSE stream waiting for it.
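If the 4-second indexing window turns out to matter, one way to reason about it is a simple elapsed-time check. This is a sketch under assumptions: the 4 s figure is the observation above, not a Zep guarantee, and ingest_likely_indexed and ZEP_INDEX_LAG_SECONDS are hypothetical names.

```shell
# Indexing lag observed above (~4 s); treat it as a tunable assumption.
ZEP_INDEX_LAG_SECONDS=${ZEP_INDEX_LAG_SECONDS:-4}

# Succeeds if an ingest at $1 (epoch seconds) has probably been indexed by $2.
ingest_likely_indexed() {
  last_ingest=$1
  now=$2
  [ $(( now - last_ingest )) -ge "$ZEP_INDEX_LAG_SECONDS" ]
}
```

A caller could run ingest_likely_indexed "$last_ingest" "$(date +%s)" and, on failure, either accept slightly stale context or carry the previous turn forward by other means.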

What is left

  • Pre-seeded demo thread for the Apr 29 call screen-share — done (demo-anything-engine).
  • Production thread strategy — open question (see below).
  • Templates feature — Zep supports custom context-block templates. Not using it yet; defaults are fine.
  • Graph search — escape hatch, untouched.
  • Bidirectional sync to AlloyDB — Mark's "redundant data is an advantage" idea. Not yet wired. Open question whether we ever wire it.

Open questions for Apr 29

  1. Thread granularity. One thread per WorkOS user (lifelong), one per session, or one per outcome? My vote: one per user, lifelong. Sessions are a UI concept; memory should follow the user.
  2. Production thread_id format. Deterministic from WorkOS user_id (thread_${workos_user_id}) or random per session? Deterministic is simpler; matches "one thread per user."
  3. AlloyDB dual-write. Your Apr 28 worldview was "redundant data is an advantage — same data, two query paths." My pushback was "exactly until you know which one you want." Is this still a parking-lot item, or do we resolve it Apr 29?
  4. Fact invalidation sensitivity. Zep auto-invalidates contradicted facts. Do we ever want to override that — e.g., user explicitly says "ignore everything before today"?
  5. Ingest scope. Do we ingest just user queries + classifications, or full WHY paragraphs too? Trade-off is signal vs cost.
  6. Banned-relationship list (edge case #14). Does this live as Zep facts ("user has banned firm X") or as a separate Xano user-prefs table?
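For question 2, the deterministic option is small enough to sketch. This assumes the thread_${workos_user_id} format proposed above is adopted; thread_id_for_user is a hypothetical helper name.

```shell
# Hypothetical helper implementing the thread_${workos_user_id} proposal.
thread_id_for_user() {
  [ -n "${1:-}" ] || { echo "workos user_id is required" >&2; return 1; }
  printf 'thread_%s' "$1"
}
```

The same user always maps to the same thread, so no lookup table is needed to find it again. A per-session scheme would need somewhere to store the mapping.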

References

  • Verified integration shape with all auth gotchas: ~/.claude/projects/-Users-robertboulos-Projects-web-apps-orbiter-frontend/memory/zep_integration.md
  • Architecture: architecture.md
  • Apr 28 sync notes: ~/.claude/projects/.../memory/april-28-mark-sync.md section 6
  • Zep concepts: https://help.getzep.com/concepts
  • Zep API ref: https://help.getzep.com/api-reference
  • Docs MCP server: https://docs-mcp.getzep.com/mcp (HTTP transport, tool mcp__zep-docs__search_documentation)