Anything Engine · Apr 29 Prep

Apr 29 — Mark Sync Hub

Single screen-share page for the top of the call. All other docs are linked from here.

Apr 29 morning updates from Mark — logged

Haiku 4.5 is the right tier for the interviewer turn (Slack 1:25 AM). Sub-500ms TTFT, 80–120 tps, $1/$5 per 1M tokens (input/output). Opus 4.5/4.6 is the wrong tier. See model selection table in Anything Engine spec.
- Status: directive logged, swap not yet executed. Open conversation: full swap to Haiku 4.5 vs hybrid (Haiku for classify/extract, OpenRouter for resilient WHY)?
All-in on ScaNN, not HNSW (Slack 5:53 AM). Settled. Affects only the matching/scoring milestone. Existing ScaNN one-pager reflects this. Mintlify doc has full rationale.
- Status: nothing to remove from our docs; we never had HNSW hedging. Schema draft already uses ScaNN.
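
To put the Haiku numbers in per-turn terms, here is a rough back-of-envelope sketch. The prices and the TTFT/tps range come from the Slack note above, reading "$1/$5" as input/output per 1M tokens. The turn size (2,000 tokens in, 200 out) is a made-up assumption, not a measurement.

```typescript
// Prices in USD per 1M tokens, as quoted in the Slack note.
interface Pricing {
  inputPerMTok: number;
  outputPerMTok: number;
}

const HAIKU_4_5: Pricing = { inputPerMTok: 1, outputPerMTok: 5 };

// Cost of one turn in USD for the given token counts.
function turnCostUsd(inputTokens: number, outputTokens: number, p: Pricing): number {
  return (inputTokens * p.inputPerMTok + outputTokens * p.outputPerMTok) / 1e6;
}

// Rough wall-clock estimate: time to first token plus streaming time.
function turnLatencyMs(outputTokens: number, ttftMs: number, tokensPerSec: number): number {
  return ttftMs + (outputTokens / tokensPerSec) * 1000;
}

// Hypothetical turn size: 2,000 input tokens, 200 output tokens.
const cost = turnCostUsd(2000, 200, HAIKU_4_5);
const slowest = turnLatencyMs(200, 500, 80); // 500ms TTFT ceiling, 80 tps
const fastest = turnLatencyMs(200, 500, 120); // 500ms TTFT ceiling, 120 tps

console.log({ cost, slowest, fastest });
```

With those assumed sizes, a turn costs about $0.003 and finishes streaming in roughly 2.2 to 3.0 seconds.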

Mintlify reconciliation — divergences from our build (read before the call)

After connecting orbiterio.mintlify.io/mcp and reading /guides/open-work/suggestion-core-concepts/find-investors/* and /guides/open-work/vectors-alloydb/scann.mdx, here are the gaps between our sandbox docs and Mark's canonical Mintlify docs. None of these block the demo, but they shape the AlloyDB migration conversation.

Topic	Our sandbox	Mark's canonical (Mintlify)	What to do
Schema shape	One investors table with 6 vector cols	Three tables: `investment_thesis` (entity, structured filters), `investment_thesis_narrative` (12 rows per investor — 6 dims × 2 sources), `investment_thesis_synthesis` (declared_summary, derived_summary, drift signals)	Rewrite `alloydb-schema.md` to match the 3-table split
Vector dimensions	6 (sector, stage, check, geography, signal, founder)	6 narrative dimensions × 2 sources (declared / derived) = 12 per investor. Dims are: founder_fit, problem_market, competitive_moat, traction_momentum, business_model, expansion_roadmap	Adopt Mark's dimension names
Embedding type	`vector(1536)` from `text-embedding-3-small` (OpenAI)	`halfvec(1536)` from `gemini-embedding-001` via Vertex AI (Matryoshka — can truncate to 768 later without re-embedding). Cheaper, IAM auth, no egress	Swap embedding source on AlloyDB cutover
Thesis-extract LLM	n/a (we just embed the query)	`deepseek/deepseek-v3.2` ($0.252 / $0.378 per 1M tokens) via OpenRouter, ~$0.005/investor, 30–90s per call. Two LLM calls: declared (from bio) and derived (from 100 actual deals). Repair-JSON lambda fixes Roman numerals + stray commas	When we add the thesis-extract step, use Mark's pattern. Haiku 4.5 may be the wrong tier here: DeepSeek wins on cost and JSON fidelity for this call
Filter-extraction LLM	n/a (no query splitting)	DeepSeek V4-Flash or Gemini 2.5 Flash (Mintlify recommends these); Haiku 4.5 has the best tool-use reliability but is pricier. The Slack message contrasted Haiku with Opus; Mintlify recommends a cheaper-than-Haiku model for the routing decision specifically	Open question: what's the right model for the routing/filter-extraction step today?
Query pattern	Vector search over Person/Entity, then 3-hop graph for context	Split natural-language query into hard filters + semantic query. Hard filters (whitelisted!) → SQL WHERE. Semantic → embed → ORDER BY vector. Single round trip via AlloyDB inline filtering	We need to add the filter-extraction step before our find_investors run on AlloyDB
14-outcome list	find_investors, find_talent, find_customers, research_person, research_company, research_topic, find_partners, find_advisors, find_co_investors, find_journalists, find_event_attendees, find_warm_intros, summarize_meeting, plan_outcome	find-investors, find-investment-opportunities, find-cofounder, find-collaborators, find-acquisition-target, find-job, find-media-pr, find-mentor-advisor, find-speakers, get-advice, hire-key-talent, make-a-purchase, prospect-customers-clients, strategic-partnerships	Reconcile in the call. Ours skews toward "search verbs"; Mark's skews toward "outcome types". `find_talent` ↔ `hire-key-talent` is fine; `summarize_meeting` is missing from Mark's list; `find-cofounder` is missing from ours
Production thesis pipeline	n/a in sandbox	Already running in Xano workspace 3: table 709 (`investment_theses`), function 12911 (`thesis/gather-investor-context-v3`), function 12916 (`thesis/build-investment-thesis-v21`). 54 columns. JSON-strict prompts active. ISO 3166-1 alpha-2 geo codes	Don't rebuild this — call it. Sandbox find_investors should integrate with table 709 over time, not invent a parallel store
JSON repair	Generic fence stripping	Mark's `repairJson` lambda fixes two specific DeepSeek glitches: Roman numerals (`III` → `3`) and stray colon-comma (`":,` → `":`)	Add to our prompts if we move to DeepSeek
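
A minimal TypeScript sketch of the two repairs named in the JSON repair row, in case we port them to the BFF or the test harness. This is not Mark's `repairJson` lambda, which is not reproduced here. The regexes, the `romanToInt` helper, and the assumption that the Roman numeral arrives as a bare, unquoted value are ours.

```typescript
const ROMAN: Record<string, number> = { I: 1, V: 5, X: 10, L: 50, C: 100, D: 500, M: 1000 };

// Standard subtractive-notation conversion (IV = 4, IX = 9, ...).
function romanToInt(roman: string): number {
  let total = 0;
  for (let i = 0; i < roman.length; i++) {
    const cur = ROMAN[roman[i]];
    const next = ROMAN[roman[i + 1]] ?? 0;
    total += cur < next ? -cur : cur;
  }
  return total;
}

function repairJson(raw: string): string {
  return raw
    // Stray colon-comma: `"key":, value` becomes `"key": value`.
    .replace(/":\s*,/g, '":')
    // Bare Roman numeral in value position: `"rank": III` becomes `"rank": 3`.
    .replace(
      /(":\s*)([IVXLCDM]+)(?=\s*[,}\]])/g,
      (_match: string, prefix: string, roman: string) => prefix + String(romanToInt(roman)),
    );
}

// Hypothetical glitchy payload, for illustration only.
const repaired = repairJson('{"rank": III, "name":, "Acme"}');
console.log(JSON.parse(repaired));
```

Both patterns are plain text substitutions, so they can also fire inside a string value that happens to contain `":,` or `": III,`. Good enough for a sketch; a real port should be checked against recorded DeepSeek outputs.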

Implications for the call:

The find_investors AlloyDB migration is more substantial than our current schema doc suggests — it's a 3-table refactor with 12 narratives/investor, not 6 vectors on a single row.
The thesis-extract step (declared + derived passes against 100 deals per investor) doesn't exist in our sandbox. That's the core IP — we should be wiring our find-investors tool to call the existing Xano build-investment-thesis-v21 (function 12916), not duplicating it.
The filter-extraction step is the missing link in our query path. With it: a natural-language query splits cleanly into the SQL WHERE + ORDER BY shape Mark expects. Without it: we lose the speed and precision ScaNN was chosen for.
Mark's outcome taxonomy differs from ours by ~6 names. This is a 5-minute alignment conversation — don't sweat it, just reconcile in the call.
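
The filter-extraction split can be sketched as a small query builder: whitelisted hard filters become parameterized WHERE clauses, and the embedded semantic query drives ORDER BY. The table names come from the Mintlify schema above. Every column name (`stage`, `geo_country`, `check_min_usd`, `check_max_usd`, `investor_id`, `embedding`), the filter keys, and the use of pgvector's `<=>` cosine-distance operator with a `halfvec` cast are assumptions for illustration, not Mark's actual query.

```typescript
interface FilterRule {
  column: string;
  op: "=" | ">=" | "<=";
}

// Hard filters the extraction LLM is allowed to emit. Anything else is dropped.
const FILTER_WHITELIST = new Map<string, FilterRule>([
  ["stage", { column: "t.stage", op: "=" }],
  ["country", { column: "t.geo_country", op: "=" }],
  // Range overlap: the investor's max check must reach the requested minimum...
  ["min_check_usd", { column: "t.check_max_usd", op: ">=" }],
  // ...and the investor's min check must not exceed the requested maximum.
  ["max_check_usd", { column: "t.check_min_usd", op: "<=" }],
]);

interface BuiltQuery {
  text: string;
  values: unknown[];
}

function buildInvestorQuery(
  filters: Record<string, string | number>,
  queryEmbedding: number[],
  limit = 20,
): BuiltQuery {
  // $1 is always the query vector, serialized in pgvector's text form.
  const values: unknown[] = [`[${queryEmbedding.join(",")}]`];
  const where: string[] = [];

  for (const key of Object.keys(filters)) {
    const rule = FILTER_WHITELIST.get(key);
    if (!rule) continue; // non-whitelisted keys never reach the SQL text
    values.push(filters[key]);
    where.push(`${rule.column} ${rule.op} $${values.length}`);
  }

  values.push(limit);
  const text = [
    "SELECT t.investor_id, n.embedding <=> $1::halfvec AS distance",
    "FROM investment_thesis t",
    "JOIN investment_thesis_narrative n ON n.investor_id = t.investor_id",
    where.length > 0 ? `WHERE ${where.join(" AND ")}` : "",
    "ORDER BY n.embedding <=> $1::halfvec",
    `LIMIT $${values.length}`,
  ]
    .filter((line) => line !== "")
    .join("\n");

  return { text, values };
}

// Hypothetical extraction result for the demo query; the embedding is truncated.
const demo = buildInvestorQuery(
  { stage: "series_a", country: "US", min_check_usd: 5_000_000, max_check_usd: 10_000_000 },
  [0.12, -0.03, 0.4],
);
console.log(demo.text, demo.values);
```

Filter values only ever travel as bind parameters, and keys are looked up in the whitelist rather than interpolated, which is the point of the "whitelisted!" note in the table. With 12 narrative rows per investor, a real query would also need to pick which dimension and source rows to rank against, or aggregate per investor; that choice is left open here.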

Live demo

https://orbiter-sandbox.vercel.app/find-investors

First thing to try: type "find me investors for a Series A medtech round, $5-10M check, US-based" against the pre-seeded demo thread (demo-anything-engine, user demo-robert). Then ask "who else should I talk to" as turn 2 to watch memory steer the classifier.

What's live

Component	Status	Where	Notes
Sandbox UI	live	Vercel `orbiter-sandbox.vercel.app`	Crayon templates: scanning_card, contact_card, error_message, loading_indicator
BFF route	live	`src/app/api/find-investors/route.ts`	Thin pass-through to Xano, SSE pipe
Dispatch endpoint	live	Xano 8399 `/anything-engine/dispatch`	Front door: Zep fetch → classify → branch → Zep ingest
Classifier	live	Xano 8400 `/anything-engine/classify`	Anthropic Haiku 4.5 primary (updated Apr 29); OpenRouter Llama 3.3 70B (Fireworks → Together) as resilience fallback. Accepts `context` arg.
find_investors tool	live	Xano 8401 `/anything-engine/find-investors`	Embed → FalkorDB Cypher (VC_Firm + Angel + 3-hop portfolio/co-inv/board) → WHY pass → contact cards
find_talent tool	live	Xano 8402 `/anything-engine/find-talent`	Role extraction → title-match Cypher (Person + C_Suite + colleagues) → deterministic candidate cards (LLM synth replaced with foreach after crashes)
Zep memory	live	Zep Cloud "Demo Project"	Auth verified, demo thread pre-seeded, 3-turn loop verified
Memory steering	verified	dispatch 8399	Same query yields different class with vs without thread context
Banned-phrase regex (draft)	drafted	docs/test-harness.md	Apr 21 LSI list — ride shotgun, tee up, lock the, playbook, nine-figure, etc.
WorkOS auth	scaffolded	`src/start.ts`	Wired but not gating prod yet
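
For the banned-phrase row, a minimal sketch of what the check could look like inside the Node test harness. Only the five phrases named in the table are included; the full Apr 21 list lives in docs/test-harness.md. The function name and the choice to tolerate line wraps between words are our assumptions, not the drafted regex.

```typescript
// Partial list copied from the hub table; extend from docs/test-harness.md.
const BANNED_PHRASES = ["ride shotgun", "tee up", "lock the", "playbook", "nine-figure"];

// Escape regex metacharacters so phrases are matched literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Allow any whitespace run between words so line wraps do not hide a phrase.
const BANNED_RE = new RegExp(
  "\\b(" + BANNED_PHRASES.map((p) => escapeRegExp(p).replace(/ /g, "\\s+")).join("|") + ")\\b",
  "gi",
);

// Returns every banned phrase found, lowercased and whitespace-normalized.
function findBannedPhrases(text: string): string[] {
  const hits: string[] = text.match(BANNED_RE) ?? [];
  return hits.map((h) => h.toLowerCase().replace(/\s+/g, " "));
}

console.log(findBannedPhrases("Happy to tee up the Playbook for you."));
```

The word boundaries keep "lock the" from firing on "unlock the"; whether the harness should fail or only warn on a hit is still the CI-gate question for the call.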

What's pending

Item	Owner	Blocking on
AlloyDB connection from Xano	Mark	Mark spinning up cluster + exposing creds
AlloyDB schema reconciliation	Mark + Robert	Live merge of `docs/alloydb-schema.md` (Robert) with Mark's draft
Backfill job (FalkorDB → AlloyDB, 6 vectors)	Mark	AlloyDB cluster
Other 12 tool branches	Robert	Mark locks the 14-class list first
Test harness (Node script)	Robert	This call — fixture storage + run mode decisions
Production WorkOS auth on sandbox	Robert	Charles's existing pattern, need to port
Vercel preview-deploy gating	Robert	Gate previews behind WorkOS or skip for now
Real users	Mark	post-AlloyDB
Crayon master skill file	Robert	drafted at `skills/crayon/SKILL.md` (this call)
Banned-relationship Zep facts	Robert + Mark	Where this list lives — Zep, Xano table, or user pref?

Pre-call reading

architecture.md — full pipeline, stack decision, milestones, live endpoint table
anything-engine.md — 14-class spec, why it beats the 6-tool router
find-investors-edge-cases.md — 24 edge cases across input / filter / synthesis / pipeline / UI
alloydb-schema.md — Robert's parallel schema draft, 7 open questions for Mark
alloydb-scann.md — ScaNN reference notes
zep-memory.md — Zep one-pager, auth gotcha, integration shape, 6 open questions
test-harness.md — fixture-driven replay rig, 7 open questions

Open questions for the call

Pulled from each doc. Top of the list:

Lock the 14-class list (anything-engine.md) — Mark's Mintlify is canonical, need the freeze.
AlloyDB schema reconcile (alloydb-schema.md Q1-7) — vector dim, firms-as-rows-or-table, geographies as ISO vs polygon, authority score ownership.
Test harness fixture storage + CI gate (test-harness.md Q1-3) — JSON in repo + Xano table + sync? Gate or warn-only?
Zep thread granularity (zep-memory.md Q1-2) — one per user lifelong vs per-session. My vote: per user.
Dual-write Zep → AlloyDB (zep-memory.md Q3) — your Apr 28 "redundant data is an advantage" position. Resolve or park?
Edge case #1 ambiguous intent (find-investors-edge-cases.md) — classifier asks back, or UI surfaces picker?
Banned-relationship list (find-investors-edge-cases.md #14, zep-memory.md Q6) — Zep fact, Xano table, or user pref?
Prompt storage (architecture.md "what lives where") — Xano text field vs GitHub raw URL fetch.

Working cadence reminder

Daily 10:30 sessions all week (Apr 28 → May 2). Caitlin bumps a session if VC raise activity conflicts.
End-of-week target: find_investors flow working end-to-end on the sandbox against AlloyDB.
Skill files + Mintlify = lingua franca. Robert documents however he likes; Mark ports to Mintlify.

All docs

Open any of the sub-pages below.

Architecture

Anything Engine

Playing With Prompts

Edge Cases

AlloyDB Schema

AlloyDB ScaNN

Zep Memory

Test Harness

Deployment

Context Pipeline