Apr 29 — Mark Sync Hub

Single screen-share page for the top of the call. Every other doc links from here.

Apr 29 morning updates from Mark — logged

  1. Haiku 4.5 is the right tier for the interviewer turn (Slack 1:25 AM). Sub-500ms TTFT, 80–120 tps, $1/$5 per 1M tokens (input/output). Opus 4.5/4.6 is the wrong tier. See the model selection table in the Anything Engine spec.
    • Status: directive logged, swap not yet executed. Open conversation: full swap to Haiku 4.5 vs hybrid (Haiku for classify/extract, OpenRouter for resilient WHY)?
  2. All-in on ScaNN, not HNSW (Slack 5:53 AM). Settled. Affects only the matching/scoring milestone. Existing ScaNN one-pager reflects this. Mintlify doc has full rationale.
    • Status: nothing to remove from our docs; we never had HNSW hedging. Schema draft already uses ScaNN.

Mintlify reconciliation — divergences from our build (read before the call)

After connecting orbiterio.mintlify.io/mcp and reading /guides/open-work/suggestion-core-concepts/find-investors/* and /guides/open-work/vectors-alloydb/scann.mdx, here is the gap between our sandbox docs and Mark's canonical docs. None of these block the demo; they shape the AlloyDB migration conversation.

| Topic | Our sandbox | Mark's canonical (Mintlify) | What to do |
| --- | --- | --- | --- |
| Schema shape | One investors table with 6 vector cols | Three tables: investment_thesis (entity, structured filters), investment_thesis_narrative (12 rows per investor: 6 dims × 2 sources), investment_thesis_synthesis (declared_summary, derived_summary, drift signals) | Rewrite alloydb-schema.md to match the 3-table split |
| Vector dimensions | 6 (sector, stage, check, geography, signal, founder) | 6 narrative dimensions × 2 sources (declared / derived) = 12 per investor. Dims are: founder_fit, problem_market, competitive_moat, traction_momentum, business_model, expansion_roadmap | Adopt Mark's dimension names |
| Embedding type | vector(1536) from text-embedding-3-small (OpenAI) | halfvec(1536) from gemini-embedding-001 via Vertex AI (Matryoshka: can truncate to 768 later without re-embedding). Cheaper, IAM auth, no egress | Swap embedding source on AlloyDB cutover |
| Thesis-extract LLM | n/a (we just embed the query) | deepseek/deepseek-v3.2 ($0.252 / $0.378 per 1M) via OpenRouter, ~$0.005/investor, 30–90s per call. Two LLM calls: declared (from bio) + derived (from 100 actual deals). Repair-JSON lambda fixes Roman numerals + stray commas | Note: when we add the thesis-extract step, use Mark's pattern. Haiku 4.5 may be the wrong tier; DeepSeek wins on cost + JSON fidelity for this call |
| Filter-extraction LLM | n/a (no query splitting) | DeepSeek V4-Flash or Gemini 2.5 Flash (Mintlify recommends these); Haiku 4.5 has the best tool-use reliability but is pricier. The Slack message highlighted Haiku-vs-Opus; Mintlify recommends cheaper-than-Haiku for the routing decision specifically | Open question: what's the right model for the routing/filter-extraction step today? |
| Query pattern | Vector search over Person/Entity, then 3-hop graph for context | Split natural-language query into hard filters + semantic query. Hard filters (whitelisted!) → SQL WHERE. Semantic → embed → ORDER BY vector. Single round trip via AlloyDB inline filtering | We need to add the filter-extraction step before our find_investors run on AlloyDB |
| 14 outcome list | find_investors, find_talent, find_customers, research_person, research_company, research_topic, find_partners, find_advisors, find_co_investors, find_journalists, find_event_attendees, find_warm_intros, summarize_meeting, plan_outcome | find-investors, find-investment-opportunities, find-cofounder, find-collaborators, find-acquisition-target, find-job, find-media-pr, find-mentor-advisor, find-speakers, get-advice, hire-key-talent, make-a-purchase, prospect-customers-clients, strategic-partnerships | Reconcile in the call. Ours skews "search verbs"; Mark's skews "outcome types". find_talent → hire-key-talent is fine; summarize_meeting is missing from Mark's list; find-cofounder is missing from ours |
| Production thesis pipeline | n/a in sandbox | Already running in Xano workspace 3: table 709 (investment_theses), function 12911 (thesis/gather-investor-context-v3), function 12916 (thesis/build-investment-thesis-v21). 54 columns. JSON-strict prompts active. ISO 3166-1 alpha-2 geo codes | Don't rebuild this; call it. Sandbox find_investors should integrate with table 709 over time, not invent a parallel store |
| JSON repair | Generic fence stripping | Mark's repairJson lambda fixes two specific DeepSeek glitches: Roman numerals (III → 3) and stray colon-comma (":, → ":) | Add to our prompts if we move to DeepSeek |
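
To make the JSON repair row concrete, here is a minimal sketch of what a repair pass for those two glitches could look like. It is not Mark's repairJson lambda: the function names, the regexes, and the assumption that Roman numerals show up as bare (unquoted) values are all ours, for illustration only.

```typescript
// Hypothetical sketch, not the production repairJson lambda.
const ROMAN_VALUES: Record<string, number> = {
  I: 1, V: 5, X: 10, L: 50, C: 100, D: 500, M: 1000,
};

// Convert a Roman numeral such as "III" or "XIV" to an integer.
function romanToInt(roman: string): number {
  let total = 0;
  for (let i = 0; i < roman.length; i++) {
    const current = ROMAN_VALUES[roman.charAt(i)] ?? 0;
    const next = ROMAN_VALUES[roman.charAt(i + 1)] ?? 0;
    // A smaller numeral before a larger one is subtracted (IV = 4).
    total += current < next ? -current : current;
  }
  return total;
}

function repairJson(raw: string): string {
  return raw
    // Glitch 1: stray comma right after a key's colon, e.g. "stage":, "seed"
    .replace(/":\s*,/g, '":')
    // Glitch 2 (assumed shape): bare Roman numeral used as a value, e.g. "fund_number": III
    .replace(
      /(:\s*)([IVXLCDM]+)(?=\s*[,}\]])/g,
      (_match, prefix: string, roman: string) => prefix + romanToInt(roman),
    );
}
```

Both regexes are deliberately naive: they can also fire inside a string value that happens to contain the same character sequence, so a real version would need a stricter match or a tokenizer.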

Implications for the call:

  • The find_investors AlloyDB migration is more substantial than our current schema doc suggests — it's a 3-table refactor with 12 narratives/investor, not 6 vectors on a single row.
  • The thesis-extract step (declared + derived passes against 100 deals per investor) doesn't exist in our sandbox. That's the core IP — we should be wiring our find-investors tool to call the existing Xano build-investment-thesis-v21 (function 12916), not duplicating it.
  • The filter-extraction step is the missing link in our query path. With it: a natural-language query splits cleanly into the SQL WHERE + ORDER BY shape Mark expects. Without it: we lose the speed and precision ScaNN was chosen for.
  • Mark's outcome taxonomy differs from ours by ~6 names. This is a 5-minute alignment conversation — don't sweat it, just reconcile in the call.
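
As a rough sketch of the query path described above (hard filters → SQL WHERE, semantic query → ORDER BY vector, one round trip), the builder below takes already-extracted filters plus a query embedding and emits a parameterized statement. The table names come from the Mintlify schema; every column name, the whitelist contents, and the `dimension` filter are hypothetical, and the `<=>` operator assumes pgvector-style cosine distance on AlloyDB.

```typescript
// Sketch only: column names and whitelist entries are assumptions.
type FilterValue = string | number;

// Whitelisted filter keys mapped to a fixed SQL predicate prefix.
// Keys the extractor returns that are not listed here never reach the SQL text.
const FILTER_WHITELIST = new Map<string, string>([
  ["stage", "t.stage ="],
  ["country", "t.country_code ="], // ISO 3166-1 alpha-2, e.g. "US"
  ["min_check_usd", "t.max_check_usd >="],
  ["max_check_usd", "t.min_check_usd <="],
]);

interface BuiltQuery {
  text: string;
  values: FilterValue[];
  dropped: string[]; // filter keys rejected by the whitelist
}

function buildInvestorQuery(
  filters: Record<string, FilterValue>,
  queryEmbedding: number[],
  narrativeDimension: string,
  limit = 20,
): BuiltQuery {
  const values: FilterValue[] = [];
  const clauses: string[] = [];
  const dropped: string[] = [];

  for (const [key, value] of Object.entries(filters)) {
    const predicate = FILTER_WHITELIST.get(key);
    if (predicate === undefined) {
      dropped.push(key);
      continue;
    }
    values.push(value); // values travel as bind parameters, never as SQL text
    clauses.push(`${predicate} $${values.length}`);
  }

  values.push(narrativeDimension);
  clauses.push(`n.dimension = $${values.length}`);

  values.push(`[${queryEmbedding.join(",")}]`);
  const vectorParam = `$${values.length}::halfvec`;

  values.push(limit);
  const limitParam = `$${values.length}`;

  const text = [
    "SELECT t.investor_id, n.embedding <=> " + vectorParam + " AS distance",
    "FROM investment_thesis t",
    "JOIN investment_thesis_narrative n ON n.investor_id = t.investor_id",
    "WHERE " + clauses.join(" AND "),
    "ORDER BY n.embedding <=> " + vectorParam,
    "LIMIT " + limitParam,
  ].join("\n");

  return { text, values, dropped };
}
```

For the demo query, the extractor would hand over something like `{ stage: "series_a", country: "US", min_check_usd: 5000000, max_check_usd: 10000000 }`, and the `dropped` list gives the harness a place to log anything the LLM invented outside the whitelist.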

Live demo

https://orbiter-sandbox.vercel.app/find-investors

First thing to try: type "find me investors for a Series A medtech round, $5-10M check, US-based" against the pre-seeded demo thread (demo-anything-engine, user demo-robert). Then ask "who else should I talk to" as turn 2 to watch memory steer the classifier.

What's live

| Component | Status | Where | Notes |
| --- | --- | --- | --- |
| Sandbox UI | live | Vercel orbiter-sandbox.vercel.app | Crayon templates: scanning_card, contact_card, error_message, loading_indicator |
| BFF route | live | src/app/api/find-investors/route.ts | Thin pass-through to Xano, SSE pipe |
| Dispatch endpoint | live | Xano 8399 /anything-engine/dispatch | Front door: Zep fetch → classify → branch → Zep ingest |
| Classifier | live | Xano 8400 /anything-engine/classify | OpenRouter Llama 3.3 70B + Fireworks → Together fallback. Accepts context arg. |
| find_investors tool | live | Xano 8401 /anything-engine/find-investors | Embed → FalkorDB Cypher (VC_Firm + Angel + 3-hop portfolio/co-inv/board) → WHY pass → contact cards |
| find_talent tool | live | Xano 8402 /anything-engine/find-talent | Role extraction → title-match Cypher (Person + C_Suite + colleagues) → deterministic candidate cards (LLM synth replaced with foreach after crashes) |
| Zep memory | live | Zep Cloud "Demo Project" | Auth verified, demo thread pre-seeded, 3-turn loop verified |
| Memory steering | verified | dispatch 8399 | Same query yields different class with vs without thread context |
| Banned-phrase regex (draft) | drafted | docs/test-harness.md | Apr 21 LSI list: ride shotgun, tee up, lock the, playbook, nine-figure, etc. |
| WorkOS auth | scaffolded | src/start.ts | Wired but not gating prod yet |
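
The dispatch front door (Zep fetch → classify → branch → Zep ingest) lives in Xano, so the following is only a TypeScript sketch of the control flow, with every dependency injected. The interface, function names, and return shape are hypothetical and do not mirror the real Xano endpoint or the Zep SDK.

```typescript
// Illustrative control flow only; all names here are assumptions.
type Tool = (message: string, context: string) => Promise<unknown>;

interface DispatchDeps {
  fetchContext(threadId: string): Promise<string>; // stands in for the Zep fetch
  classify(message: string, context: string): Promise<string>; // classifier with context arg
  tools: Map<string, Tool>; // one entry per outcome class
  ingest(threadId: string, message: string, outcome: string): Promise<void>; // stands in for the Zep ingest
}

interface DispatchResult {
  outcome: string;
  handled: boolean; // false when no tool branch exists for the class yet
  result: unknown;
}

async function dispatch(
  threadId: string,
  message: string,
  deps: DispatchDeps,
): Promise<DispatchResult> {
  // 1. Pull thread memory so the classifier can be steered by prior turns.
  const context = await deps.fetchContext(threadId);
  // 2. Classify the message with that context.
  const outcome = await deps.classify(message, context);
  // 3. Branch to the matching tool, if one is wired up.
  const tool = deps.tools.get(outcome);
  const result = tool ? await tool(message, context) : null;
  // 4. Write the turn back to memory for the next dispatch.
  await deps.ingest(threadId, message, outcome);
  return { outcome, handled: tool !== undefined, result };
}
```

Passing the context into `classify` is what lets the same query produce a different class with and without thread history, which is the memory-steering behavior listed in the table.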

What's pending

| Item | Owner | Blocking on |
| --- | --- | --- |
| AlloyDB connection from Xano | Mark | Mark spinning up cluster + exposing creds |
| AlloyDB schema reconciliation | Mark + Robert | Live merge of docs/alloydb-schema.md (Robert) with Mark's draft |
| Backfill job (FalkorDB → AlloyDB, 6 vectors) | Mark | AlloyDB cluster |
| Other 12 tool branches | Robert | Mark locks the 14-class list first |
| Test harness (Node script) | Robert | This call: fixture storage + run mode decisions |
| Production WorkOS auth on sandbox | Robert | Charles's existing pattern, need to port |
| Vercel preview-deploy gating | Robert | Gate previews behind WorkOS or skip for now |
| Real users | Mark | post-AlloyDB |
| Crayon master skill file | Robert | drafted at skills/crayon/SKILL.md (this call) |
| Banned-relationship Zep facts | Robert + Mark | Where this list lives: Zep, Xano table, or user pref? |

Pre-call reading

Open questions for the call

Pulled from each doc. Top of the list:

  1. Lock the 14-class list (anything-engine.md) — Mark's Mintlify is canonical, need the freeze.
  2. AlloyDB schema reconcile (alloydb-schema.md Q1-7) — vector dim, firms-as-rows-or-table, geographies as ISO vs polygon, authority score ownership.
  3. Test harness fixture storage + CI gate (test-harness.md Q1-3) — JSON in repo + Xano table + sync? Gate or warn-only?
  4. Zep thread granularity (zep-memory.md Q1-2) — one per user lifelong vs per-session. My vote: per user.
  5. Dual-write Zep → AlloyDB (zep-memory.md Q3) — your Apr 28 "redundant data is an advantage" position. Resolve or park?
  6. Edge case #1 ambiguous intent (find-investors-edge-cases.md) — classifier asks back, or UI surfaces picker?
  7. Banned-relationship list (find-investors-edge-cases.md #14, zep-memory.md Q6) — Zep fact, Xano table, or user pref?
  8. Prompt storage (architecture.md "what lives where") — Xano text field vs GitHub raw URL fetch.
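
To ground the gate vs warn-only question for the test harness, here is a small sketch of a banned-phrase check that supports both run modes. The phrase list is only the partial Apr 21 sample quoted in this doc, and the function names, `RunMode` type, and result shape are assumptions rather than what docs/test-harness.md specifies.

```typescript
// Partial list taken from the status table; the full Apr 21 list is longer.
const BANNED_PHRASES = ["ride shotgun", "tee up", "lock the", "playbook", "nine-figure"];

type RunMode = "gate" | "warn";

interface CheckResult {
  passed: boolean; // in "warn" mode hits are reported but never fail the run
  hits: string[];
}

// Escape regex metacharacters so phrases are matched literally.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// One case-insensitive alternation; internal whitespace matches any run of spaces.
function buildBannedRegex(phrases: string[]): RegExp {
  const alternation = phrases
    .map((phrase) => escapeRegExp(phrase).replace(/\s+/g, "\\s+"))
    .join("|");
  return new RegExp(`\\b(?:${alternation})\\b`, "gi");
}

function checkBannedPhrases(
  output: string,
  mode: RunMode,
  phrases: string[] = BANNED_PHRASES,
): CheckResult {
  const matches = output.match(buildBannedRegex(phrases)) ?? [];
  const hits = matches.map((hit) => hit.toLowerCase());
  return { passed: mode === "warn" || hits.length === 0, hits };
}
```

Running the same fixture in both modes shows the trade-off: `gate` fails the run on any hit, while `warn` keeps CI green and only surfaces the hits for review.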

Working cadence reminder

  • Daily 10:30 sessions all week (Apr 28 → May 2). Caitlin bumps a session if VC raise activity conflicts.
  • End-of-week target: find_investors flow working end-to-end on the sandbox against AlloyDB.
  • Skill files + Mintlify = lingua franca. Robert documents in whatever format works; Mark ports to Mintlify.

All docs

Open any of the seven sub-pages below.