Apr 29 — Mark Sync Hub
Single screen-share page for the top of the call. Every other doc links from here.

Apr 29 morning updates from Mark — logged
- Haiku 4.5 is the right tier for the interviewer turn (Slack 1:25 AM). Sub-500ms TTFT, 80–120 tps, $1/$5 per 1M. Opus 4.5/4.6 is the wrong tier. See model selection table in Anything Engine spec.
  - Status: directive logged, swap not yet executed. Open conversation: full swap to Haiku 4.5 vs hybrid (Haiku for classify/extract, OpenRouter for resilient WHY)?
- All-in on ScaNN, not HNSW (Slack 5:53 AM). Settled. Affects only the matching/scoring milestone. Existing ScaNN one-pager reflects this. Mintlify doc has full rationale.
  - Status: nothing to remove from our docs; we never had HNSW hedging. Schema draft already uses ScaNN.
Mintlify reconciliation — divergences from our build (read before the call)
After connecting orbiterio.mintlify.io/mcp and reading /guides/open-work/suggestion-core-concepts/find-investors/* and /guides/open-work/vectors-alloydb/scann.mdx, here is the gap between our sandbox docs and Mark's canonical. None of these block the demo — they shape the AlloyDB migration conversation.
| Topic | Our sandbox | Mark's canonical (Mintlify) | What to do |
|---|---|---|---|
| Schema shape | One investors table with 6 vector cols | Three tables: investment_thesis (entity, structured filters), investment_thesis_narrative (12 rows per investor — 6 dims × 2 sources), investment_thesis_synthesis (declared_summary, derived_summary, drift signals) | Rewrite alloydb-schema.md to match the 3-table split |
| Vector dimensions | 6 (sector, stage, check, geography, signal, founder) | 6 narrative dimensions × 2 sources (declared / derived) = 12 per investor. Dims are: founder_fit, problem_market, competitive_moat, traction_momentum, business_model, expansion_roadmap | Adopt Mark's dimension names |
| Embedding type | vector(1536) from text-embedding-3-small (OpenAI) | halfvec(1536) from gemini-embedding-001 via Vertex AI (Matryoshka — can truncate to 768 later without re-embedding). Cheaper, IAM auth, no egress | Swap embedding source on AlloyDB cutover |
| Thesis-extract LLM | n/a (we just embed the query) | deepseek/deepseek-v3.2 ($0.252 / $0.378 per 1M) via OpenRouter, ~$0.005/investor, 30–90s per call. Two LLM calls — declared (from bio) + derived (from 100 actual deals). Repair-JSON lambda fixes Roman numerals + stray commas | Note: when we add the thesis-extract step, use Mark's pattern. Haiku 4.5 may be wrong tier — DeepSeek wins on cost+JSON fidelity for this call |
| Filter-extraction LLM | n/a (no query splitting) | DeepSeek V4-Flash or Gemini 2.5 Flash (Mintlify recommends these); Haiku 4.5 has the best tool-use reliability but is pricier. The Slack message compared Haiku vs Opus; Mintlify recommends a cheaper-than-Haiku model for the routing decision specifically | Open question: what's the right model for the routing/filter-extraction step today? |
| Query pattern | Vector search over Person/Entity, then 3-hop graph for context | Split natural-language query into hard filters + semantic query. Hard filters (whitelisted!) → SQL WHERE. Semantic → embed → ORDER BY vector. Single round trip via AlloyDB inline filtering | We need to add the filter-extraction step before our find_investors run on AlloyDB |
| 14 outcome list | find_investors, find_talent, find_customers, research_person, research_company, research_topic, find_partners, find_advisors, find_co_investors, find_journalists, find_event_attendees, find_warm_intros, summarize_meeting, plan_outcome | find-investors, find-investment-opportunities, find-cofounder, find-collaborators, find-acquisition-target, find-job, find-media-pr, find-mentor-advisor, find-speakers, get-advice, hire-key-talent, make-a-purchase, prospect-customers-clients, strategic-partnerships | Reconcile in the call. Ours skews "search verbs"; Mark's skews "outcome types". find_talent ↔ hire-key-talent is fine; summarize_meeting is missing from Mark's list; find-cofounder is missing from ours |
| Production thesis pipeline | n/a in sandbox | Already running in Xano workspace 3: table 709 (investment_theses), function 12911 (thesis/gather-investor-context-v3), function 12916 (thesis/build-investment-thesis-v21). 54 columns. JSON-strict prompts active. ISO 3166-1 alpha-2 geo codes | Don't rebuild this — call it. Sandbox find_investors should integrate with table 709 over time, not invent a parallel store |
| JSON repair | Generic fence stripping | Mark's repairJson lambda fixes two specific DeepSeek glitches: Roman numerals (III → 3) and stray colon-comma (":, → ":) | Add to our prompts if we move to DeepSeek |
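The two DeepSeek glitches in the JSON repair row can be illustrated with a minimal sketch. This is not Mark's actual repairJson lambda: the names `repairJsonSketch` and `romanToInt` are hypothetical, and the sketch assumes the Roman numeral shows up as a bare (unquoted) value right after a key and that the stray comma sits directly after the key's colon.

```typescript
// Hypothetical sketch of the two repairs; the real lambda may differ.
const ROMAN_VALUES: Record<string, number> = { I: 1, V: 5, X: 10, L: 50, C: 100, D: 500, M: 1000 };

// Standard subtractive-notation conversion (IV = 4, IX = 9, ...).
function romanToInt(roman: string): number {
  let total = 0;
  for (let i = 0; i < roman.length; i++) {
    const current = ROMAN_VALUES[roman.charAt(i)] ?? 0;
    const next = ROMAN_VALUES[roman.charAt(i + 1)] ?? 0;
    total += current < next ? -current : current;
  }
  return total;
}

function repairJsonSketch(raw: string): string {
  return raw
    // Glitch 1 (assumed shape): stray comma after a key's colon, `"k":, 1` becomes `"k": 1`.
    .replace(/":\s*,/g, '":')
    // Glitch 2 (assumed shape): bare Roman numeral as a value, `"k": III` becomes `"k": 3`.
    // Quoted strings such as "I" are left alone because the pattern requires no opening quote.
    .replace(/":\s*([IVXLCDM]+)(?=\s*[,}\]])/g, (_match: string, numeral: string) => '": ' + romanToInt(numeral));
}
```

Running the repair before `JSON.parse` keeps the parser strict; a regex pass like this can misfire on string values that happen to contain `":,`, so it is best treated as a last-resort fix after a first parse attempt fails.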
Implications for the call:
- The find_investors AlloyDB migration is more substantial than our current schema doc suggests — it's a 3-table refactor with 12 narratives/investor, not 6 vectors on a single row.
- The thesis-extract step (declared + derived passes against 100 deals per investor) doesn't exist in our sandbox. That's the core IP: we should be wiring our find-investors tool to call the existing Xano build-investment-thesis-v21 (function 12916), not duplicating it.
- The filter-extraction step is the missing link in our query path. With it: a natural-language query splits cleanly into the SQL WHERE + ORDER BY shape Mark expects. Without it: we lose the speed and precision ScaNN was chosen for.
- Mark's outcome taxonomy differs from ours by ~6 names. This is a 5-minute alignment conversation — don't sweat it, just reconcile in the call.
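To make the filter-extraction point concrete, here is a rough sketch of how already-extracted hard filters could be turned into a parameterized query with a whitelisted WHERE clause and a vector ORDER BY. The table names come from the Mintlify column above; everything else is an assumption for illustration: the column names (`stage`, `geo_code`, `check_size_usd`, `investor_id`, `dimension`, `embedding`), the filter keys, and the function name `buildInvestorQuery`. The `<=>` operator is pgvector's cosine distance; whether the production index is built for that metric is not stated in our docs.

```typescript
// Hypothetical whitelist: only these keys can ever contribute SQL text.
type Op = "=" | ">=" | "<=";
const FILTER_WHITELIST = new Map<string, { column: string; op: Op }>([
  ["stage", { column: "t.stage", op: "=" }],
  ["geo", { column: "t.geo_code", op: "=" }],
  ["min_check_usd", { column: "t.check_size_usd", op: ">=" }],
  ["max_check_usd", { column: "t.check_size_usd", op: "<=" }],
]);

interface BuiltQuery {
  text: string;
  values: (string | number)[];
  dropped: string[]; // filter keys the extractor produced that are not whitelisted
}

// `filters` is the hard-filter half of the split query; `embedding` is the
// embedded semantic half. Values always travel as bind parameters.
function buildInvestorQuery(
  filters: Record<string, string | number>,
  embedding: number[],
  limit = 20,
): BuiltQuery {
  const values: (string | number)[] = [];
  const where: string[] = [];
  const dropped: string[] = [];
  for (const key of Object.keys(filters)) {
    const rule = FILTER_WHITELIST.get(key);
    if (!rule) {
      dropped.push(key);
      continue;
    }
    values.push(filters[key]);
    where.push(`${rule.column} ${rule.op} $${values.length}`);
  }
  values.push(`[${embedding.join(",")}]`); // pgvector text literal
  const vec = `$${values.length}::halfvec`;
  values.push(limit);
  const text = [
    `SELECT t.investor_id, n.dimension, n.embedding <=> ${vec} AS distance`,
    "FROM investment_thesis t",
    "JOIN investment_thesis_narrative n ON n.investor_id = t.investor_id",
    where.length > 0 ? "WHERE " + where.join(" AND ") : "",
    `ORDER BY n.embedding <=> ${vec}`,
    `LIMIT $${values.length}`,
  ].filter((line) => line !== "").join("\n");
  return { text, values, dropped };
}
```

Using a `Map` rather than a plain object for the whitelist avoids accidental hits on inherited keys such as `constructor`. Dropped keys are returned rather than silently ignored so the caller can log what the extractor hallucinated.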
Live demo
https://orbiter-sandbox.vercel.app/find-investors
First thing to try: type "find me investors for a Series A medtech round, $5-10M check, US-based" against the pre-seeded demo thread (demo-anything-engine, user demo-robert). Then ask "who else should I talk to" as turn 2 to watch memory steer the classifier.
What's live
| Component | Status | Where | Notes |
|---|---|---|---|
| Sandbox UI | live | Vercel orbiter-sandbox.vercel.app | Crayon templates: scanning_card, contact_card, error_message, loading_indicator |
| BFF route | live | src/app/api/find-investors/route.ts | Thin pass-through to Xano, SSE pipe |
| Dispatch endpoint | live | Xano 8399 /anything-engine/dispatch | Front door: Zep fetch → classify → branch → Zep ingest |
| Classifier | live | Xano 8400 /anything-engine/classify | OpenRouter Llama 3.3 70B + Fireworks → Together fallback. Accepts context arg. |
| find_investors tool | live | Xano 8401 /anything-engine/find-investors | Embed → FalkorDB Cypher (VC_Firm + Angel + 3-hop portfolio/co-inv/board) → WHY pass → contact cards |
| find_talent tool | live | Xano 8402 /anything-engine/find-talent | Role extraction → title-match Cypher (Person + C_Suite + colleagues) → deterministic candidate cards (LLM synth replaced with foreach after crashes) |
| Zep memory | live | Zep Cloud "Demo Project" | Auth verified, demo thread pre-seeded, 3-turn loop verified |
| Memory steering | verified | dispatch 8399 | Same query yields different class with vs without thread context |
| Banned-phrase regex (draft) | drafted | docs/test-harness.md | Apr 21 LSI list — ride shotgun, tee up, lock the, playbook, nine-figure, etc. |
| WorkOS auth | scaffolded | src/start.ts | Wired but not gating prod yet |
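For the banned-phrase row, a minimal sketch of what the drafted regex check could look like is below. It only uses the five phrases named in the table; the full Apr 21 list is longer, and the actual draft in docs/test-harness.md may be structured differently. The names `BANNED_PHRASES` and `findBannedPhrases` are hypothetical.

```typescript
// Only the phrases quoted in the table row; the real list continues ("etc.").
const BANNED_PHRASES = ["ride shotgun", "tee up", "lock the", "playbook", "nine-figure"];

// Escape regex metacharacters so phrases are matched literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Case-insensitive, whole-word match; any run of whitespace between words counts.
const BANNED_RE = new RegExp(
  "\\b(?:" +
    BANNED_PHRASES.map((p) => escapeRegExp(p).replace(/\s+/g, "\\s+")).join("|") +
    ")\\b",
  "gi",
);

// Returns every banned phrase found, lowercased, in order of appearance.
function findBannedPhrases(text: string): string[] {
  const matches = text.match(BANNED_RE) ?? [];
  return matches.map((m) => m.toLowerCase());
}
```

Because of the word boundaries, inflected forms such as "playbooks" or "locks the" are not caught; whether they should be is a decision for the harness.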
What's pending
| Item | Owner | Blocking on |
|---|---|---|
| AlloyDB connection from Xano | Mark | Mark spinning up cluster + exposing creds |
| AlloyDB schema reconciliation | Mark + Robert | Live merge of docs/alloydb-schema.md (Robert) with Mark's draft |
| Backfill job (FalkorDB → AlloyDB, 6 vectors) | Mark | AlloyDB cluster |
| Other 12 tool branches | Robert | Mark locks the 14-class list first |
| Test harness (Node script) | Robert | This call — fixture storage + run mode decisions |
| Production WorkOS auth on sandbox | Robert | Charles's existing pattern, need to port |
| Vercel preview-deploy gating | Robert | Gate previews behind WorkOS or skip for now |
| Real users | Mark | post-AlloyDB |
| Crayon master skill file | Robert | drafted at skills/crayon/SKILL.md (this call) |
| Banned-relationship Zep facts | Robert + Mark | Where this list lives — Zep, Xano table, or user pref? |
Pre-call reading
- architecture.md — full pipeline, stack decision, milestones, live endpoint table
- anything-engine.md — 14-class spec, why it beats the 6-tool router
- find-investors-edge-cases.md — 24 edge cases across input / filter / synthesis / pipeline / UI
- alloydb-schema.md — Robert's parallel schema draft, 7 open questions for Mark
- alloydb-scann.md — ScaNN reference notes
- zep-memory.md — Zep one-pager, auth gotcha, integration shape, 6 open questions
- test-harness.md — fixture-driven replay rig, 7 open questions
Open questions for the call
Pulled from each doc. Top of the list:
- Lock the 14-class list (anything-engine.md) — Mark's Mintlify is canonical, need the freeze.
- AlloyDB schema reconcile (alloydb-schema.md Q1-7) — vector dim, firms-as-rows-or-table, geographies as ISO vs polygon, authority score ownership.
- Test harness fixture storage + CI gate (test-harness.md Q1-3) — JSON in repo + Xano table + sync? Gate or warn-only?
- Zep thread granularity (zep-memory.md Q1-2) — one per user lifelong vs per-session. My vote: per user.
- Dual-write Zep → AlloyDB (zep-memory.md Q3) — your Apr 28 "redundant data is an advantage" position. Resolve or park?
- Edge case #1 ambiguous intent (find-investors-edge-cases.md) — classifier asks back, or UI surfaces picker?
- Banned-relationship list (find-investors-edge-cases.md #14, zep-memory.md Q6) — Zep fact, Xano table, or user pref?
- Prompt storage (architecture.md "what lives where") — Xano text field vs GitHub raw URL fetch.
Working cadence reminder
- Daily 10:30 sessions all week (Apr 28 → May 2). Caitlin bumps if VC raise activity.
- End-of-week target: find_investors flow working end-to-end on the sandbox against AlloyDB.
- Skill files + Mintlify = lingua franca. Robert documents however; Mark ports to Mintlify.
All docs
Open any of the sub-pages below.
Architecture
End-to-end pipeline, stack decision, milestones, and the live endpoint table.
Anything Engine · Spec: Anything Engine
14-class dispatch spec — why it replaces the 6-tool router.
Anything Engine · Iteration: Playing With Prompts
Where the prompts live, how to edit, how model selection works, pipeline diagram.
find_investors · Edge Cases: Edge Cases
24 documented edge cases across input, filter, synthesis, pipeline, and UI.
Data Plane · Schema: AlloyDB Schema
Robert's parallel schema draft and the 7 open questions for Mark.
Data Plane · ScaNN: AlloyDB ScaNN
Hybrid SQL + vector search in one call — ScaNN reference notes.
Memory · Zep: Zep Memory
One-pager on Zep — why it beat Mem0/Cognee, integration shape, open questions.
Quality · Test Harness: Test Harness
Fixture-driven replay rig scoping with 7 open calls for Mark.
Ops · Deployment: Deployment
What's deployed where.
find_investors · Context: Context Pipeline
File upload → GCS → unstructured.io → pitch profile polling. Table schema, BFF routes, fn 12930 bug fixes, and the FK contract for Mark's pipeline.