Writing · Case study · 2026 · Semantic Email
VectorMail: a semantic Gmail client on one Postgres
Reads, writes, and automation each run on their own guarded rail. Email is a database problem with an AI surface, not the reverse.
fig. 00 — results
- 768-dim vector on every Email row
- 1 DB: inbox + embeddings, no vector store
- 3 rails: read · write · automate, each guarded
- Dry-run: automation default
Section 01
The problem
Email is the largest structured corpus most people own, and almost none of it is searchable by what it means.
"The invoice Sarah flagged last week" is a trivially human query and a hopeless keyword query: the words "invoice" and "flagged" may appear nowhere in the thread. Semantic retrieval is the obvious fix, but it drags two harder problems behind it: where the vectors live, and what happens when AI touches the outbound path instead of just the read path.
Most "AI email" products get both wrong. They bolt a chat box onto a mail client and let the model draft and send with no gate, which turns a hallucination into a delivered message. And they stand up a separate vector database next to the system of record, which means dual writes, re-embedding drift, and tenant scoping enforced in two places instead of one.
VectorMail is built on the opposite bet: keep meaning in the same database as the mail, and give reading, writing, and automating their own rails with their own guarantees - because those three operations have wildly different blast radii.
Section 02
Thesis: one stack, one database
Embeddings live on the Email row as a vector(768) column, and search is a parameterized $queryRaw using pgvector's <=> cosine-distance operator, scoped by accountId in the same WHERE clause that enforces tenancy.
There is no separate vector store, no sync job between the system of record and an index, no second place to get scoping wrong. The usual objection is that a dedicated vector DB scales further. True at billions of vectors - irrelevant at per-account email volume, where the working set is thousands of rows behind an accountId filter.
What you give up in theoretical ANN throughput you get back in having exactly one consistency domain: when a thread is deleted, its vectors go with it, and when it is re-labeled there is no second index to update, because the vectors were never anywhere else. When an embedding is missing, the query degrades to text search rather than failing.
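A minimal sketch of the query shape described above. The case study names only `$queryRaw`, the `<=>` operator, the `vector(768)` column and the `accountId` scope; the helper names and the `"subject"`/`"embedding"` column names are hypothetical. It is written as a plain text-plus-values pair (the shape node-postgres's `client.query(text, values)` accepts) so it runs without Prisma; with Prisma's tagged-template `$queryRaw` the interpolated values are bound as parameters the same way.

```typescript
// Hypothetical helpers; only the operator, column type and accountId
// scoping come from the case study.
type SqlQuery = { text: string; values: unknown[] };

const EMBEDDING_DIM = 768;

// pgvector accepts a vector literal written as '[0.1,0.2,...]'.
function toVectorLiteral(embedding: number[]): string {
  if (embedding.length !== EMBEDDING_DIM) {
    throw new Error(`expected ${EMBEDDING_DIM} dims, got ${embedding.length}`);
  }
  return `[${embedding.join(",")}]`;
}

// Tenancy and similarity live in one statement: the accountId filter sits
// in the same WHERE clause, and the embedding is bound as a parameter, never
// spliced into the SQL text. "<=>" is cosine distance, so 1 - distance is
// cosine similarity; ordering by the raw operator lets an ANN index be used.
function buildSimilarityQuery(accountId: string, embedding: number[], limit = 20): SqlQuery {
  return {
    text: `SELECT "id", "subject", 1 - ("embedding" <=> $2::vector) AS "similarity"
FROM "Email"
WHERE "accountId" = $1 AND "embedding" IS NOT NULL
ORDER BY "embedding" <=> $2::vector
LIMIT $3`,
    values: [accountId, toVectorLiteral(embedding), limit],
  };
}
```

Rows without an embedding are excluded here rather than erroring, which is what lets the caller fall back to text search for them.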
Section 03
Architecture
A request flows in one direction. Gmail is reached only through Aurinko, so OAuth, delta sync, send, and labels are one integration surface. The three AI surfaces sit on top of the same tables.
[Figure: architecture. Panels: Integration - one surface; Read - resolved once, server-side]
Section 04
Two AI surfaces, one inbox
Reading and writing get different rails because a bad read shows you the wrong email and a bad write sends one.
- 01 AI Search - the read rail
It classifies intent (search / summarize / select), answers with a deterministic Sources footer (subject, sender, date, snippet) so every claim is traceable, and keeps lightweight session memory so "summarize that one" resolves. It is instructed never to invent amounts or dates and never to offer to send.
- 02 Buddy - the write rail, guarded twice
A topic gate runs a regex blacklist on the user's pre-augmentation message: math, code, recipes, trivia, and weather requests all bounce before a token reaches the model. Then, immediately before Account.sendEmail, the subject and body pass through containsOutgoingViolation(), which applies word-boundary patterns for slurs, threats, sexual violence, CSAM, drug trade, and fraud.
- 03 The model is upstream of the gate
It is not trusted to be the gate.
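A sketch of what "deterministic" means for the Sources footer, assuming the retrieved rows carry the four fields the case study lists. The `SourceEmail` shape, the function names and the line format are hypothetical. The footer is built from the retrieved rows rather than from model output, so it is identical for identical inputs and cannot cite an email that was never retrieved.

```typescript
// Hypothetical row shape: the four fields the Sources footer shows.
interface SourceEmail {
  subject: string;
  sender: string;
  date: Date;
  snippet: string;
}

// Pure function of the retrieved rows; the model never writes this part.
function formatSourcesFooter(sources: SourceEmail[], maxSnippet = 80): string {
  if (sources.length === 0) return "Sources: none";
  const lines = sources.map((s, i) => {
    const day = s.date.toISOString().slice(0, 10); // YYYY-MM-DD in UTC
    const snippet =
      s.snippet.length > maxSnippet ? s.snippet.slice(0, maxSnippet - 1) + "…" : s.snippet;
    return `[${i + 1}] ${s.subject} · ${s.sender} · ${day} · ${snippet}`;
  });
  return ["Sources:", ...lines].join("\n");
}

// The model's answer is returned with the footer appended verbatim.
function withSources(answer: string, sources: SourceEmail[]): string {
  return `${answer.trimEnd()}\n\n${formatSourcesFooter(sources)}`;
}
```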
Section 05
The write rail is guarded twice
Buddy drafts and sends. Because a bad write is a delivered message, the model sits between two deterministic gates it cannot talk its way past.
[Figure: Buddy - write path]
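A minimal sketch of the two gates around the model. `containsOutgoingViolation` is named in the case study, but its signature here is an assumption, and every pattern below is a benign placeholder: the real blacklists are not published and cover far more ground. `isOffTopic`, `guardedSend` and the `Draft` type are hypothetical.

```typescript
// Placeholder patterns only; the real lists are longer and not shown.
const OFF_TOPIC_PATTERNS: RegExp[] = [
  /\b(solve|integral|derivative|equation)\b/i, // math
  /\b(write|debug|fix) (a |some |this )?(code|function|script)\b/i, // code
  /\b(recipe|how (do i|to) cook)\b/i, // recipes
  /\b(weather|forecast)\b/i, // weather
];

// \b word boundaries keep "skill" from matching a pattern about "kill".
const OUTGOING_PATTERNS: Record<string, RegExp> = {
  threat: /\bi will (hurt|kill) you\b/i,
  fraud: /\bsend (me )?your (password|bank details|card number)\b/i,
};

// Gate 1: the user's message as typed, before anything is added to it.
function isOffTopic(userMessage: string): boolean {
  return OFF_TOPIC_PATTERNS.some((re) => re.test(userMessage));
}

// Gate 2: the final subject and body, immediately before sending.
// Returns the violated category, or null when the draft is clean.
function containsOutgoingViolation(subject: string, body: string): string | null {
  const text = `${subject}\n${body}`;
  for (const [category, re] of Object.entries(OUTGOING_PATTERNS)) {
    if (re.test(text)) return category;
  }
  return null;
}

type Draft = { to: string; subject: string; body: string };

// The model only produces the draft; neither gate consults it.
function guardedSend(
  userMessage: string,
  draftWith: (message: string) => Draft,
  send: (draft: Draft) => void,
): "sent" | "off_topic" | "blocked" {
  if (isOffTopic(userMessage)) return "off_topic"; // no model call at all
  const draft = draftWith(userMessage);
  if (containsOutgoingViolation(draft.subject, draft.body) !== null) return "blocked";
  send(draft);
  return "sent";
}
```

The regexes carry no `g` flag on purpose: a global regex keeps `lastIndex` between `test()` calls, which would make a gate pass or fail depending on what it checked last.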
Section 06
Automation: simulate by default
Automation has three modes on the account - manual, assist, auto - and every automated action is an ActionExecution row that is dryRun by default.
[Figure: Modes · Every action - three things before it's allowed to exist]
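A sketch of "dryRun by default", assuming a dry run is the result unless the account is in `auto` mode and the user has explicitly opted in to autosend (the opt-in is mentioned under the design tradeoffs). The field names, the `autoSendOptIn` flag, and the choice that `manual` creates no execution at all are hypothetical.

```typescript
// The three modes and the dryRun default come from the case study;
// everything else here is assumed for illustration.
type AutomationMode = "manual" | "assist" | "auto";

interface ActionExecution {
  id: string;
  actionType: "send_reply" | "apply_label";
  dryRun: boolean;
  status: "pending" | "simulated" | "executed";
}

interface AccountSettings {
  mode: AutomationMode;
  autoSendOptIn: boolean; // off unless the user explicitly enables it
}

let nextId = 1;

// dryRun stays true unless every condition for a real action holds.
function createExecution(
  settings: AccountSettings,
  actionType: ActionExecution["actionType"],
): ActionExecution | null {
  if (settings.mode === "manual") return null; // assumed: nothing automated
  const dryRun = !(settings.mode === "auto" && settings.autoSendOptIn);
  return { id: `exec_${nextId++}`, actionType, dryRun, status: "pending" };
}

// A dry run records what would have happened; only a live execution
// calls the side effect.
function runExecution(exec: ActionExecution, perform: () => void): ActionExecution {
  if (exec.dryRun) return { ...exec, status: "simulated" };
  perform();
  return { ...exec, status: "executed" };
}
```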
Section 07
Why this is hard
Sync correctness is the whole game
- The sync lock (Upstash → ioredis → in-memory, with a wait and TTL of up to 30 minutes) is the easy part.
- Completion was originally inferred from "has a delta token," so an account whose token was set early would stall with months of mail missing.
- Completion is now an explicit inboxBackfilledAt timestamp, set only when the list walk reaches the oldest message; an un-backfilled inbox is not delegated to the background worker.
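A sketch of the completion rule, assuming the provider lists messages newest-first in pages and omits the next-page token on the page holding the oldest message. Only `inboxBackfilledAt` and the delta token come from the case study; `ListPage`, `backfillStep` and the page-budget parameter are hypothetical.

```typescript
// Hypothetical shapes standing in for the account row and the list API.
interface SyncState {
  deltaToken: string | null;
  inboxBackfilledAt: Date | null;
}

interface ListPage {
  messageIds: string[];
  nextPageToken?: string; // absent on the page holding the oldest message
}

// Old rule (the bug): a delta token alone was read as "fully synced".
const wasCompleteByOldRule = (s: SyncState) => s.deltaToken !== null;

// New rule: only the explicit backfill timestamp counts, and only a
// backfilled inbox is handed to the background worker.
const canDelegateToWorker = (s: SyncState) => s.inboxBackfilledAt !== null;

// Walks up to pageBudget pages. The timestamp is set only when a page has
// no successor, i.e. the walk really reached the oldest message; otherwise
// the cursor is returned so the next sync resumes where this one stopped.
function backfillStep(
  state: SyncState,
  fetchPage: (token?: string) => ListPage,
  cursor: string | undefined,
  pageBudget: number,
  now: Date,
): { state: SyncState; cursor?: string; ingested: string[] } {
  const ingested: string[] = [];
  let token = cursor;
  for (let i = 0; i < pageBudget; i++) {
    const page = fetchPage(token);
    ingested.push(...page.messageIds);
    if (page.nextPageToken === undefined) {
      return { state: { ...state, inboxBackfilledAt: now }, ingested };
    }
    token = page.nextPageToken;
  }
  return { state, cursor: token, ingested };
}
```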
Idempotent ingestion
- Every email upserts by Email.internetMessageId @unique.
- Provider redeliveries, retried syncs, and overlapping delta windows converge on one row instead of duplicating threads.
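A minimal in-memory model of the upsert, with a `Map` keyed by `internetMessageId` playing the role of the unique index. With Prisma the equivalent call is `upsert({ where: { internetMessageId }, create, update })`; the `IncomingEmail` fields and the `EmailStore` class are hypothetical.

```typescript
// Hypothetical row shape; internetMessageId is the unique key.
interface IncomingEmail {
  internetMessageId: string;
  threadId: string;
  subject: string;
  labels: string[];
}

class EmailStore {
  private rows = new Map<string, IncomingEmail>();

  // Create on first sight, update in place on every redelivery, so the
  // same message arriving via retry or overlapping delta window converges.
  upsert(email: IncomingEmail): "created" | "updated" {
    const existed = this.rows.has(email.internetMessageId);
    this.rows.set(email.internetMessageId, { ...email, labels: [...email.labels] });
    return existed ? "updated" : "created";
  }

  get size(): number {
    return this.rows.size;
  }

  get(internetMessageId: string): IncomingEmail | undefined {
    return this.rows.get(internetMessageId);
  }
}
```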
Automation that cannot double-fire
- ActionExecution.idempotencyKey @unique plus per-execution Inngest concurrency of 1 means a retry storm produces a uniqueness violation, not three sent emails.
- Permanent failures land in a FailedJob DLQ keyed by (jobType, resourceId).
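A sketch of why a retry storm ends in a uniqueness violation instead of repeated sends: the execution claims its `idempotencyKey` before performing the side effect, and any later attempt with the same key fails on the unique constraint. The in-memory ledger stands in for the `@unique` column; the error code `P2002` mirrors Prisma's unique-constraint error. The class and function names, and the DLQ's `attempts` field, are hypothetical.

```typescript
// Prisma reports unique-constraint failures with this code.
const UNIQUE_VIOLATION = "P2002";

// Stand-in for ActionExecution.idempotencyKey @unique.
class ExecutionLedger {
  private keys = new Set<string>();

  insert(idempotencyKey: string): void {
    if (this.keys.has(idempotencyKey)) {
      throw Object.assign(new Error(`duplicate idempotencyKey ${idempotencyKey}`), {
        code: UNIQUE_VIOLATION,
      });
    }
    this.keys.add(idempotencyKey);
  }
}

// Claim the key first, then act. A duplicate delivery hits the constraint
// and returns without calling send.
function executeOnce(ledger: ExecutionLedger, key: string, send: () => void): "sent" | "duplicate" {
  try {
    ledger.insert(key);
  } catch (err) {
    if ((err as { code?: string }).code === UNIQUE_VIOLATION) return "duplicate";
    throw err;
  }
  send();
  return "sent";
}

interface FailedJob {
  jobType: string;
  resourceId: string;
  attempts: number;
  lastError: string;
}

// Stand-in for the FailedJob DLQ: one row per (jobType, resourceId).
class DeadLetterQueue {
  private jobs = new Map<string, FailedJob>();

  record(jobType: string, resourceId: string, error: string): FailedJob {
    const key = `${jobType}:${resourceId}`;
    const prev = this.jobs.get(key);
    const job = { jobType, resourceId, attempts: (prev?.attempts ?? 0) + 1, lastError: error };
    this.jobs.set(key, job);
    return job;
  }

  get size(): number {
    return this.jobs.size;
  }
}
```

Claiming before acting makes the send at-most-once: if the send itself fails after the claim, a retry will not repeat it, and the failure has to surface through the DLQ instead.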
Search that degrades instead of erroring
- Search falls back to text search on three conditions: a zero query embedding, zero vector hits, or any pgvector error.
- Search returning weaker results beats search throwing.
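A sketch of the three fallback conditions. The dependency signatures are hypothetical, and "zero query embedding" is read here as an all-zero (or empty) vector, which is an assumption about how a missing AI key or failed embed call shows up. Errors from the text search itself are deliberately not caught.

```typescript
interface Hit {
  id: string;
  score: number;
}

// Hypothetical dependencies; the real ones hit the embedding provider,
// the pgvector query, and a text-search query respectively.
interface SearchDeps {
  embed: (query: string) => Promise<number[]>;
  vectorSearch: (embedding: number[]) => Promise<Hit[]>;
  textSearch: (query: string) => Promise<Hit[]>;
}

const isZeroVector = (v: number[]) => v.length === 0 || v.every((x) => x === 0);

async function searchWithFallback(
  query: string,
  deps: SearchDeps,
): Promise<{ via: "vector" | "text"; hits: Hit[] }> {
  try {
    const embedding = await deps.embed(query);
    // Condition 1: a zero embedding skips the vector path entirely.
    if (!isZeroVector(embedding)) {
      const hits = await deps.vectorSearch(embedding);
      // Condition 2: zero vector hits fall through to text search.
      if (hits.length > 0) return { via: "vector", hits };
    }
  } catch {
    // Condition 3: any embedding or pgvector error also falls through.
  }
  return { via: "text", hits: await deps.textSearch(query) };
}
```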
Bounded AI spend
- Per-user limiters (60 searches/min, 100 AI calls/min) cap rate.
- AI_DAILY_CAP_TOKENS caps daily cost by summing input+output tokens from the AiUsage table before each call.
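A sketch of both bounds. The limits (60 and 100 per minute) come from the case study; the fixed-window algorithm, the `AiUsageRow` fields, and the use of a UTC calendar day are assumptions. Whether the real daily cap is per user or global is not stated, so `underDailyCap` simply sums whatever rows it is given.

```typescript
// Fixed one-minute windows, tracked per user. The algorithm is assumed.
class FixedWindowLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private windows = new Map<string, { start: number; count: number }>();

  constructor(limit: number, windowMs = 60_000) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  allow(userId: string, now: number): boolean {
    const w = this.windows.get(userId);
    if (!w || now - w.start >= this.windowMs) {
      this.windows.set(userId, { start: now, count: 1 }); // new window
      return true;
    }
    if (w.count >= this.limit) return false;
    w.count++;
    return true;
  }
}

const searchLimiter = new FixedWindowLimiter(60); // searches per minute
const aiLimiter = new FixedWindowLimiter(100); // AI calls per minute

// Hypothetical AiUsage row shape.
interface AiUsageRow {
  createdAt: Date;
  inputTokens: number;
  outputTokens: number;
}

// Checked before each call: today's input + output tokens must be under the cap.
function underDailyCap(rows: AiUsageRow[], capTokens: number, now: Date): boolean {
  const day = now.toISOString().slice(0, 10);
  const used = rows
    .filter((r) => r.createdAt.toISOString().slice(0, 10) === day)
    .reduce((sum, r) => sum + r.inputTokens + r.outputTokens, 0);
  return used < capTokens;
}
```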
Gmail mangles plaintext
- Buddy drafts are plain text; sent naively, Gmail collapses them into one unbroken block.
- Before send, the body is converted to inline-styled HTML - \n\n to <p>, numbered lines to <ol>, **bold** to <strong>.
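A sketch of the conversion, following the three rules listed above. The inline style values are placeholders (the case study says only "inline-styled"), and treating a block as a list only when every line in it is numbered is an assumption. Text is HTML-escaped before any tags are added, so the only markup in the output is the converter's own.

```typescript
// Placeholder styles; the real values are not given.
const P_STYLE = "margin:0 0 12px 0";
const OL_STYLE = "margin:0 0 12px 20px;padding:0";

function escapeHtml(s: string): string {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;").replace(/"/g, "&quot;");
}

// **bold** -> <strong>, applied after escaping.
function inline(s: string): string {
  return escapeHtml(s).replace(/\*\*(.+?)\*\*/g, "<strong>$1</strong>");
}

const NUMBERED = /^\s*\d+[.)]\s+/;

function plainToHtml(text: string): string {
  const blocks = text.replace(/\r\n/g, "\n").trim().split(/\n{2,}/); // \n\n -> paragraph break
  return blocks
    .map((block) => {
      const lines = block.split("\n");
      if (lines.every((l) => NUMBERED.test(l))) {
        const items = lines.map((l) => `<li>${inline(l.replace(NUMBERED, ""))}</li>`).join("");
        return `<ol style="${OL_STYLE}">${items}</ol>`;
      }
      // Single newlines inside a paragraph become <br>.
      return `<p style="${P_STYLE}">${lines.map(inline).join("<br>")}</p>`;
    })
    .join("");
}
```

For example, `plainToHtml("Hi **Sam**,\n\n1. Agenda\n2. Budget\n\nThanks")` yields a styled `<p>` with `<strong>Sam</strong>`, an `<ol>` with two items, and a closing `<p>`.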
Section 08
Design decisions & tradeoffs
pgvector over a dedicated vector DB
- Why: one consistency domain; tenancy is a WHERE clause.
- Tradeoff: not billion-vector ANN - acceptable at per-account scale.
Sync on user action, no background polling
- Why: stays inside Aurinko/Gmail rate limits, no idle load.
- Tradeoff: not live; relies on first-sync plus manual sync.
AI is optional
- Why: the default deploy runs with zero AI keys - connect, sync, list, send all work.
- Tradeoff: without keys, search degrades to text and compose/summaries are off.
In-memory lock fallback
- Why: a single instance runs with no Redis dependency.
- Tradeoff: multi-instance coordination requires Redis.
Dry-run automation by default
- Why: safety over autonomy.
- Tradeoff: real autosend needs explicit opt-in and a running worker.
Section 09
Failure modes
Redis down or unreachable
- Lock acquisition retries for up to 30 minutes, then throws.
- Once Redis is selected there is no silent downgrade to the in-memory lock - failing loud beats running two syncs.
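A sketch of the backend choice and the bounded wait. The order Upstash → ioredis → in-memory and the "no silent downgrade" rule come from the case study. The environment variable names follow common Upstash and Redis conventions but are assumptions here, as are the class and function names. The choice is made once; a Redis error afterwards propagates instead of switching backends. A Redis implementation of `tryAcquire` would typically be `SET key token NX PX ttl`; only the in-memory one is shown.

```typescript
type LockBackend = "upstash" | "ioredis" | "memory";

// Decided once at startup; never revisited after a Redis failure.
function selectLockBackend(env: Record<string, string | undefined>): LockBackend {
  if (env.UPSTASH_REDIS_REST_URL && env.UPSTASH_REDIS_REST_TOKEN) return "upstash";
  if (env.REDIS_URL) return "ioredis";
  return "memory";
}

interface Lock {
  tryAcquire(key: string, token: string, ttlMs: number, now: number): boolean | Promise<boolean>;
}

class InMemoryLock implements Lock {
  private holders = new Map<string, { token: string; expiresAt: number }>();

  tryAcquire(key: string, token: string, ttlMs: number, now: number): boolean {
    const h = this.holders.get(key);
    if (h && h.expiresAt > now) return false; // held and not yet expired
    this.holders.set(key, { token, expiresAt: now + ttlMs });
    return true;
  }

  // Only the current holder can release, so a caller whose TTL lapsed
  // cannot free a lock someone else now holds.
  release(key: string, token: string): void {
    if (this.holders.get(key)?.token === token) this.holders.delete(key);
  }
}

const THIRTY_MINUTES = 30 * 60_000;

interface AcquireOptions {
  ttlMs: number;
  maxWaitMs: number;
  pollMs: number;
  now: () => number;
  sleep: (ms: number) => Promise<void>;
}

// Retries until the deadline, then throws rather than running unlocked.
async function acquireOrThrow(lock: Lock, key: string, token: string, o: AcquireOptions): Promise<void> {
  const deadline = o.now() + o.maxWaitMs;
  while (!(await lock.tryAcquire(key, token, o.ttlMs, o.now()))) {
    if (o.now() >= deadline) throw new Error(`lock ${key} not acquired within ${o.maxWaitMs}ms`);
    await o.sleep(o.pollMs);
  }
}
```

In production the options would be roughly `{ ttlMs: THIRTY_MINUTES, maxWaitMs: THIRTY_MINUTES, now: Date.now, sleep: (ms) => new Promise((r) => setTimeout(r, ms)) }`; injecting the clock lets the timeout path be exercised without waiting.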
Inngest down or unconfigured
- Sync and inbox reads are unaffected.
- Scheduled sends and embedding/summary jobs queue until it returns.
LLM provider down
- Compose and summaries surface errors.
- Search falls back to text so the read path keeps working.
Aurinko rate limit / auth error
- The account's needsReconnection flag is set.
- The UI prompts a reconnect.
Cron not firing
- Due ScheduledSend rows stay pending until the next successful tick.
- No send is lost, just delayed.
Embedding job fails repeatedly
- 5 retries with exponential backoff, then a FailedJob DLQ entry keyed by (jobType, resourceId).
- The email stays text-searchable until re-enqueued.
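A sketch of the retry schedule and dead-lettering. "5 retries with exponential backoff" and the `(jobType, resourceId)` key come from the case study; the 1-second base delay and the function names are assumptions. In the real system the retry loop is likely Inngest's own retry handling rather than hand-written code; this only shows the shape of the policy.

```typescript
const MAX_RETRIES = 5;

// Delay before retry i is base * 2^i.
function backoffDelays(baseMs: number, retries = MAX_RETRIES): number[] {
  return Array.from({ length: retries }, (_, i) => baseMs * 2 ** i);
}

// One initial attempt plus up to MAX_RETRIES retries; after the last
// failure the job is dead-lettered under (jobType, resourceId).
async function runWithRetries(
  job: { jobType: string; resourceId: string },
  attempt: () => Promise<void>,
  deadLetter: (jobType: string, resourceId: string, error: string) => void,
  sleep: (ms: number) => Promise<void>,
  baseMs = 1_000,
): Promise<"ok" | "dead-lettered"> {
  const delays = backoffDelays(baseMs);
  for (let i = 0; ; i++) {
    try {
      await attempt();
      return "ok";
    } catch (err) {
      if (i >= delays.length) {
        deadLetter(job.jobType, job.resourceId, String(err));
        return "dead-lettered";
      }
      await sleep(delays[i]);
    }
  }
}
```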
Section 10
Security model
Auth & scoping
- Clerk for authentication; tRPC procedures use protectedProcedure; middleware guards /mail and /buddy.
- Every mail query passes through authoriseAccountAccess(accountId, userId) - no cross-tenant reads.
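A sketch of the ownership check. Only the name `authoriseAccountAccess` and its `(accountId, userId)` arguments come from the case study; the `Account` shape, the injected synchronous lookup (the real one is presumably an async database query) and the `NOT_FOUND` code are assumptions. Inside a tRPC procedure the error would more naturally be a `TRPCError`.

```typescript
// Hypothetical account shape and lookup.
interface Account {
  id: string;
  userId: string;
}

function authoriseAccountAccess(
  accountId: string,
  userId: string,
  findAccount: (id: string) => Account | undefined,
): Account {
  const account = findAccount(accountId);
  // Same error whether the account is missing or owned by someone else,
  // so a caller cannot probe which account ids exist for other tenants.
  if (!account || account.userId !== userId) {
    throw Object.assign(new Error("Account not found"), { code: "NOT_FOUND" });
  }
  return account;
}
```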
Outbound safety
- Buddy topic gate on the pre-augmentation message.
- containsOutgoingViolation() before send.
Cost / abuse
- Per-user rate limits.
- AI_DAILY_CAP_TOKENS.
Headers
- X-Frame-Options DENY, X-Content-Type-Options nosniff, Referrer-Policy, Permissions-Policy, HSTS.
- A CSP with frame-ancestors 'none'.
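A framework-neutral sketch of the header set. The header names come from the case study; every value other than `DENY`, `nosniff` and `frame-ancestors 'none'` (the HSTS max-age, the Referrer-Policy and Permissions-Policy contents) is an assumption, and a real CSP would carry more directives than this one. `applySecurityHeaders` accepts anything with a `setHeader(name, value)` method, which Node's `http.ServerResponse` has.

```typescript
// Values beyond DENY, nosniff and frame-ancestors 'none' are assumed.
const SECURITY_HEADERS: ReadonlyArray<readonly [string, string]> = [
  ["X-Frame-Options", "DENY"],
  ["X-Content-Type-Options", "nosniff"],
  ["Referrer-Policy", "strict-origin-when-cross-origin"],
  ["Permissions-Policy", "camera=(), microphone=(), geolocation=()"],
  ["Strict-Transport-Security", "max-age=63072000; includeSubDomains"],
  // frame-ancestors 'none' is the CSP counterpart of X-Frame-Options: DENY.
  ["Content-Security-Policy", "frame-ancestors 'none'"],
];

interface HeaderSink {
  setHeader(name: string, value: string): unknown;
}

function applySecurityHeaders(res: HeaderSink): void {
  for (const [name, value] of SECURITY_HEADERS) res.setHeader(name, value);
}
```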
Inputs & SQL
- Zod on every tRPC input.
- No raw SQL except the parameterized pgvector similarity query.
Tokens at rest
- Optional AES-256-GCM envelope encryption of stored account tokens.
- Enabled when TOKEN_ENCRYPTION_KEY is set.
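A sketch of AES-256-GCM envelope encryption with Node's built-in `crypto` module: a fresh data key encrypts each token, and the master key encrypts only the data key. The assumption that `TOKEN_ENCRYPTION_KEY` holds 32 base64-encoded bytes, the `v1:` storage format and all function names are hypothetical.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Assumed: TOKEN_ENCRYPTION_KEY is 32 random bytes, base64-encoded.
function loadMasterKey(b64Key: string | undefined): Buffer | null {
  if (!b64Key) return null; // encryption disabled; tokens stored as-is
  const key = Buffer.from(b64Key, "base64");
  if (key.length !== 32) throw new Error("TOKEN_ENCRYPTION_KEY must decode to 32 bytes");
  return key;
}

interface Sealed {
  iv: Buffer;
  tag: Buffer;
  data: Buffer;
}

function sealBytes(key: Buffer, plaintext: Buffer): Sealed {
  const iv = randomBytes(12); // 96-bit nonce, fresh for every encryption
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const data = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  return { iv, tag: cipher.getAuthTag(), data };
}

function openSealed(key: Buffer, s: Sealed): Buffer {
  const decipher = createDecipheriv("aes-256-gcm", key, s.iv);
  decipher.setAuthTag(s.tag); // final() throws if data, tag or key is wrong
  return Buffer.concat([decipher.update(s.data), decipher.final()]);
}

// Base64 never contains "." or ":", so both are safe separators.
const encodeSealed = (s: Sealed) => [s.iv, s.tag, s.data].map((b) => b.toString("base64")).join(".");
const decodeSealed = (str: string): Sealed => {
  const [iv, tag, data] = str.split(".").map((p) => Buffer.from(p, "base64"));
  return { iv, tag, data };
};

function encryptToken(token: string, masterKey: Buffer): string {
  const dataKey = randomBytes(32);
  const wrappedKey = sealBytes(masterKey, dataKey);
  const sealedToken = sealBytes(dataKey, Buffer.from(token, "utf8"));
  return `v1:${encodeSealed(wrappedKey)}:${encodeSealed(sealedToken)}`;
}

function decryptToken(stored: string, masterKey: Buffer): string {
  const [version, wrapped, body] = stored.split(":");
  if (version !== "v1") throw new Error(`unknown token format ${version}`);
  const dataKey = openSealed(masterKey, decodeSealed(wrapped));
  return openSealed(dataKey, decodeSealed(body)).toString("utf8");
}
```

Because only the small data key is wrapped by the master key, rotating `TOKEN_ENCRYPTION_KEY` means re-wrapping data keys rather than re-encrypting every token.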
Section 11
Demo mode as an architectural commitment
The demo isn't a fixture set behind a feature flag.
The tRPC context rewrites ctx.auth.userId to DEMO_USER_ID, and every account-scoped procedure resolves demo data through the same isDemoCall check - no forked controllers, no parallel demo router.
A demo that runs through the production resolver exercises the production auth and account-scoping path; if scoping were broken, the demo would leak too.
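A sketch of the single-resolver demo. The names `DEMO_USER_ID`, `ctx.auth.userId` and `isDemoCall` come from the case study; how a request is recognized as a demo request, the `DEMO_USER_ID` value, the in-memory account rows and `listAccounts` are hypothetical. The point it shows: identity is rewritten once in the context, and the resolver applies the same auth check and ownership filter to demo and real data alike.

```typescript
const DEMO_USER_ID = "user_demo"; // hypothetical value

interface Ctx {
  auth: { userId: string | null };
}

// Identity is rewritten once, here; nothing downstream forks.
function createContext(signedInUserId: string | null, isDemoRequest: boolean): Ctx {
  return { auth: { userId: isDemoRequest ? DEMO_USER_ID : signedInUserId } };
}

const isDemoCall = (ctx: Ctx) => ctx.auth.userId === DEMO_USER_ID;

interface Account {
  id: string;
  userId: string;
}

// Stand-ins for seeded demo rows and the real accounts table.
const DEMO_ACCOUNTS: Account[] = [{ id: "acc_demo", userId: DEMO_USER_ID }];
const DB_ACCOUNTS: Account[] = [{ id: "acc_alice", userId: "user_alice" }];

// One resolver for everyone: isDemoCall picks the data source, but the
// auth check and the ownership filter are the production ones either way.
function listAccounts(ctx: Ctx): Account[] {
  if (!ctx.auth.userId) throw new Error("UNAUTHORIZED");
  const rows = isDemoCall(ctx) ? DEMO_ACCOUNTS : DB_ACCOUNTS;
  return rows.filter((a) => a.userId === ctx.auth.userId);
}
```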
Section 12
Testing
Testing concentrates on the rails where a mistake is expensive - the read/write/automate guards and the semantic layer - at the unit level, with the landing flow covered in a real browser.
- 01 Unit and eval coverage
156 Jest tests cover the utilities, the automation-hardening guards, and an eval suite for intent detection, with the database and Clerk mocked so the logic is tested in isolation.
- 02 End-to-end on the landing flow
8 Playwright tests drive the landing experience in a real browser.
- 03 The deliberate tradeoff
No integration tests against a live Postgres - the suite mocks at the database layer - and no CI workflow runs the tests automatically yet.