Writing · Case study · 2026 · Semantic Email

VectorMail: a semantic Gmail client on one Postgres

Reads, writes, and automation each run on their own guarded rail. Email is a database problem with an AI surface, not the reverse.

Live vectormail.space

Source github.com


fig. 00 - results

768-dim vector on every Email row
1 DB: inbox + embeddings, no vector store
3 rails: read · write · automate, each guarded
Dry-run: automation default

Section 01

The problem

Email is the largest structured corpus most people own, and almost none of it is searchable by what it means.

"The invoice Sarah flagged last week" is a trivially human query and a hopeless keyword query - the words invoice and flagged may appear nowhere in the thread. Semantic retrieval is the obvious fix, but it drags two harder problems behind it: where the vectors live, and what happens when AI touches the outbound path instead of just the read path.

Most "AI email" products get both wrong. They bolt a chat box onto a mail client and let the model draft and send with no gate, which turns a hallucination into a delivered message. And they stand up a separate vector database next to the system of record, which means dual writes, re-embedding drift, and tenant scoping enforced in two places instead of one.

VectorMail is built on the opposite bet: keep meaning in the same database as the mail, and give reading, writing, and automating their own rails with their own guarantees - because those three operations have wildly different blast radii.

Section 02

Thesis: one stack, one database

Embeddings live on the Email row as a vector(768) column, and search is a parameterized $queryRaw using pgvector's <=> cosine operator, scoped by accountId in the same WHERE clause that enforces tenancy.

There is no separate vector store, no sync job between the system of record and an index, no second place to get scoping wrong. The usual objection is that a dedicated vector DB scales further. True at billions of vectors - irrelevant at per-account email volume, where the working set is thousands of rows behind an accountId filter.

What you give up in theoretical ANN throughput you get back in having exactly one consistency domain: when a thread is deleted or re-labeled, its vectors are already gone, because they were never anywhere else. When an embedding is missing, the query degrades to text search rather than failing.
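
A minimal sketch of what that query might look like. The table name "Email" and the columns id, subject, and embedding are assumptions for illustration; only the vector(768) type, the <=> operator, and the accountId scoping come from the text above. The sketch builds the statement text and its values separately, so the tenancy filter and the similarity ordering live in one parameterized statement. With Prisma the same shape would go through a $queryRaw tagged template.

```typescript
// Hypothetical schema: "Email"(id, subject, "accountId", embedding vector(768)).
const EMBEDDING_DIM = 768;

interface ParameterizedQuery {
  text: string;
  values: [string, string, number];
}

// pgvector accepts a vector as the text literal "[v1,v2,...]".
function toVectorLiteral(embedding: number[]): string {
  return `[${embedding.join(",")}]`;
}

function buildSimilarityQuery(
  accountId: string,
  queryEmbedding: number[],
  limit = 10,
): ParameterizedQuery {
  if (queryEmbedding.length !== EMBEDDING_DIM) {
    throw new Error(`expected ${EMBEDDING_DIM} dimensions, got ${queryEmbedding.length}`);
  }
  return {
    // <=> is cosine distance, so ascending order returns the closest rows first.
    // The accountId predicate is the tenancy filter, in the same WHERE clause.
    text: `
      SELECT id, subject, 1 - (embedding <=> $1::vector) AS similarity
      FROM "Email"
      WHERE "accountId" = $2 AND embedding IS NOT NULL
      ORDER BY embedding <=> $1::vector
      LIMIT $3`,
    values: [toVectorLiteral(queryEmbedding), accountId, limit],
  };
}
```

Because accountId only ever appears as a bound value, there is no string interpolation for a caller to abuse and no second index whose scoping could drift.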

Section 03

Architecture

A request flows in one direction. Gmail is reached only through Aurinko, so OAuth, delta sync, send, and labels are one integration surface. The three AI surfaces sit on top of the same tables.

FIG. 01

Integration - one surface: Gmail (via Aurinko only) → Per-account lock (one sync at a time) → Postgres (threads + emails) → Inngest (summary + vector(768))

Read - resolved once, server-side: authoriseAccountAccess (tenancy gate) → AI Search (queries embeddings) → Buddy (drafts + moderation) → Automation (ActionExecution rows)

FIG. 01 - Sync acquires the per-account lock, then writes threads and emails into Postgres; analysis jobs attach summaries and the vector(768) embedding out of band via Inngest. Every read past this point goes through authoriseAccountAccess, so tenancy is resolved once, server-side.

Section 04

Two AI surfaces, one inbox

Reading and writing get different rails because a bad read shows you the wrong email and a bad write sends one.

01 AI Search - the read rail
It classifies intent (search / summarize / select), answers with a deterministic Sources footer (subject, sender, date, snippet) so every claim is traceable, and keeps lightweight session memory so "summarize that one" resolves. It is instructed never to invent amounts or dates and never to offer to send.
02 Buddy - the write rail, guarded twice
A topic gate runs a regex blacklist on the user's pre-augmentation message - math, code, recipes, trivia, weather all bounce before a token reaches the model. Then, immediately before Account.sendEmail, subject and body pass through containsOutgoingViolation(): word-boundary patterns for slurs, threats, sexual violence, CSAM, drug trade, and fraud.
03 The model is upstream of the gate
It is not trusted to be the gate.
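
The Sources footer is deterministic because it is built from the retrieved rows, not from model output. A sketch of how such a footer could be rendered, assuming a hypothetical SourceEmail shape with the four fields named above; the exact layout and the 80-character snippet limit are illustrative choices.

```typescript
// Hypothetical shape; the four fields are the ones the footer is said to carry.
interface SourceEmail {
  subject: string;
  sender: string;
  date: Date;
  snippet: string;
}

// Renders the footer from retrieved rows only, so the model cannot alter it.
function formatSourcesFooter(sources: SourceEmail[]): string {
  if (sources.length === 0) return "Sources: none";
  const lines = sources.map((s, i) => {
    const day = s.date.toISOString().slice(0, 10); // YYYY-MM-DD in UTC
    const snippet = s.snippet.length > 80 ? `${s.snippet.slice(0, 77)}...` : s.snippet;
    return `[${i + 1}] ${s.subject} | ${s.sender} | ${day} | ${snippet}`;
  });
  return ["Sources:", ...lines].join("\n");
}
```

Appending this string after the model's answer means every amount or date in the answer can be checked against a listed message.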

Section 05

The write rail is guarded twice

Buddy drafts and sends. Because a bad write is a delivered message, the model sits between two deterministic gates it cannot talk its way past.

FIG. 02

Buddy - write path: User message (pre-augmentation) → Topic gate (regex blacklist) → Model drafts (subject + body) → containsOutgoingViolation() (word-boundary patterns) → Account.sendEmail (only if clean)

FIG. 02 - The topic gate runs a regex blacklist on the user's pre-augmentation message; containsOutgoingViolation() checks the final subject and body with word-boundary patterns immediately before send. The model is upstream of the gate, not trusted to be the gate.
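
The two gates can be sketched as plain functions on either side of the model call. The pattern lists below are short, benign placeholders, not the real lists, and guardedSend, draft, and sendEmail are hypothetical names standing in for the Buddy pipeline and Account.sendEmail.

```typescript
// Placeholder patterns only; the real blacklists are not shown in this write-up.
const OFF_TOPIC_PATTERNS: RegExp[] = [
  /\b(recipe|bake|cook)\b/i,
  /\bweather\b/i,
  /\b(solve|integral|derivative)\b/i,
  /\b(python|javascript)\s+(code|script|function)\b/i,
];

const OUTGOING_VIOLATION_PATTERNS: RegExp[] = [
  /\bi\s+will\s+hurt\s+you\b/i,
  /\bfake\s+invoice\b/i,
];

// Gate 1: runs on the raw user message, before any context is added.
function passesTopicGate(userMessage: string): boolean {
  return !OFF_TOPIC_PATTERNS.some((p) => p.test(userMessage));
}

// Gate 2: runs on the final subject and body, immediately before send.
function containsOutgoingViolation(subject: string, body: string): boolean {
  const text = `${subject}\n${body}`;
  return OUTGOING_VIOLATION_PATTERNS.some((p) => p.test(text));
}

type SendResult =
  | { sent: true }
  | { sent: false; reason: "off_topic" | "violation" };

function guardedSend(
  userMessage: string,
  draft: (message: string) => { subject: string; body: string },
  sendEmail: (subject: string, body: string) => void,
): SendResult {
  if (!passesTopicGate(userMessage)) return { sent: false, reason: "off_topic" };
  const { subject, body } = draft(userMessage); // the model sits between the gates
  if (containsOutgoingViolation(subject, body)) return { sent: false, reason: "violation" };
  sendEmail(subject, body);
  return { sent: true };
}
```

Neither gate takes the model's opinion as input, so a draft that argues it is harmless is checked exactly like any other.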

Section 06

Automation: simulate by default

Automation has three modes on the account - manual, assist, auto - and every automated action is an ActionExecution row that is dryRun by default.

FIG. 03

Modes: manual (no automation) → assist (awaiting_approval) → auto (opt-in + worker)

Every action - three things before it's allowed to exist: ActionExecution (dryRun by default) → idempotencyKey @unique (double-fire = DB error) → Inngest concurrency 1 (per execution) → Executed once

FIG. 03 - assist parks the action in awaiting_approval until a human confirms; auto requires explicit opt-in plus a running worker. An idempotencyKey @unique makes double-execution a database error rather than a duplicate send, and the Inngest handler runs at concurrency 1 per execution.
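
How the mode might translate into a planned execution, as a sketch. The mode names, awaiting_approval, dryRun, and idempotencyKey come from the text; the "queued" status, the autoOptIn option, and the planExecution function are assumptions made for illustration.

```typescript
type AutomationMode = "manual" | "assist" | "auto";
type ExecutionStatus = "awaiting_approval" | "queued"; // "queued" is a hypothetical status

interface PlannedExecution {
  status: ExecutionStatus;
  dryRun: boolean;
  idempotencyKey: string;
}

function planExecution(
  mode: AutomationMode,
  idempotencyKey: string,
  opts: { autoOptIn?: boolean; dryRun?: boolean } = {},
): PlannedExecution | null {
  // manual: no ActionExecution row is created at all.
  if (mode === "manual") return null;

  // Simulate unless the caller explicitly asks for a real run.
  const dryRun = opts.dryRun ?? true;

  // assist: park the action until a human confirms it.
  if (mode === "assist") return { status: "awaiting_approval", dryRun, idempotencyKey };

  // auto: a real run additionally needs explicit opt-in; otherwise force simulation.
  return { status: "queued", dryRun: opts.autoOptIn ? dryRun : true, idempotencyKey };
}
```

The useful property is that forgetting an option always lands on the safe side: no options means a simulated action.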

Section 07

Why this is hard

Sync correctness is the whole game

The lock (up to a 30-minute wait and TTL, with the backend chosen in order of preference: Upstash, then ioredis, then in-memory) is the easy part.
Completion was originally inferred from "has a delta token," so an account whose token was set early would stall with months of mail missing.
Completion is now an explicit inboxBackfilledAt timestamp, set only when the list walk reaches the oldest message; an un-backfilled inbox is not delegated to the background worker.
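
A sketch of that completion rule, written synchronously for brevity. Page, AccountState, listPage, store, and the page budget are hypothetical stand-ins for the real list walk; only inboxBackfilledAt and the delta token come from the text.

```typescript
interface Page {
  messageIds: string[];
  nextPageToken?: string; // absent on the oldest page
}

interface AccountState {
  deltaToken?: string;
  inboxBackfilledAt?: Date;
}

// Walks the inbox newest to oldest. Completion is recorded only when the
// walk actually reaches a page with no successor.
function backfillInbox(
  account: AccountState,
  listPage: (pageToken?: string) => Page,
  store: (messageIds: string[]) => void,
  maxPages: number,
): AccountState {
  let token: string | undefined;
  for (let i = 0; i < maxPages; i++) {
    const page = listPage(token);
    store(page.messageIds);
    if (page.nextPageToken === undefined) {
      return { ...account, inboxBackfilledAt: new Date() };
    }
    token = page.nextPageToken;
  }
  return account; // budget exhausted: still un-backfilled, whatever the delta token says
}

// Having a delta token is deliberately not part of this check.
function canDelegateToBackgroundWorker(account: AccountState): boolean {
  return account.inboxBackfilledAt !== undefined;
}
```

This sketch restarts the walk from the newest page each time; that is tolerable only because ingestion is idempotent, and a real implementation would more likely persist the page token.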

Idempotent ingestion

Every email upserts by Email.internetMessageId @unique.
Provider redeliveries, retried syncs, and overlapping delta windows converge on one row instead of duplicating threads.
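
The convergence property can be shown with an in-memory stand-in for the table. EmailRow and EmailStore are hypothetical; against the real database the same effect would come from an upsert keyed on the unique internetMessageId column.

```typescript
interface EmailRow {
  internetMessageId: string;
  subject: string;
  labels: string[];
}

// In-memory stand-in for a table with a unique index on internetMessageId.
class EmailStore {
  private rows = new Map<string, EmailRow>();

  // Insert if new, overwrite if already present; never a second row.
  upsert(row: EmailRow): "created" | "updated" {
    const existed = this.rows.has(row.internetMessageId);
    this.rows.set(row.internetMessageId, row);
    return existed ? "updated" : "created";
  }

  get(internetMessageId: string): EmailRow | undefined {
    return this.rows.get(internetMessageId);
  }

  count(): number {
    return this.rows.size;
  }
}
```

Running the same delta window twice leaves the row count unchanged, which is the whole point: sync retries become safe to issue blindly.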

Automation that cannot double-fire

ActionExecution.idempotencyKey @unique plus per-execution Inngest concurrency of 1 means a retry storm produces a uniqueness violation, not three sent emails.
Permanent failures land in a FailedJob DLQ keyed by (jobType, resourceId).
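
A sketch of why a retry storm cannot double-fire, using an in-memory table in place of Postgres. ActionExecutionTable and executeOnce are hypothetical names; the error code P2002 is the one Prisma reports for a unique constraint failure, and its use here assumes the real table is written through Prisma.

```typescript
interface ActionExecutionRow {
  idempotencyKey: string;
}

// In-memory stand-in for a table with idempotencyKey @unique.
class ActionExecutionTable {
  private rows = new Map<string, ActionExecutionRow>();

  insert(row: ActionExecutionRow): void {
    if (this.rows.has(row.idempotencyKey)) {
      // Mimics the shape of a unique-constraint error from the database layer.
      throw Object.assign(new Error("Unique constraint failed on idempotencyKey"), { code: "P2002" });
    }
    this.rows.set(row.idempotencyKey, row);
  }
}

function isUniqueViolation(err: unknown): boolean {
  return typeof err === "object" && err !== null && (err as { code?: unknown }).code === "P2002";
}

// Claim the key first; only the caller that wins the insert runs the effect.
function executeOnce(
  table: ActionExecutionTable,
  idempotencyKey: string,
  effect: () => void,
): "executed" | "duplicate" {
  try {
    table.insert({ idempotencyKey });
  } catch (err) {
    if (isUniqueViolation(err)) return "duplicate";
    throw err;
  }
  effect();
  return "executed";
}
```

The ordering matters: the row is claimed before the side effect runs, so the database, not application logic, decides who is allowed to send.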

Search that degrades instead of erroring

Search falls back to text search on three conditions: a zero query embedding, zero vector hits, or any pgvector error.
Search returning weaker results beats search throwing.
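
The three fallback conditions fit in one small function. This is a synchronous sketch with injected dependencies; SearchDeps and its members are hypothetical, and "zero query embedding" is read here as an empty or all-zero vector, which is an assumption.

```typescript
interface Hit {
  id: string;
  score: number;
}

interface SearchDeps {
  embed: (query: string) => number[];
  vectorSearch: (accountId: string, embedding: number[]) => Hit[];
  textSearch: (accountId: string, query: string) => Hit[];
}

interface SearchResult {
  mode: "semantic" | "text";
  hits: Hit[];
}

function isZeroVector(v: number[]): boolean {
  return v.length === 0 || v.every((x) => x === 0);
}

function search(deps: SearchDeps, accountId: string, query: string): SearchResult {
  let hits: Hit[] = [];
  try {
    const embedding = deps.embed(query);
    // Condition 1: a zero embedding carries no direction to compare against.
    if (!isZeroVector(embedding)) hits = deps.vectorSearch(accountId, embedding);
  } catch (_err) {
    // Condition 3: any embedding or pgvector error is treated as "no semantic hits".
    hits = [];
  }
  if (hits.length > 0) return { mode: "semantic", hits };
  // Condition 2 (zero hits) and the two above all end up here.
  return { mode: "text", hits: deps.textSearch(accountId, query) };
}
```

Returning the mode alongside the hits lets the UI say when results are keyword matches, so weaker results are never passed off as semantic ones.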

Bounded AI spend

Per-user limiters (60 searches/min, 100 AI calls/min) cap rate.
AI_DAILY_CAP_TOKENS caps daily cost by summing input+output tokens from the AiUsage table before each call.
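
Both limits are simple to sketch. The fixed-window algorithm, the UTC day boundary, and the AiUsageRow field names are assumptions; the text only states the limits themselves and that input and output tokens are summed from AiUsage before each call. Whether the cap is per user or global is not stated, so the sketch sums whatever rows it is given.

```typescript
// Fixed-window limiter: at most `limit` calls per key per window.
class FixedWindowLimiter {
  private windows = new Map<string, { start: number; count: number }>();
  private limit: number;
  private windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  allow(key: string, nowMs: number): boolean {
    const w = this.windows.get(key);
    if (w === undefined || nowMs - w.start >= this.windowMs) {
      this.windows.set(key, { start: nowMs, count: 1 });
      return true;
    }
    if (w.count >= this.limit) return false;
    w.count += 1;
    return true;
  }
}

interface AiUsageRow {
  inputTokens: number;
  outputTokens: number;
  createdAt: Date;
}

// Sums input + output tokens recorded since 00:00 UTC of the current day.
function tokensUsedToday(rows: AiUsageRow[], now: Date): number {
  const startOfDay = Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate());
  return rows
    .filter((r) => r.createdAt.getTime() >= startOfDay)
    .reduce((sum, r) => sum + r.inputTokens + r.outputTokens, 0);
}

// Called before each AI request; throws instead of spending past the cap.
function assertUnderDailyCap(rows: AiUsageRow[], now: Date, capTokens: number): void {
  const used = tokensUsedToday(rows, now);
  if (used >= capTokens) throw new Error(`daily AI token cap reached: ${used}/${capTokens}`);
}
```

For example, `new FixedWindowLimiter(60, 60000)` would model the 60 searches per minute limit, keyed by user id.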

Gmail mangles plaintext

Buddy drafts are plain text; sent naively, Gmail collapses them into one unbroken block.
Before send, the body is converted to inline-styled HTML - \n\n to <p>, numbered lines to <ol>, **bold** to <strong>.
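
A sketch of that conversion under a few assumptions: blocks are separated by blank lines, a block becomes a list only when every line in it is numbered, and the inline styles are placeholders. The HTML escaping step is not mentioned in the text but is included because the draft is untrusted model output.

```typescript
function escapeHtml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Escape first, then turn **bold** into <strong>.
function renderInline(s: string): string {
  return escapeHtml(s).replace(/\*\*(.+?)\*\*/g, "<strong>$1</strong>");
}

const P_STYLE = "margin:0 0 12px 0;"; // placeholder styles
const OL_STYLE = "margin:0 0 12px 20px;padding:0;";
const NUMBERED = /^\d+[.)]\s+/;

function plainTextToHtml(body: string): string {
  const blocks = body
    .replace(/\r\n/g, "\n")
    .split(/\n{2,}/)
    .map((b) => b.trim())
    .filter((b) => b.length > 0);

  return blocks
    .map((block) => {
      const lines = block.split("\n");
      if (lines.every((l) => NUMBERED.test(l))) {
        const items = lines
          .map((l) => `<li>${renderInline(l.replace(NUMBERED, ""))}</li>`)
          .join("");
        return `<ol style="${OL_STYLE}">${items}</ol>`;
      }
      // Single newlines inside a paragraph are kept as line breaks.
      return `<p style="${P_STYLE}">${lines.map((l) => renderInline(l)).join("<br>")}</p>`;
    })
    .join("");
}
```

A numbered run that directly follows a sentence without a blank line stays a paragraph in this sketch; handling that case would need a line-level parser.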

Section 08

Design decisions & tradeoffs

pgvector over a dedicated vector DB

Why: one consistency domain; tenancy is a WHERE clause.
Tradeoff: not billion-vector ANN - acceptable at per-account scale.

Sync on user action, no background polling

Why: stays inside Aurinko/Gmail rate limits, no idle load.
Tradeoff: not live; relies on first-sync plus manual sync.

AI is optional

Why: the default deploy runs with zero AI keys - connect, sync, list, send all work.
Tradeoff: without keys, search degrades to text and compose/summaries are off.

In-memory lock fallback

Why: a single instance runs with no Redis dependency.
Tradeoff: multi-instance coordination requires Redis.
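
What the in-memory fallback might look like, as a sketch. InMemoryAccountLock is a hypothetical class; the clock is passed in so the TTL behaviour is easy to see, and there is no owner token, so any caller can release.

```typescript
// Per-account lock with a TTL, valid only within one process.
class InMemoryAccountLock {
  private held = new Map<string, number>(); // accountId -> expiry (ms since epoch)
  private ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  // Returns true if the caller now holds the lock for this account.
  tryAcquire(accountId: string, nowMs: number): boolean {
    const expiresAt = this.held.get(accountId);
    if (expiresAt !== undefined && expiresAt > nowMs) return false;
    this.held.set(accountId, nowMs + this.ttlMs); // free, or the old holder's TTL lapsed
    return true;
  }

  release(accountId: string): void {
    this.held.delete(accountId);
  }
}
```

The Map lives in one process, which is exactly why a second instance would not see it and why multi-instance deployments need Redis.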

Dry-run automation by default

Why: safety over autonomy.
Tradeoff: real autosend needs explicit opt-in and a running worker.

Section 09

Failure modes

Redis down or unreachable

Lock acquisition retries for up to 30 minutes, then throws.
Once Redis is selected there is no silent downgrade to the in-memory lock - failing loud beats running two syncs.

Inngest down or unconfigured

Sync and inbox reads are unaffected.
Scheduled sends and embedding/summary jobs queue until it returns.

LLM provider down

Compose and summaries surface errors.
Search falls back to text so the read path keeps working.

Aurinko rate limit / auth error

The account's needsReconnection flag is set.
The UI prompts a reconnect.

Cron not firing

Due ScheduledSend rows stay pending until the next successful tick.
No send is lost, just delayed.

Embedding job fails repeatedly

5 retries with exponential backoff, then a FailedJob DLQ entry keyed by (jobType, resourceId).
The email stays text-searchable until re-enqueued.
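
The retry-then-dead-letter decision, sketched as a pure function. The 1-second base delay, the 60-second cap, and the function names are assumptions; in practice the queue's own retry policy sets the actual delays. Only the 5 retries and the (jobType, resourceId) key come from the text.

```typescript
interface FailedJob {
  jobType: string;
  resourceId: string;
  failures: number;
  lastError: string;
}

type FailureDecision =
  | { action: "retry"; delayMs: number }
  | { action: "dead_letter" };

const MAX_RETRIES = 5;

// Doubles the delay on each retry: 1s, 2s, 4s, ... up to the cap.
function backoffDelayMs(retryNumber: number, baseMs = 1000, capMs = 60000): number {
  return Math.min(capMs, baseMs * 2 ** (retryNumber - 1));
}

// `failures` counts failed attempts so far, including the initial one.
function onJobFailure(
  dlq: Map<string, FailedJob>,
  jobType: string,
  resourceId: string,
  failures: number,
  lastError: string,
): FailureDecision {
  if (failures <= MAX_RETRIES) {
    return { action: "retry", delayMs: backoffDelayMs(failures) };
  }
  // Keyed by (jobType, resourceId): a repeat failure overwrites, never piles up.
  dlq.set(`${jobType}:${resourceId}`, { jobType, resourceId, failures, lastError });
  return { action: "dead_letter" };
}
```

Keying the dead-letter entry by resource means re-enqueueing a failed embedding is a lookup, not a scan through duplicate failure records.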

Section 10

Security model

Auth & scoping

Clerk; protectedProcedure on tRPC; middleware guards /mail and /buddy.
Every mail query passes through authoriseAccountAccess(accountId, userId) - no cross-tenant reads.
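
The gate itself can be very small. In this sketch the third parameter, findAccount, is a hypothetical lookup injected so the example runs without a database; the real function is described only by its name and its (accountId, userId) arguments.

```typescript
interface Account {
  id: string;
  userId: string;
}

// Resolves the account only if it belongs to the calling user.
function authoriseAccountAccess(
  accountId: string,
  userId: string,
  findAccount: (accountId: string) => Account | undefined,
): Account {
  const account = findAccount(accountId);
  // Same error for "does not exist" and "belongs to someone else",
  // so the response does not reveal which account ids are valid.
  if (account === undefined || account.userId !== userId) {
    throw new Error("Account not found");
  }
  return account;
}
```

Every downstream query then takes the returned account's id rather than the raw request parameter, which keeps the ownership check from being skipped by accident.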

Outbound safety

Buddy topic gate on the pre-augmentation message.
containsOutgoingViolation() before send.

Cost / abuse

Per-user rate limits.
AI_DAILY_CAP_TOKENS.

Headers

X-Frame-Options DENY, X-Content-Type-Options nosniff, Referrer-Policy, Permissions-Policy, HSTS.
A CSP with frame-ancestors none.

Inputs & SQL

Zod on every tRPC input.
No raw SQL except the parameterized pgvector similarity query.

Tokens at rest

Optional AES-256-GCM envelope encryption of stored account tokens.
Enabled when TOKEN_ENCRYPTION_KEY is set.
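
One way envelope encryption with AES-256-GCM can be structured, using Node's built-in crypto module. The storage format (base64 fields in an Envelope object) and all function names are assumptions; the master key here stands in for a 32-byte key derived from TOKEN_ENCRYPTION_KEY, whose encoding the text does not specify.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

interface Sealed {
  iv: string;   // base64, 12 random bytes per encryption
  tag: string;  // base64 GCM authentication tag
  data: string; // base64 ciphertext
}

interface Envelope {
  wrappedKey: Sealed; // per-token data key, encrypted under the master key
  payload: Sealed;    // the token, encrypted under the data key
}

function seal(key: Buffer, plaintext: Buffer): Sealed {
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const data = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  return {
    iv: iv.toString("base64"),
    tag: cipher.getAuthTag().toString("base64"),
    data: data.toString("base64"),
  };
}

function open(key: Buffer, sealed: Sealed): Buffer {
  const decipher = createDecipheriv("aes-256-gcm", key, Buffer.from(sealed.iv, "base64"));
  decipher.setAuthTag(Buffer.from(sealed.tag, "base64"));
  // final() throws if the key is wrong or the ciphertext was tampered with.
  return Buffer.concat([decipher.update(Buffer.from(sealed.data, "base64")), decipher.final()]);
}

function encryptToken(masterKey: Buffer, token: string): Envelope {
  const dataKey = randomBytes(32); // fresh key per token
  return {
    wrappedKey: seal(masterKey, dataKey),
    payload: seal(dataKey, Buffer.from(token, "utf8")),
  };
}

function decryptToken(masterKey: Buffer, envelope: Envelope): string {
  const dataKey = open(masterKey, envelope.wrappedKey);
  return open(dataKey, envelope.payload).toString("utf8");
}
```

The benefit of the extra layer is rotation: changing the master key means re-wrapping small data keys, not re-encrypting every stored token.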

Section 11

Demo mode as an architectural commitment

The demo isn't a fixture set behind a feature flag.

The tRPC context rewrites ctx.auth.userId to DEMO_USER_ID, and every account-scoped procedure resolves demo data through the same isDemoCall check - no forked controllers, no parallel demo router.

A demo that runs through the production resolver exercises the production auth and account-scoping path; if scoping were broken, the demo would leak too.
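
A sketch of the idea, assuming a simplified context shape. DEMO_USER_ID, ctx.auth.userId, and isDemoCall are named in the text; the constant's value, createContext, and listAccounts are hypothetical.

```typescript
const DEMO_USER_ID = "demo-user"; // hypothetical value

interface Ctx {
  auth: { userId: string | null };
  isDemoCall: boolean;
}

interface AccountRow {
  id: string;
  userId: string;
}

// The only demo-specific step: the identity is rewritten when the context is built.
function createContext(sessionUserId: string | null, isDemoRequest: boolean): Ctx {
  return {
    auth: { userId: isDemoRequest ? DEMO_USER_ID : sessionUserId },
    isDemoCall: isDemoRequest,
  };
}

// One procedure for both cases: it scopes by ctx.auth.userId and nothing else.
function listAccounts(ctx: Ctx, rows: AccountRow[]): AccountRow[] {
  if (ctx.auth.userId === null) throw new Error("UNAUTHORIZED");
  return rows.filter((r) => r.userId === ctx.auth.userId);
}
```

Because listAccounts has no demo branch, a scoping bug would show up for demo and real users alike.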

Section 12

Testing

Testing concentrates on the rails where a mistake is expensive - the read/write/automate guards and the semantic layer - at the unit level, with the landing flow covered in a real browser.

01 Unit and eval coverage
156 Jest tests cover the utilities, the automation-hardening guards, and an eval suite for intent detection, with the database and Clerk mocked so the logic is tested in isolation.
02 End-to-end on the landing flow
8 Playwright tests drive the landing experience in a real browser.
03 The deliberate tradeoff
No integration tests against a live Postgres - the suite mocks at the database layer - and no CI workflow runs the tests automatically yet.
