Writing · Case study · 2026 · Codebase RAG

RepoDoc: codebase RAG built as infrastructure

Retrieval runs over what each file means, indexing is a durable Postgres lease queue, and every token is metered against a per-project budget.

fig. 00 results

  • 768-dim embedding per file summary
  • top-5 pgvector retrieval + top-3 memory
  • 5-min lease per indexing job
  • 402 returned on a project over budget

Section 01

The problem

Most of the work in understanding a codebase isn't reading the file you have open - it's finding the three files you didn't know to open.

Onboarding to an unfamiliar repo means reconstructing a mental model that lives nowhere: which module owns auth, where the rate limit is configured, what actually runs on a cron. Grep finds strings; it doesn't find concepts. Documentation, when it exists, drifts from the code the day after it's written.

"RAG over a codebase" is the obvious answer and the obvious trap. The demo is easy: chunk the files, embed the chunks, retrieve top-k, call an LLM. It falls apart on real repos for reasons that have nothing to do with prompting. Raw code embeds lexically - variable names and syntax - so a query like "how does authentication work" retrieves whatever file happens to share tokens with the question, not the file that implements the concept. And ingesting a whole repository is a systems problem, not a model problem: it has to survive serverless time limits, partial failures, and the fact that an LLM call per file turns "index this repo" into an unbounded bill.

Section 02

The thesis

RepoDoc takes four opinionated positions, and they're the reason it behaves differently from a generic RAG wrapper.

  • 01 Embed what a file means, not what it says

    During indexing, each file is summarized by an LLM into a ≤100-word description of its purpose, and that summary is embedded - gemini-embedding-001 at 768 dimensions - not the raw source. Retrieval is then a cosine search over intent against pgvector, top-5. The cost is an extra LLM call per file at index time and a dependency on summary quality; the payoff is that "where are rate limits configured" retrieves the rate limiter even when the query shares no tokens with it.

  • 02 The database is the queue

    Indexing is modeled as an IndexingJob row in Postgres with a lease, not as messages handed to SQS, Redis, or BullMQ. A worker claims a job with an atomic compare-and-swap, holds a five-minute lease, and releases it on completion or failure. One datastore, transactional with the data it indexes, and no extra infrastructure to operate.

  • 03 Cost is a runtime constraint, not a dashboard

    Every AI request writes a QueryMetrics row (model, tokens, latency, estimated USD, retrieval and memory counts, cold-start and cache flags, success/error). A project can set monthlyCostLimitUsd; when it's exceeded, a query returns 402 and in-flight indexing pauses itself and requeues. Spend is bounded in the hot path, not reconciled after the bill arrives.

  • 04 Durable repo memory, separate from retrieval

    RepoDoc extracts facts from each Q&A exchange into a RepoMemory store and pulls the top matches back as secondary context on later questions - capturing intent and decisions that live in conversations, not in any file. It's labeled distinctly from code, under one rule: when memory and code conflict, the code wins.
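Positions 01 and 04 meet at query time: rank file summaries and memories against the question's embedding, then hand both to the model under separate labels. The sketch below is a minimal in-memory version, assuming summaries and memories are already embedded as plain number arrays. In production the ranking runs inside Postgres, where pgvector's <=> operator returns cosine distance (1 minus the similarity computed here). Only the top-5/top-3 cut-offs and the "code wins" rule come from the text. The row shapes, function names, and prompt wording are hypothetical.

```typescript
// Hypothetical row shapes: one embedded summary per file, one embedded fact per memory.
interface SummaryRow { path: string; summary: string; embedding: number[] }
interface MemoryRow { fact: string; embedding: number[] }

// Cosine similarity; pgvector's <=> operator returns 1 minus this value.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Rank rows by similarity to the query embedding and keep the best k.
function topK<T extends { embedding: number[] }>(rows: T[], query: number[], k: number): T[] {
  return rows
    .map((row) => ({ row, score: cosine(row.embedding, query) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((x) => x.row);
}

// Grounding context: top-5 code summaries and top-3 memories, labeled separately,
// with an explicit rule that code overrides memory when they disagree.
function retrieveContext(query: number[], summaries: SummaryRow[], memories: MemoryRow[]): string {
  const code = topK(summaries, query, 5);
  const memory = topK(memories, query, 3);
  return [
    "Rule: if REPO MEMORY conflicts with CODE, trust CODE.",
    "CODE:",
    ...code.map((s) => `[${s.path}] ${s.summary}`),
    "REPO MEMORY:",
    ...memory.map((m) => `- ${m.fact}`),
  ].join("\n");
}
```

Because the vectors being compared encode summaries of intent, a query about rate limits can land on the rate limiter's summary even when the file's source shares no tokens with the question.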

Section 03

Indexing: the database is the queue

Indexing is a job, not a request. Triggers enqueue an IndexingJob; a worker claims it atomically and processes the repo file by file.

FIG. 01 - panels: Triggers → Claim, exactly once (5-min lease) → Per file: GithubRepoLoader walks the repo, summarize (≤100 words), embed (768d), store (Postgres + pgvector).
FIG. 01 - Triggers: on project create, on a query against an unindexed project, and a daily Vercel cron (0 6 * * *). claimJob is a conditional updateMany that flips queued/stale-processing → processing and trusts the claim only when res.count === 1 - a compare-and-swap on the row, no advisory locks.
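The claim can be sketched with an in-memory stand-in for the table. FakeJobTable.updateMany mimics the contract of Prisma's updateMany: apply the data to every row matching the condition and return { count }. That contract is what lets the caller treat count === 1 as "I won". In Postgres, the single conditional UPDATE gives the same guarantee. Under the default READ COMMITTED isolation, a second concurrent UPDATE waits for the first to commit, re-checks its WHERE clause against the new row version, and matches nothing. Only status, lockedAt, and the five-minute lease come from the text. The other field and helper names are hypothetical.

```typescript
type JobStatus = "queued" | "processing" | "done" | "failed";
interface IndexingJob { id: string; status: JobStatus; lockedAt: Date | null; workerId: string | null }

const LEASE_MS = 5 * 60 * 1000; // five-minute lease

// In-memory stand-in for the table: updateMany applies `data` to every row matching
// `predicate` and reports how many rows changed, like Prisma's { count }.
class FakeJobTable {
  rows: IndexingJob[];
  constructor(rows: IndexingJob[]) {
    this.rows = rows;
  }
  updateMany(predicate: (j: IndexingJob) => boolean, data: Partial<IndexingJob>): { count: number } {
    let count = 0;
    for (const job of this.rows) {
      if (predicate(job)) {
        Object.assign(job, data);
        count++;
      }
    }
    return { count };
  }
}

// Claim succeeds only if the job is queued, or processing with an expired lease.
// Condition and write happen in one statement, so two workers cannot both get count === 1.
function claimJob(table: FakeJobTable, jobId: string, workerId: string, now: Date): boolean {
  const staleBefore = new Date(now.getTime() - LEASE_MS);
  const res = table.updateMany(
    (j) =>
      j.id === jobId &&
      (j.status === "queued" ||
        (j.status === "processing" && j.lockedAt !== null && j.lockedAt < staleBefore)),
    { status: "processing", lockedAt: now, workerId },
  );
  return res.count === 1;
}
```

The same predicate covers crash recovery: a job whose worker died stays in processing, and once its lockedAt is older than the lease, the next claimJob call matches it.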

Section 04

Query: grounded answer, or a pre-index fallback

Every query passes the same gate. If the project has no embeddings yet, RepoDoc answers from a live GitHub fetch instead of making the user wait for a full index.
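The gate works as an ordered list of checks in which the first failure sets the status code. Only after all checks pass does the route choose between the indexed and pre-index paths. This is a minimal sketch, assuming the individual checks have already run and their results are passed in through a hypothetical GateInput. The 401, 429, and 402 codes appear elsewhere in this write-up. Returning 404 on a failed ownership check is an assumption.

```typescript
interface GateInput {
  userId: string | null;         // from the session; null when signed out
  rateLimited: boolean;          // result of the rate limiter
  projectOwnerId: string | null; // null when no live project matches (missing or soft-deleted)
  overBudget: boolean;           // result of the budget check
  embeddingCount: number;        // embeddings stored for the project
}

type GateResult =
  | { ok: false; status: 401 | 429 | 404 | 402 }
  | { ok: true; path: "indexed" | "preindex" };

// Checks run in order: auth, rate limit, ownership, budget. The first failure wins.
function gateQuery(input: GateInput): GateResult {
  if (input.userId === null) return { ok: false, status: 401 };
  if (input.rateLimited) return { ok: false, status: 429 };
  if (input.projectOwnerId !== input.userId) return { ok: false, status: 404 }; // assumed code
  if (input.overBudget) return { ok: false, status: 402 };
  // Past the gate: route on whether the project has any embeddings yet.
  return { ok: true, path: input.embeddingCount === 0 ? "preindex" : "indexed" };
}
```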

FIG. 02 - panels: Gate (POST /api/query: auth, rate-limit, ownership, budget) → Indexed path (pgvector top-5 + RepoMemory top-3, OpenRouter gemini-2.5-flash, cited answer + metrics, cache, memory) or Pre-index path when embeddings == 0 (live-fetch ≤22 files, answer now with preindex: true, kick worker to index in background).
FIG. 02 - If embeddings == 0, the pre-index path live-fetches up to 22 high-value files (READMEs, configs, entrypoints) straight from the GitHub tree, answers now with preindex: true, and kicks the worker. Otherwise it retrieves pgvector top-5 plus RepoMemory top-3, grounds the answer with cited sources, and writes a QueryMetrics row.
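The pre-index path needs a cheap way to pick "high-value" files from the repo tree without reading them first. This is a minimal sketch, assuming the tree has already been listed as a flat array of paths (for example, from GitHub's Git Trees API). Only the cap of 22 comes from the text. The priority patterns and the shallower-first tie-break are illustrative guesses, not RepoDoc's actual list.

```typescript
const MAX_PREINDEX_FILES = 22;

// Hypothetical priority rules: lower rank is fetched first.
const PRIORITY: Array<[RegExp, number]> = [
  [/(^|\/)readme(\.[a-z]+)?$/i, 0],                              // READMEs anywhere
  [/^(package\.json|pyproject\.toml|go\.mod|Cargo\.toml)$/, 1], // root manifests
  [/(^|\/)[^/]*\.config\.(js|ts|mjs|cjs)$/, 2],                 // tool configs
  [/(^|\/)(index|main|app|server)\.(ts|tsx|js|py|go)$/, 3],     // likely entrypoints
];

function rankPath(path: string): number | null {
  for (const [pattern, rank] of PRIORITY) {
    if (pattern.test(path)) return rank;
  }
  return null; // not considered high-value
}

// Pick up to MAX_PREINDEX_FILES paths, best rank first, shallower paths before deeper ones.
function selectPreindexFiles(paths: string[]): string[] {
  return paths
    .map((path) => ({ path, rank: rankPath(path) }))
    .filter((x): x is { path: string; rank: number } => x.rank !== null)
    .sort((a, b) => a.rank - b.rank || a.path.split("/").length - b.path.split("/").length)
    .slice(0, MAX_PREINDEX_FILES)
    .map((x) => x.path);
}
```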

Section 05

Surviving the serverless wall

A large repo can't be indexed in one 60-second invocation, and a worker can die mid-job. Both are handled by the same lease + cursor mechanism.

FIG. 03 - panels: Timing out gracefully (WORKER_BUDGET_MS time-box → write resumeAfter cursor → requeue + re-kick for the next invocation) · Recovering a dead worker (lease expires when lockedAt > 5 min → next worker reclaims via @@index([status, lockedAt]) → resume from cursor, no re-embedding).
FIG. 03 - The worker time-boxes itself (WORKER_BUDGET_MS); when it runs out it writes a resumeAfter cursor, requeues, and re-kicks, so indexing makes forward progress across many short runs. A job stuck in processing with lockedAt older than five minutes is reclaimable - @@index([status, lockedAt]) makes finding it cheap - and resumes from its cursor.
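The time-box reduces to a loop that checks the clock before each file and, when the budget runs out, returns the last completed path as the cursor. This is a minimal sketch, assuming files are processed in a stable sorted order so that a single path can serve as resumeAfter. processFile and the injected now clock are hypothetical stand-ins; the real worker summarizes, embeds, and stores each file. The budget value is whatever the caller passes, because the text doesn't give WORKER_BUDGET_MS. Persisting the cursor, requeuing, and re-kicking are left to the caller.

```typescript
interface RunResult { done: boolean; resumeAfter: string | null; processed: string[] }

// Process files in sorted order, starting after the cursor, until the time budget is spent.
function runIndexSlice(
  allPaths: string[],
  resumeAfter: string | null,
  budgetMs: number,
  now: () => number,
  processFile: (path: string) => void,
): RunResult {
  const started = now();
  const pending = [...allPaths].sort().filter((p) => resumeAfter === null || p > resumeAfter);
  const processed: string[] = [];
  for (const path of pending) {
    if (now() - started >= budgetMs) {
      // Out of time: hand back a cursor so the next invocation continues from here.
      const cursor = processed.length > 0 ? processed[processed.length - 1] : resumeAfter;
      return { done: false, resumeAfter: cursor, processed };
    }
    processFile(path);
    processed.push(path);
  }
  return { done: true, resumeAfter: null, processed };
}
```

A reclaimed job follows the same path: the new worker reads the stored cursor and skips every file at or before it, so nothing is re-embedded.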

Section 06

Why this is hard

  • 01 Claiming a job exactly once under concurrent workers

    claimJob is a conditional updateMany - it flips queued/stale-processing → processing and trusts the claim only when res.count === 1. Two workers that wake on the same job can't both win; it's a compare-and-swap on the row, no advisory locks.

  • 02 Surviving the serverless 60-second wall

    Indexing a large repo can't finish in one invocation. The worker time-boxes itself, and when it runs out it writes a resumeAfter cursor, requeues the job, and re-kicks - so indexing makes forward progress across many short runs instead of dying at the platform timeout.

  • 03 Recovering a dead worker without double-processing

    A lease is five minutes. A job stuck in processing with lockedAt older than that is reclaimable; @@index([status, lockedAt]) makes finding it cheap. A crashed worker's job is picked up by the next one and resumed from its cursor.

  • 04 Being useful before indexing finishes

    queryCodebasePreindex fetches up to 22 high-value files (READMEs, configs, entrypoints) straight from the GitHub tree and answers from those, flagging the response preindex: true, while kicking the indexer in the background.

  • 05 Bounding spend mid-flight

    isProjectOverBudget short-circuits queries to 402, and the indexer checks budget between files - a job that would blow the limit pauses rather than running the meter up.
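A minimal sketch of the budget check, assuming month-to-date spend is the sum of estimatedUsd over the project's QueryMetrics rows since the start of the current UTC month. The row shape shows only a subset of the fields the text lists, under hypothetical names. The real system would compute the sum with an aggregate query rather than an in-memory reduce. The function shares its name with the one in the text, but this signature is illustrative. Both callers would use it: the query route maps true to 402, and the indexer calls it between files and pauses the job instead of continuing.

```typescript
interface QueryMetric {
  projectId: string;
  model: string;
  totalTokens: number;
  estimatedUsd: number;
  createdAt: Date;
}

interface Project { id: string; monthlyCostLimitUsd: number | null }

// Spend so far in the current calendar month (UTC) for one project.
function monthToDateUsd(metrics: QueryMetric[], projectId: string, now: Date): number {
  const monthStart = Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), 1);
  return metrics
    .filter((m) => m.projectId === projectId && m.createdAt.getTime() >= monthStart)
    .reduce((sum, m) => sum + m.estimatedUsd, 0);
}

// No limit set means no ceiling; otherwise the project is over budget once spend exceeds it.
function isProjectOverBudget(project: Project, metrics: QueryMetric[], now: Date): boolean {
  if (project.monthlyCostLimitUsd === null) return false;
  return monthToDateUsd(metrics, project.id, now) > project.monthlyCostLimitUsd;
}
```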

Section 07

Design decisions & tradeoffs

01

Embed LLM summaries, not raw code

  • Why: code embeds lexically; intent is what you query by.
  • Tradeoff: an LLM call per file at index time, and retrieval is only as good as the summaries.
02

Postgres as the job queue (lease + CAS)

  • Why: one transactional datastore, nothing extra to run.
  • Tradeoff: polling, not push; not built for very high job throughput.
03

OpenRouter as the single chat gateway

  • Why: one billing surface, and models routed per task without SDK churn: gemini-2.5-flash for chat, gemini-2.5-pro for README generation, Claude Haiku for docs.
  • Tradeoff: an extra network hop and no native multi-provider failover.
04

Budget enforced in the request path and mid-index

  • Why: AI cost is unbounded by default; a ceiling has to be live to matter.
  • Tradeoff: a hard limit can interrupt indexing, which is why jobs are resumable.
05

Rate limiting and secret encryption fail open

  • Why: for a single-operator product, a Redis blip or an unset key shouldn't 500 every request.
  • Tradeoff: a deliberately weaker posture under those failures - documented, not hidden.
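Decision 05 and the security model's rate-limiting rules, combined: a fixed-window limiter that prefers a shared store but falls back to a per-instance in-memory window when the store throws, rather than failing the request. This is a minimal sketch, assuming a hypothetical synchronous CounterStore interface in place of the real (asynchronous) Redis client. The key format, limit, and window are illustrative. The identity helper reflects the stated preference for x-real-ip over x-forwarded-for, and retryAfterSec is the value a route would send in the Retry-After header with a 429.

```typescript
// Hypothetical shared store (e.g. backed by Redis): increments a key, returns the new count.
interface CounterStore { incr(key: string, ttlMs: number): number }

interface Decision { allowed: boolean; retryAfterSec: number }

const memory = new Map<string, number>(); // per-instance fallback counters (pruning omitted)

// Prefer the platform-set header; x-forwarded-for can be set by the client.
function clientIdentity(headers: Record<string, string | undefined>): string {
  return headers["x-real-ip"] ?? headers["x-forwarded-for"]?.split(",")[0].trim() ?? "anonymous";
}

function checkRateLimit(store: CounterStore, id: string, limit: number, windowMs: number, nowMs: number): Decision {
  const windowStart = Math.floor(nowMs / windowMs) * windowMs;
  const key = `rl:${id}:${windowStart}`;
  let count: number;
  try {
    count = store.incr(key, windowMs);
  } catch {
    // Store unavailable: count in this instance's own window instead of erroring.
    count = (memory.get(key) ?? 0) + 1;
    memory.set(key, count);
  }
  const retryAfterSec = Math.ceil((windowStart + windowMs - nowMs) / 1000);
  return { allowed: count <= limit, retryAfterSec };
}
```

The weaker posture is visible in the fallback: each serverless instance keeps its own counts, so the effective limit during a store outage is per instance, not global.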

Section 08

Failure modes

01

Worker dies mid-index

  • Its lease expires after five minutes; the next worker reclaims the job and resumes from the resumeAfter cursor.
  • No stuck jobs, no re-embedding from scratch.
02

Serverless invocation times out

  • The time-box requeues with a cursor before the platform kills the function.
  • Indexing continues on the next invocation.
03

Project queried before it's indexed

  • The pre-index path answers from live GitHub fetches.
  • The result is marked preindex: true.
04

Project exceeds its budget

  • Queries return 402 with a clear message.
  • Running indexing pauses and requeues instead of overspending.
05

Redis unavailable / provider error

  • The rate limiter falls back to a per-instance in-memory window.
  • During indexing, each summary and embedding is retried twice with backoff; a persistent failure marks the job failed with the error string, surfaced in the UI for retry.
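The provider-error path can be sketched as a small retry helper. This sketch makes several assumptions: "retried twice" is read as up to three attempts in total, and the exponential delays (a 500 ms base that doubles) are illustrative because the text gives no numbers. withRetry and summarizeOrFail are hypothetical helpers. On final failure, the error's message becomes the string a caller would store on the failed job.

```typescript
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Run fn; after each failure wait baseDelayMs, then 2x, ..., for up to `retries` extra attempts.
async function withRetry<T>(fn: () => Promise<T>, retries = 2, baseDelayMs = 500): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < retries) await sleep(baseDelayMs * 2 ** attempt);
    }
  }
  throw lastError;
}

// Caller side: a persistent failure is turned into the job's error string.
async function summarizeOrFail(
  summarize: () => Promise<string>,
  baseDelayMs = 500,
): Promise<{ ok: true; summary: string } | { ok: false; error: string }> {
  try {
    return { ok: true, summary: await withRetry(summarize, 2, baseDelayMs) };
  } catch (err) {
    return { ok: false, error: err instanceof Error ? err.message : String(err) };
  }
}
```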

Section 09

Security model

01

Auth

  • Clerk middleware guards everything except an explicit public allow-list; each API route re-checks the session and returns 401 when there isn't one.
  • Every project query is scoped by owner and filtered on deletedAt: null, so soft-deleted projects stay invisible.
02

Input & SQL

  • Zod validates request bodies; Prisma parameterizes all queries.
  • The only raw SQL is the pgvector similarity search and embedding writes - both parameter-bound.
03

Secrets at rest

  • Stored GitHub tokens are encrypted with AES-256-GCM in an envelope format, keyed by ENCRYPTION_KEY.
  • If ENCRYPTION_KEY is unset, the code falls back to storing plaintext so the app keeps working - set the key in every environment to actually get encryption.
04

Webhooks

  • The Clerk webhook verifies the svix HMAC signature and rejects on mismatch.
  • The cron worker route authorizes with a constant-time (timingSafeEqual) shared-secret check.
05

Billing webhook

  • The Gumroad billing webhook authenticates with a constant-time shared-secret check and maps a product permalink to a plan.
  • Auto-downgrades to Starter on refund, chargeback, or cancellation.
06

Rate limiting

  • Per-identity fixed-window limiting, preferring the platform-set x-real-ip over the spoofable x-forwarded-for.
  • Returns 429 with Retry-After; it fails open under store failure.
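Two of the primitives above, sketched with Node's built-in node:crypto module. The envelope layout (a "v1:" prefix followed by base64 IV, auth tag, and ciphertext) and the key encoding (64 hex characters in ENCRYPTION_KEY) are assumptions, not RepoDoc's documented format. The plaintext fallback mirrors the behavior described above. The shared-secret check hashes both sides before comparing because timingSafeEqual throws when its buffers differ in length, and the length itself shouldn't leak.

```typescript
import { createCipheriv, createDecipheriv, createHash, randomBytes, timingSafeEqual } from "node:crypto";

const PREFIX = "v1:"; // hypothetical envelope version tag

function keyFrom(hexKey: string | undefined): Buffer | null {
  return hexKey ? Buffer.from(hexKey, "hex") : null; // expects 32 bytes (64 hex chars)
}

function encryptSecret(plain: string, hexKey: string | undefined): string {
  const key = keyFrom(hexKey);
  if (!key) return plain; // no key configured: stored as plaintext (the documented fallback)
  const iv = randomBytes(12); // 96-bit nonce, the standard size for GCM
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const body = Buffer.concat([cipher.update(plain, "utf8"), cipher.final()]);
  const tag = cipher.getAuthTag();
  return PREFIX + [iv, tag, body].map((b) => b.toString("base64")).join(":");
}

function decryptSecret(stored: string, hexKey: string | undefined): string {
  if (!stored.startsWith(PREFIX)) return stored; // value was stored before a key existed
  const key = keyFrom(hexKey);
  if (!key) throw new Error("ENCRYPTION_KEY is required to decrypt this value");
  const [iv, tag, body] = stored.slice(PREFIX.length).split(":").map((s) => Buffer.from(s, "base64"));
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag); // final() throws if the ciphertext or tag was altered
  return Buffer.concat([decipher.update(body), decipher.final()]).toString("utf8");
}

// Constant-time shared-secret check, as described for the cron and billing webhooks.
function secretMatches(provided: string, expected: string): boolean {
  const a = createHash("sha256").update(provided).digest();
  const b = createHash("sha256").update(expected).digest();
  return timingSafeEqual(a, b);
}
```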

Section 10

Testing

The suite is small and pointed - it covers the three places where a regression would be silent and expensive - and it runs in CI on every change.

  • 01 What's covered

    20 Jest unit tests across three files exercise the query gate, the GitHub loader, and the RAG retrieval, with Clerk, Prisma, Octokit, and the RAG layer mocked so each unit is tested in isolation.

  • 02 Gated in CI

    GitHub Actions runs the suite (test:ci) on every push and pull request to main, so a regression in those paths blocks the merge rather than reaching production.

  • 03 The deliberate tradeoff

    No integration tests against a live database and no end-to-end suite - everything is mocked at the boundaries, kept intentionally lightweight for a single-operator project.