Writing · Case study · 2026 · Codebase RAG

RepoDoc: codebase RAG built as infrastructure

Retrieval runs over what each file means, indexing is a durable Postgres lease queue, and every token is metered against a per-project budget.

Live repodoc.parbhat.dev

Source github.com


FIG. 00 - results

768-dim: embedding per file summary
top-5: pgvector retrieval + top-3 memory
5-min: lease per indexing job
402: returned on a project over budget

Section 01

The problem

Most of the work in understanding a codebase isn't reading the file you have open - it's finding the three files you didn't know to open.

Onboarding to an unfamiliar repo means reconstructing a mental model that lives nowhere: which module owns auth, where the rate limit is configured, what actually runs on a cron. Grep finds strings; it doesn't find concepts. Documentation, when it exists, drifts from the code the day after it's written.

"RAG over a codebase" is the obvious answer and the obvious trap. The demo is easy: chunk the files, embed the chunks, retrieve top-k, call an LLM. It falls apart on real repos for reasons that have nothing to do with prompting. Raw code embeds lexically - variable names and syntax - so a query like "how does authentication work" retrieves whatever file happens to share tokens with the question, not the file that implements the concept. And ingesting a whole repository is a systems problem, not a model problem: it has to survive serverless time limits, partial failures, and the fact that an LLM call per file turns "index this repo" into an unbounded bill.


Section 02

The thesis

RepoDoc takes four opinionated positions, and they're the reason it behaves differently from a generic RAG wrapper.

01 Embed what a file means, not what it says
During indexing, each file is summarized by an LLM into a ≤100-word description of its purpose, and that summary is embedded - gemini-embedding-001 at 768 dimensions - not the raw source. Retrieval is then a cosine search over intent against pgvector, top-5. The cost is an extra LLM call per file at index time and a dependency on summary quality; the payoff is that "where are rate limits configured" retrieves the rate limiter even when the query shares no tokens with it.
02 The database is the queue
Indexing is modeled as an IndexingJob row in Postgres with a lease, not as a call to SQS or Redis or BullMQ. A worker claims a job with an atomic compare-and-swap, holds a five-minute lease, and releases it on completion or failure. One datastore, transactional with the data it indexes, no extra infrastructure to operate.
03 Cost is a runtime constraint, not a dashboard
Every AI request writes a QueryMetrics row (model, tokens, latency, estimated USD, retrieval and memory counts, cold-start and cache flags, success/error). A project can set monthlyCostLimitUsd; when it's exceeded, a query returns 402 and in-flight indexing pauses itself and requeues. Spend is bounded in the hot path, not reconciled after the bill arrives.
04 Durable repo memory, separate from retrieval
RepoDoc extracts facts from each Q&A exchange into a RepoMemory store and pulls the top matches back as secondary context on later questions - capturing intent and decisions that live in conversations, not in any file. It's labeled distinctly from code, under one rule: when memory and code conflict, the code wins.
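
Position 01 can be illustrated with a minimal sketch, assuming an in-memory stand-in for the pgvector search. The `FileSummary` shape, the `cosineSimilarity` and `topK` helpers, the file paths, and the toy 3-dimensional vectors are all hypothetical; the system described above stores 768-dimensional embeddings and ranks them inside Postgres.

```typescript
// Hypothetical shape: one row per file, holding the LLM summary and its embedding.
interface FileSummary {
  path: string;
  summary: string;
  embedding: number[];
}

// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank summaries by similarity to the query embedding and keep the best k.
function topK(query: number[], rows: FileSummary[], k: number): FileSummary[] {
  return rows
    .map((row) => ({ row, score: cosineSimilarity(query, row.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((x) => x.row);
}

// Toy data: the vectors stand in for embeddings of the summaries, not of the code.
const summaryRows: FileSummary[] = [
  { path: "lib/rate-limit.ts", summary: "Fixed-window request limiter.", embedding: [0.9, 0.1, 0.0] },
  { path: "lib/auth.ts", summary: "Session checks for API routes.", embedding: [0.1, 0.9, 0.0] },
  { path: "lib/github.ts", summary: "Walks a repository tree.", embedding: [0.0, 0.1, 0.9] },
];
const queryEmbedding = [0.8, 0.2, 0.0];
const hits = topK(queryEmbedding, summaryRows, 2);
```

Because the ranking runs over summary embeddings, a question phrased in terms of purpose can land on the right file even when it shares no identifiers with the source.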

Section 03

Indexing: the database is the queue

Indexing is a job, not a request. Triggers enqueue an IndexingJob; a worker claims it atomically and processes the repo file by file.

FIG. 01

Triggers: Project create · Query vs unindexed · Daily cron (0 6 * * *)

Claim - exactly once: claimJob (conditional updateMany) → res.count === 1 (CAS, no locks) → 5-min lease (held while working)

Per file: GithubRepoLoader (walk repo) → Summarize (≤100 words) → Embed (768d) → Store (Postgres + pgvector)

FIG. 01 - Triggers: on project create, on a query against an unindexed project, and a daily Vercel cron (0 6 * * *). claimJob is a conditional updateMany that flips queued/stale-processing → processing and trusts the claim only when res.count === 1 - a compare-and-swap on the row, no advisory locks.
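
The claim step can be sketched as follows, assuming an in-memory table in place of Postgres. The `IndexingJob` fields other than `status` and `lockedAt`, the `updateMany` stand-in, and the `claimJob` signature are hypothetical; in the real system the atomicity comes from the database executing a single conditional UPDATE, which this single-threaded model only imitates.

```typescript
type JobStatus = "queued" | "processing" | "done" | "failed";

interface IndexingJob {
  id: string;
  status: JobStatus;
  lockedAt: number | null; // epoch ms; null when no worker holds the lease
  lockedBy: string | null; // hypothetical field, used here only for illustration
}

const LEASE_MS = 5 * 60 * 1000; // five-minute lease

// In-memory stand-in for a conditional updateMany: update every row that
// matches the predicate and report how many rows changed.
function updateMany(
  table: IndexingJob[],
  where: (job: IndexingJob) => boolean,
  data: Partial<IndexingJob>,
): { count: number } {
  let count = 0;
  for (const job of table) {
    if (where(job)) {
      Object.assign(job, data);
      count++;
    }
  }
  return { count };
}

// A claim is trusted only if this call flipped exactly one row from
// queued (or stale processing) to processing.
function claimJob(table: IndexingJob[], jobId: string, workerId: string, now: number): boolean {
  const res = updateMany(
    table,
    (job) =>
      job.id === jobId &&
      (job.status === "queued" ||
        (job.status === "processing" && job.lockedAt !== null && now - job.lockedAt > LEASE_MS)),
    { status: "processing", lockedAt: now, lockedBy: workerId },
  );
  return res.count === 1;
}

const jobTable: IndexingJob[] = [{ id: "job-1", status: "queued", lockedAt: null, lockedBy: null }];
const t0 = 1000000;
const firstClaim = claimJob(jobTable, "job-1", "worker-a", t0); // queued, so the claim wins
const secondClaim = claimJob(jobTable, "job-1", "worker-b", t0 + 1000); // lease still fresh, so it loses
const reclaim = claimJob(jobTable, "job-1", "worker-b", t0 + LEASE_MS + 1); // lease expired, so it wins
```

The second worker loses because the row no longer matches the condition by the time its update runs, so its count is 0 and it simply walks away.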

Section 04

Query: grounded answer, or a pre-index fallback

Every query passes the same gate. If the project has no embeddings yet, RepoDoc answers from a live GitHub fetch instead of making the user wait for a full index.

FIG. 02

Gate: POST /api/query → auth → rate-limit → ownership → budget

Indexed path: pgvector top-5 + RepoMemory top-3 → OpenRouter (gemini-2.5-flash) → Cited answer (+ metrics, cache, memory)

Pre-index path (embeddings == 0): Live-fetch ≤22 files (READMEs, configs) → Answer now (preindex: true) → Kick worker (index in background)

FIG. 02 - If embeddings == 0, the pre-index path live-fetches up to 22 high-value files (READMEs, configs, entrypoints) straight from the GitHub tree, answers now with preindex: true, and kicks the worker. Otherwise it retrieves pgvector top-5 plus RepoMemory top-3, grounds the answer with cited sources, and writes a QueryMetrics row.
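
The gate order and the path choice can be sketched as a pure function. The `GateInput` and `GateResult` shapes and the `runGate` name are hypothetical, and the 404 for a project the caller does not own is an assumption of this sketch; the 401, 429, and 402 responses are the ones described in this write-up.

```typescript
interface GateInput {
  userId: string | null; // null when there is no session
  withinRateLimit: boolean;
  ownsProject: boolean;
  overBudget: boolean;
  embeddingCount: number;
}

type GateResult =
  | { ok: false; status: number; reason: string }
  | { ok: true; path: "indexed" | "preindex" };

// Checks run in the order shown in the figure: auth, rate limit, ownership, budget.
function runGate(input: GateInput): GateResult {
  if (input.userId === null) return { ok: false, status: 401, reason: "unauthenticated" };
  if (!input.withinRateLimit) return { ok: false, status: 429, reason: "rate limited" };
  // Assumption: an unowned project is reported as not found.
  if (!input.ownsProject) return { ok: false, status: 404, reason: "project not found" };
  if (input.overBudget) return { ok: false, status: 402, reason: "over budget" };
  // No embeddings yet means the pre-index fallback answers the question.
  return { ok: true, path: input.embeddingCount === 0 ? "preindex" : "indexed" };
}

const base: GateInput = {
  userId: "user-1",
  withinRateLimit: true,
  ownsProject: true,
  overBudget: false,
  embeddingCount: 120,
};
const indexedResult = runGate(base);
const preindexResult = runGate({ ...base, embeddingCount: 0 });
const overBudgetResult = runGate({ ...base, overBudget: true });
const anonymousResult = runGate({ ...base, userId: null, overBudget: true });
```

Putting the budget check last among the rejections means an unauthenticated or rate-limited caller never learns anything about a project's spend.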

Section 05

Surviving the serverless wall

A large repo can't be indexed in one 60-second invocation, and a worker can die mid-job. Both are handled by the same lease + cursor mechanism.

FIG. 03

Timing out gracefully: WORKER_BUDGET_MS (time-box) → resumeAfter cursor (write progress) → Requeue + re-kick (next invocation)

Recovering a dead worker: Lease expires (lockedAt older than 5 min) → Next worker reclaims (@@index([status, lockedAt])) → Resume from cursor (no re-embedding)

FIG. 03 - The worker time-boxes itself (WORKER_BUDGET_MS); when it runs out it writes a resumeAfter cursor, requeues, and re-kicks, so indexing makes forward progress across many short runs. A job stuck in processing with lockedAt older than five minutes is reclaimable - @@index([status, lockedAt]) makes finding it cheap - and resumes from its cursor.
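
The time-box and cursor can be sketched like this, assuming files are processed in a stable order and the cursor is the last path fully processed. The `JobState` shape, the `runWorker` signature, and the injected clock are hypothetical, chosen so the sketch runs deterministically without a real timer.

```typescript
interface JobState {
  status: "queued" | "processing" | "done";
  resumeAfter: string | null; // last file path fully processed, or null at the start
}

interface RunResult {
  processed: string[];
  finished: boolean;
}

// Process files in order, starting after the cursor, until the time budget runs out.
// If the cursor is not found in the list, indexOf returns -1 and the run starts over.
function runWorker(
  files: string[],
  job: JobState,
  budgetMs: number,
  now: () => number,
  processFile: (path: string) => void,
): RunResult {
  const startedAt = now();
  const startIndex = job.resumeAfter === null ? 0 : files.indexOf(job.resumeAfter) + 1;
  const processed: string[] = [];
  job.status = "processing";

  for (let i = startIndex; i < files.length; i++) {
    if (now() - startedAt >= budgetMs) {
      job.status = "queued"; // requeue; the next invocation resumes from the cursor
      return { processed, finished: false };
    }
    processFile(files[i]);
    job.resumeAfter = files[i]; // write progress after each file
    processed.push(files[i]);
  }
  job.status = "done";
  return { processed, finished: true };
}

// Fake clock: each file "costs" 40 ms against a 100 ms budget.
let fakeClock = 0;
const repoFiles = ["a.ts", "b.ts", "c.ts", "d.ts", "e.ts"];
const indexingJob: JobState = { status: "queued", resumeAfter: null };
const spendTime = (_path: string): void => {
  fakeClock += 40;
};
const run1 = runWorker(repoFiles, indexingJob, 100, () => fakeClock, spendTime);
const cursorAfterRun1 = indexingJob.resumeAfter;
const run2 = runWorker(repoFiles, indexingJob, 100, () => fakeClock, spendTime);
```

The same cursor serves both cases in the figure: a worker that times out writes it deliberately, and a worker that dies leaves behind whatever it last wrote.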

Section 06

Why this is hard

01 Claiming a job exactly once under concurrent workers
claimJob is a conditional updateMany - it flips queued/stale-processing → processing and trusts the claim only when res.count === 1. Two workers that wake on the same job can't both win; it's a compare-and-swap on the row, no advisory locks.
02 Surviving the serverless 60-second wall
Indexing a large repo can't finish in one invocation. The worker time-boxes itself, and when it runs out it writes a resumeAfter cursor, requeues the job, and re-kicks - so indexing makes forward progress across many short runs instead of dying at the platform timeout.
03 Recovering a dead worker without double-processing
A lease is five minutes. A job stuck in processing with lockedAt older than that is reclaimable; @@index([status, lockedAt]) makes finding it cheap. A crashed worker's job is picked up by the next one and resumed from its cursor.
04 Being useful before indexing finishes
queryCodebasePreindex fetches up to 22 high-value files (READMEs, configs, entrypoints) straight from the GitHub tree and answers from those, flagging the response preindex: true, while kicking the indexer in the background.
05 Bounding spend mid-flight
isProjectOverBudget short-circuits queries to 402, and the indexer checks budget between files - a job that would blow the limit pauses rather than running the meter up.
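
Item 05 can be sketched under a few stated assumptions: spend is summed per UTC calendar month, reaching the limit counts as over budget, and each indexed file has a known estimated cost. Only the `isProjectOverBudget` name and `monthlyCostLimitUsd` come from this write-up; the signatures, the `QueryMetric` shape, and `indexUntilBudget` are hypothetical.

```typescript
interface QueryMetric {
  projectId: string;
  estimatedUsd: number;
  createdAt: Date;
}

interface Project {
  id: string;
  monthlyCostLimitUsd: number | null; // null means no limit is set
}

// Sum this calendar month's estimated spend (UTC) and compare it to the limit.
function isProjectOverBudget(project: Project, metrics: QueryMetric[], now: Date): boolean {
  const limit = project.monthlyCostLimitUsd;
  if (limit === null) return false;
  const monthStart = Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), 1);
  const spent = metrics
    .filter((m) => m.projectId === project.id && m.createdAt.getTime() >= monthStart)
    .reduce((sum, m) => sum + m.estimatedUsd, 0);
  return spent >= limit; // assumption: hitting the limit exactly also stops work
}

// Index files one at a time, checking the budget before each file.
function indexUntilBudget(
  project: Project,
  files: string[],
  metrics: QueryMetric[],
  now: Date,
  costPerFileUsd: number,
): { indexed: string[]; paused: boolean } {
  const indexed: string[] = [];
  for (const path of files) {
    if (isProjectOverBudget(project, metrics, now)) return { indexed, paused: true };
    metrics.push({ projectId: project.id, estimatedUsd: costPerFileUsd, createdAt: now });
    indexed.push(path);
  }
  return { indexed, paused: false };
}

const demoProject: Project = { id: "p1", monthlyCostLimitUsd: 1.0 };
const demoNow = new Date(Date.UTC(2026, 2, 15));
const demoMetrics: QueryMetric[] = [
  { projectId: "p1", estimatedUsd: 0.5, createdAt: new Date(Date.UTC(2026, 2, 2)) },
  { projectId: "p1", estimatedUsd: 5.0, createdAt: new Date(Date.UTC(2026, 1, 20)) }, // last month, ignored
];
const budgetRun = indexUntilBudget(demoProject, ["a.ts", "b.ts", "c.ts", "d.ts"], demoMetrics, demoNow, 0.25);
```

Checking between files rather than once per job caps the overshoot at roughly the cost of a single file.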

Section 07

Design decisions & tradeoffs

Embed LLM summaries, not raw code

Why: code embeds lexically; intent is what you query by.
Tradeoff: an LLM call per file at index time, and retrieval is only as good as the summaries.

Postgres as the job queue (lease + CAS)

Why: one transactional datastore, nothing extra to run.
Tradeoff: polling, not push; not built for very high job throughput.

OpenRouter as the single chat gateway

Why: route models per task without SDK churn, one billing surface - gemini-2.5-flash for chat, gemini-2.5-pro for README generation, Claude Haiku for docs.
Tradeoff: an extra network hop and no native multi-provider failover.

Budget enforced in the request path and mid-index

Why: AI cost is unbounded by default; a ceiling has to be live to matter.
Tradeoff: a hard limit can interrupt indexing, which is why jobs are resumable.

Rate limiting and secret encryption fail open

Why: for a single-operator product, a Redis blip or an unset key shouldn't 500 every request.
Tradeoff: a deliberately weaker posture under those failures - documented, not hidden.

Section 08

Failure modes

Worker dies mid-index

Its lease expires after five minutes; the next worker reclaims the job and resumes from the resumeAfter cursor.
No stuck jobs, no re-embedding from scratch.

Serverless invocation times out

The time-box requeues with a cursor before the platform kills the function.
Indexing continues on the next invocation.

Project queried before it's indexed

The pre-index path answers from live GitHub fetches.
The result is marked preindex: true.

Project exceeds its budget

Queries return 402 with a clear message.
Running indexing pauses and requeues instead of overspending.

Redis unavailable / provider error

The rate limiter falls back to a per-instance in-memory window.
During indexing, each summary and embedding is retried twice with backoff; a persistent failure marks the job failed with the error string, surfaced in the UI for retry.

Section 09

Security model

Auth

Clerk middleware guards everything except an explicit public allow-list; each API route re-checks the session and returns 401 when there is none.
Project access is scoped by owner and deletedAt: null on every query.

Input & SQL

Zod validates request bodies; Prisma parameterizes all queries.
The only raw SQL is the pgvector similarity search and embedding writes - both parameter-bound.

Secrets at rest

Stored GitHub tokens are encrypted with AES-256-GCM in an envelope format, keyed by ENCRYPTION_KEY.
If ENCRYPTION_KEY is unset, the code falls back to storing plaintext so the app keeps working - set the key in every environment to actually get encryption.
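
A minimal sketch of that behavior using Node's built-in crypto module. The `enc:v1` envelope layout, the function names, and the assumption that ENCRYPTION_KEY is 64 hex characters (32 bytes) are all hypothetical; the actual envelope format is not specified here.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

const ENVELOPE_PREFIX = "enc:v1"; // hypothetical marker for encrypted values

// Encrypt with AES-256-GCM, or return the plaintext unchanged when no key is set.
function encryptSecret(plaintext: string, keyHex: string | undefined): string {
  if (!keyHex) return plaintext; // the fail-open fallback described above
  const key = Buffer.from(keyHex, "hex"); // assumed to decode to 32 bytes
  const iv = randomBytes(12); // 96-bit nonce, fresh for every value
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  const tag = cipher.getAuthTag();
  return [
    ENVELOPE_PREFIX,
    iv.toString("base64"),
    tag.toString("base64"),
    ciphertext.toString("base64"),
  ].join(":");
}

// Decrypt an envelope; values without the prefix are treated as stored plaintext.
function decryptSecret(stored: string, keyHex: string | undefined): string {
  if (!stored.startsWith(ENVELOPE_PREFIX + ":")) return stored;
  if (!keyHex) throw new Error("ENCRYPTION_KEY is required to read an encrypted secret");
  const [ivB64, tagB64, dataB64] = stored.slice(ENVELOPE_PREFIX.length + 1).split(":");
  const decipher = createDecipheriv(
    "aes-256-gcm",
    Buffer.from(keyHex, "hex"),
    Buffer.from(ivB64, "base64"),
  );
  decipher.setAuthTag(Buffer.from(tagB64, "base64")); // final() throws if the tag does not verify
  const plaintext = Buffer.concat([decipher.update(Buffer.from(dataB64, "base64")), decipher.final()]);
  return plaintext.toString("utf8");
}

const demoKey = "0123456789abcdef".repeat(4); // 64 hex chars, demo only
const sealed = encryptSecret("example-token-value", demoKey);
const opened = decryptSecret(sealed, demoKey);
const unsealed = encryptSecret("example-token-value", undefined);
```

A prefix on encrypted values is one way to let rows written before the key was set stay readable afterwards, at the cost of making the plaintext rows easy to overlook.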

Webhooks

The Clerk webhook verifies the svix HMAC signature and rejects on mismatch.
The cron worker route authorizes with a constant-time (timingSafeEqual) shared-secret check.
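
A sketch of such a check, assuming the secret arrives as an `Authorization: Bearer <secret>` header; the header scheme and the `isAuthorizedCron` name are assumptions. Node's `timingSafeEqual` throws when the two buffers differ in length, so this version hashes both sides first to get equal-length inputs.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Compare the presented header against the expected value in constant time.
function isAuthorizedCron(headerValue: string | undefined, secret: string): boolean {
  if (!headerValue || !secret) return false; // never authorize against an empty secret
  // SHA-256 digests are always 32 bytes, so timingSafeEqual cannot throw on length.
  const presented = createHash("sha256").update(headerValue).digest();
  const expected = createHash("sha256").update(`Bearer ${secret}`).digest();
  return timingSafeEqual(presented, expected);
}

const cronOk = isAuthorizedCron("Bearer demo-secret", "demo-secret");
const cronWrong = isAuthorizedCron("Bearer wrong", "demo-secret");
const cronMissing = isAuthorizedCron(undefined, "demo-secret");
```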

Billing webhook

The Gumroad billing webhook authenticates with a constant-time shared-secret check and maps a product permalink to a plan.
It auto-downgrades the plan to Starter on refund, chargeback, or cancellation.

Rate limiting

Per-identity fixed-window limiting, preferring the platform-set x-real-ip over the spoofable x-forwarded-for.
Returns 429 with Retry-After; it fails open under store failure.
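
A fixed-window limiter of this kind can be sketched in memory, which is also roughly what a per-instance fallback looks like. The class name, the limit and window values, and the `clientIdentity` helper are hypothetical; the header preference follows the description above.

```typescript
interface RateWindow {
  count: number;
  resetAt: number; // epoch ms when the current window ends
}

class FixedWindowLimiter {
  private readonly windows = new Map<string, RateWindow>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns whether the request is allowed and, if not, the Retry-After value in seconds.
  check(identity: string, now: number): { allowed: boolean; retryAfterSec: number } {
    const current = this.windows.get(identity);
    if (!current || now >= current.resetAt) {
      // First request in a new window.
      this.windows.set(identity, { count: 1, resetAt: now + this.windowMs });
      return { allowed: true, retryAfterSec: 0 };
    }
    if (current.count < this.limit) {
      current.count++;
      return { allowed: true, retryAfterSec: 0 };
    }
    return { allowed: false, retryAfterSec: Math.ceil((current.resetAt - now) / 1000) };
  }
}

// Prefer the platform-set x-real-ip; fall back to the first x-forwarded-for entry.
function clientIdentity(headers: Record<string, string | undefined>): string {
  const realIp = headers["x-real-ip"];
  if (realIp) return realIp;
  const forwarded = headers["x-forwarded-for"];
  return forwarded ? forwarded.split(",")[0].trim() : "unknown";
}

const limiter = new FixedWindowLimiter(2, 60000); // 2 requests per minute, demo values
const who = clientIdentity({ "x-real-ip": "203.0.113.7", "x-forwarded-for": "198.51.100.1, 10.0.0.1" });
const r1 = limiter.check(who, 0);
const r2 = limiter.check(who, 1000);
const r3 = limiter.check(who, 2000); // over the limit; 58 s remain in the window
const r4 = limiter.check(who, 60000); // a new window begins
```

A rejected check maps onto a 429 response with `retryAfterSec` as the Retry-After header.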

Section 10

Testing

The suite is small and pointed - it covers the three places where a regression would be silent and expensive - and it runs in CI on every change.

01 What's covered
20 Jest unit tests across three files exercise the query gate, the GitHub loader, and the RAG retrieval, with Clerk, Prisma, Octokit, and the RAG layer mocked so each unit is tested in isolation.
02 Gated in CI
GitHub Actions runs the suite (test:ci) on every push and pull request to main, so a regression in those paths blocks the merge rather than reaching production.
03 The deliberate tradeoff
No integration tests against a live database and no end-to-end suite - everything is mocked at the boundaries, kept intentionally lightweight for a single-operator project.
