Available for select roles / Remote - US·EU

Parbhat Kapila - AI Systems Engineer

Building production AI systems.

Ship to production, then keep it running.

10K+: Emails indexed / VectorMail
12-Stage: AI video pipeline / Cutline
CRM-gated: Data ingestion / Sentinel

Let's build something

View résumé

Parbhat Kapila / '26

Selected Work

Production AI systems, shipped & operated.

Measurable outcomes, live in production - built, deployed, and maintained independently.

Revenue Intelligence · Permission Layer

Sentinel

A revenue-intelligence backend whose first question is what it's even allowed to read. Email, calendar, and chat are ingested only when a participant matches a contact synced from your CRM - read-only, fail-closed, and PII-blind, with encrypted per-integration tokens.


Before

Every revenue tool runs on a company's most sensitive data - email, calendar, chat. Ingest everything and it's a surveillance liability; gate it by hand and coverage rots within a month.

After

Binds ingestion to a boundary the business already maintains: the CRM contact book. A message is stored only if a participant matches a synced contact - everything else is dropped, fail-closed, without logging an address. Across Gmail, Calendar, and Slack, with read-only HubSpot and Salesforce sync.

CRM: Permission layer
Read-only: Never writes back
Fail-closed: On any error
Encrypted: Tokens at rest

Next.js 16 · TypeScript · PostgreSQL · Prisma · Clerk · Upstash Redis · OpenRouter · Sentry
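
The fail-closed gate described for Sentinel can be sketched roughly as follows. This is a minimal illustration, not Sentinel's actual code: the `Message` shape, the `gate` function, and the `loadContacts` callback are all hypothetical names standing in for a read-only lookup of CRM-synced contacts.

```typescript
// Hypothetical shapes; the real Sentinel schema is not shown in the source.
type Message = { id: string; participants: string[] };
type GateDecision = "store" | "drop";

// Normalize addresses so " Ada@Example.com" and "ada@example.com" compare equal.
const normalize = (address: string): string => address.trim().toLowerCase();

// Decide whether a message may be stored.
// `loadContacts` is an assumed stand-in for reading the synced CRM contact book.
function gate(message: Message, loadContacts: () => Set<string>): GateDecision {
  try {
    const contacts = loadContacts();
    const allowed = message.participants.some((p) => contacts.has(normalize(p)));
    return allowed ? "store" : "drop";
  } catch {
    // Fail closed: any error drops the message, and no address is logged.
    return "drop";
  }
}

const crmContacts = new Set(["ada@example.com"]);
gate({ id: "m1", participants: ["Ada@Example.com", "bob@corp.test"] }, () => crmContacts); // "store"
```

The point of the `try`/`catch` placement is that a failed contact lookup is treated the same as a non-match, so an outage can only reduce what is ingested, never widen it.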

AI Video Pipeline · 12-Stage Director

CUTLINE

One sentence in, one finished MP4 out - directed by a 12-stage pipeline, not a template engine. It infers audience, goal, tone, and arc; plans 8-12 shots; sources or generates imagery; composes the render. No templates, no creative knobs.


Before

"AI video" tools automate the cut, not the call. Same pan, same stock photo, same captions. A prompt box bolted to a template - generic in, generic out.

After

A director layer makes the editorial calls before a frame renders. One sentence becomes audience, tone, and a shot-by-shot arc. Twelve deterministic stages, so a bad render can be bisected, a provider swapped, and cost capped, each per stage. Not a template engine. A system that directs.

12 Stages: Pure-function pipeline
Zero: Creative knobs
3 Modes: Talking character
4-Tier: Image fallback

Next.js 16 · TypeScript · Remotion · BullMQ · Redis · Better Auth · Neon Postgres · Stripe · Google VEO · HeyGen · ElevenLabs
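
To illustrate why a pure-function pipeline makes a bad render easy to localize, here is a small sketch. It is not Cutline's code: `Stage`, `runPipeline`, `firstBadStage`, and the two example stages are hypothetical, and the search is a simple linear scan over recorded outputs rather than a true bisection.

```typescript
// Hypothetical stage shape: each stage is a pure function from state to state.
type Stage<S> = { name: string; run: (state: S) => S };
type Trace<S> = { stage: string; output: S }[];

// Run stages in order, recording every intermediate state.
function runPipeline<S>(stages: Stage<S>[], input: S): { result: S; trace: Trace<S> } {
  const trace: Trace<S> = [];
  let state = input;
  for (const stage of stages) {
    state = stage.run(state);
    trace.push({ stage: stage.name, output: state });
  }
  return { result: state, trace };
}

// Return the name of the first stage whose recorded output fails a check.
function firstBadStage<S>(trace: Trace<S>, isValid: (s: S) => boolean): string | undefined {
  return trace.find((entry) => !isValid(entry.output))?.stage;
}

// Two illustrative stages (not the real twelve).
type Brief = { prompt: string; tone?: string; shots: string[] };
const demoStages: Stage<Brief>[] = [
  { name: "inferTone", run: (b) => ({ ...b, tone: "calm" }) },
  { name: "planShots", run: (b) => ({ ...b, shots: ["open", "detail", "close"] }) },
];
const demo = runPipeline(demoStages, { prompt: "Launch video", shots: [] });
firstBadStage(demo.trace, (b) => b.shots.length <= 12); // undefined: every stage passed
```

Because each stage only reads its input and returns a new value, any stage can be rerun in isolation from the recorded trace, which is what makes swapping a provider or capping cost at one stage practical.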

Codebase RAG · Indexing Infrastructure

RepoDoc

Ask an unfamiliar repo questions and get grounded, cited answers. Retrieval runs over what each file means: LLM summaries are embedded, not raw code. Indexing runs on a durable Postgres lease queue that survives serverless limits, and every token is metered against a per-project budget.


Before

Grep finds strings, not concepts. Naive 'RAG over code' embeds syntax, not meaning - and runs up an unbounded indexing bill.

After

Embeds what each file means, not what it says. The database is the queue - exactly-once, leased, resumable across timeouts. Budgets cap spend mid-index. Infrastructure, not a RAG wrapper.

Exactly-once: Lease + CAS jobs
Self-resuming: Survives timeouts
Cost-capped: Per-project budget
Intent-based: Meaning, not syntax

Next.js 16 · TypeScript · PostgreSQL · pgvector · Prisma · OpenRouter · Gemini · Clerk · Upstash Redis · Zod
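
The lease-plus-CAS idea behind the queue can be modeled in memory as below. This is a sketch under assumptions, not RepoDoc's schema: the `Job` fields and the `claim` and `complete` functions are hypothetical, and in Postgres each function would more plausibly be a single conditional `UPDATE` whose `WHERE` clause carries the same checks.

```typescript
// Hypothetical job row for illustration.
type Job = {
  id: string;
  status: "pending" | "done";
  leaseOwner: string | null;
  leaseExpiresAt: number; // epoch milliseconds
  version: number;        // bumped on every successful write (the CAS token)
};

// Claim a job only if it is pending, its lease is free or expired,
// and nobody has modified it since the caller read `expectedVersion`.
function claim(job: Job, worker: string, now: number, leaseMs: number, expectedVersion: number): boolean {
  const leaseFree = job.leaseOwner === null || job.leaseExpiresAt <= now;
  if (job.status !== "pending" || !leaseFree || job.version !== expectedVersion) return false;
  job.leaseOwner = worker;
  job.leaseExpiresAt = now + leaseMs;
  job.version += 1;
  return true;
}

// Only the current, unexpired lease holder may mark the job done.
function complete(job: Job, worker: string, now: number): boolean {
  if (job.status !== "pending" || job.leaseOwner !== worker || job.leaseExpiresAt <= now) return false;
  job.status = "done";
  job.leaseOwner = null;
  job.version += 1;
  return true;
}

const indexJob: Job = { id: "file-1", status: "pending", leaseOwner: null, leaseExpiresAt: 0, version: 0 };
claim(indexJob, "worker-a", 0, 30000, 0);    // true: lease granted, version becomes 1
claim(indexJob, "worker-b", 1000, 30000, 0); // false: lease is held and the version is stale
```

If a worker dies at a serverless timeout, its lease simply expires and another worker can claim the job. Note that "exactly-once" in this sketch means one successful completion per job; any side effects performed before a timeout would still need to be idempotent.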

Semantic Search · Email

VectorMail

Search your whole inbox by meaning - describe a thread and it surfaces, even with none of the original words. Connects Gmail through Aurinko, syncs every thread, and runs semantic search on pgvector, with inbox and embeddings in one database - no separate vector store. Replies compose with full thread context.


Before

Every inbox still searches by keyword. You remember the gist of a thread, not the exact words - so you scroll forever, or never find it.

After

Semantic search across 10k+ threads on pgvector - ask for 'the pricing thread,' get it instantly. Inbox and embeddings in one Postgres, no separate vector store. AI drafts replies with full thread context.

Sub-second: 10k+ emails
One DB: inbox + vectors
By meaning: not keywords
AI compose: thread context

Next.js 15 · TypeScript · tRPC · Prisma · PostgreSQL · pgvector · Clerk · Aurinko · OpenRouter · Gemini
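
Searching "by meaning" comes down to ranking stored embeddings by their similarity to the query embedding. The sketch below does that in memory with cosine similarity; it is an illustration, not VectorMail's code, and the choice of cosine similarity is an assumption. In pgvector the equivalent ranking would typically be an `ORDER BY` on the `<=>` operator, which computes cosine distance.

```typescript
// Hypothetical record: a thread ID plus its embedding.
type Embedded = { id: string; vector: number[] };

// Cosine similarity: dot(a, b) / (|a| * |b|), or 0 if either vector is all zeros.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  });
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Return the k items most similar to the query, best first.
function topK(query: number[], items: Embedded[], k: number): Embedded[] {
  return items
    .map((item) => ({ item, score: cosineSimilarity(query, item.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((entry) => entry.item);
}

// Toy 2-dimensional vectors; real embeddings have hundreds of dimensions or more.
const threads: Embedded[] = [
  { id: "pricing", vector: [1, 0] },
  { id: "offsite", vector: [0, 1] },
  { id: "renewal", vector: [1, 1] },
];
topK([1, 0.1], threads, 2).map((t) => t.id); // ["pricing", "renewal"]
```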

View all projects on GitHub

Stack

The tools, in production.

Core depth: production RAG and vector search at scale. The rest is full-stack because shipping AI means owning the whole pipeline, not just the model.

Frontend (Product UI): TypeScript · React · Next.js (App Router) · Tailwind CSS · Remotion
Backend & APIs: Node.js · Python · FastAPI · tRPC · Zod · WebSockets
AI Systems (Production): OpenAI · Claude · Gemini · OpenRouter · RAG pipelines · pgvector
Data & Infrastructure: PostgreSQL · Prisma · Redis · BullMQ · Object Storage (S3)
Observability & Ops: Sentry · OpenTelemetry · Clerk · Better Auth · Stripe
Cloud & Deployment: AWS · Docker · Vercel · CI/CD (GitHub Actions)
Architecture & Practices: Distributed systems · Event-driven design · Multi-tenant SaaS · Cost optimization

Open source: github.com/parbhatkapila4

About

I build AI systems and run them - the whole stack, from schema to monitoring. Four years of it, in production. The ones here are live, and I maintain them.

Most of my work is RAG and LLM infrastructure - retrieval, vector storage, and model routing - tuned for low latency (sub-200ms queries) and 94%+ retrieval accuracy on real workloads.

Looking for a full-time, early-stage role where I own real systems and ship without hand-holding.

System Architecture

Multi-tenant SaaS with per-tenant data isolation, auto-scaling infra, and deployments tuned for cost. Cut infrastructure spend ~95% through architecture changes.

AI/ML Production

RAG over 10,000+ documents at 94%+ retrieval accuracy, pgvector queries under 200ms. Models picked per task and routed through OpenRouter - GPT-4, Claude, or Gemini - with fallback when a provider degrades or prices spike.
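
The routing-with-fallback idea can be reduced to a small selection function. This is a sketch under assumptions: `ProviderState`, its fields, and `pickModel` are hypothetical and are not OpenRouter's API, and the model names and prices are placeholders rather than real figures.

```typescript
// Hypothetical snapshot of one candidate model's current state.
type ProviderState = { model: string; healthy: boolean; pricePerMTokUsd: number };

// Walk the candidates in preference order and return the first one that is
// healthy and at or under the price ceiling; undefined if none qualifies.
function pickModel(preferred: ProviderState[], maxPriceUsd: number): string | undefined {
  return preferred.find((p) => p.healthy && p.pricePerMTokUsd <= maxPriceUsd)?.model;
}

// Placeholder candidates, listed in preference order for some task.
const candidates: ProviderState[] = [
  { model: "model-a", healthy: false, pricePerMTokUsd: 3 }, // degraded provider
  { model: "model-b", healthy: true, pricePerMTokUsd: 12 }, // price spiked past the ceiling
  { model: "model-c", healthy: true, pricePerMTokUsd: 2 },
];
pickModel(candidates, 5); // "model-c"
```

Keeping the choice a pure function of a state snapshot makes the routing policy easy to test separately from the network calls that populate health and price.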

Performance & Optimization

Brought per-document processing from $5.00 to $0.05 with hash-based chunk reuse. Semantic search holds under 200ms at load. 99.9% uptime.
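
Hash-based chunk reuse means paying for an embedding only when a chunk's content has not been seen before; a drop from $5.00 to $0.05 is a 100x reduction per document. The sketch below shows the general shape, assuming a SHA-256 content hash as the cache key. `embedWithReuse`, the in-memory `Map`, and the `embed` callback are hypothetical stand-ins for a persistent store and a real embedding call.

```typescript
import { createHash } from "node:crypto";

// Content-addressed key: identical chunk text always yields the same hash.
const contentHash = (chunk: string): string =>
  createHash("sha256").update(chunk, "utf8").digest("hex");

// Embed chunks, reusing cached vectors for content already seen.
// `embed` stands in for a paid embedding API call.
function embedWithReuse(
  chunks: string[],
  cache: Map<string, number[]>,
  embed: (chunk: string) => number[],
): { vectors: number[][]; paidCalls: number } {
  let paidCalls = 0;
  const vectors = chunks.map((chunk) => {
    const key = contentHash(chunk);
    const cached = cache.get(key);
    if (cached) return cached;
    paidCalls += 1;
    const vector = embed(chunk);
    cache.set(key, vector);
    return vector;
  });
  return { vectors, paidCalls };
}

const embeddingCache = new Map<string, number[]>();
const fakeEmbed = (chunk: string): number[] => [chunk.length]; // placeholder, not a real model
const firstPass = embedWithReuse(["intro", "pricing", "intro"], embeddingCache, fakeEmbed);
// firstPass.paidCalls is 2: the repeated "intro" chunk is served from the cache
```

On a re-index where most chunks are unchanged, nearly every lookup hits the cache, which is where the savings come from.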

Ownership

I own the whole thing: decisions, features, deploys, monitoring, and the stuff that breaks after launch. TypeScript, Next.js, Python, PostgreSQL, Redis, AWS, and Vercel.

How I work

Async-first - decisions documented, PRs that explain the why, not just the what.
Full ownership - I design, deploy, monitor, and fix; root cause over workarounds.
Ship continuously without breaking what's already live.
Reply within 24 hours to anything real.

Experience

Building & operating, end to end.

May 2022 - Present

Independent / Building for early-stage startups · Remote

Founder & AI Systems Engineer

Full ownership of system design, feature delivery, reliability, and iteration - everything here built, deployed, and maintained by me.

Owned backend services, data stores, AI pipelines, and deployment infrastructure, including authentication, payments, and third-party integrations. Debugged production incidents, performance bottlenecks, and scaling limits while shipping continuously without breaking live systems.

Focus

RAG & vector search: Production retrieval on Postgres / pgvector - chunking strategies, persistent embeddings, and context-grounded synthesis to keep answers accurate.
Cost-efficient AI: Cut model and processing spend 50-80% with hash-based chunk reuse, embedding caches, and multi-provider routing - quality held constant.
Real-time & reliable: Idempotent webhooks, queue-backed jobs, retries with backoff, and graceful degradation when upstream APIs fail.
Observability & ops: Health checks, structured logging, Sentry + OpenTelemetry tracing, and self-healing recovery - debugged to root cause, not symptoms.
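
Two of the reliability patterns named above, idempotent webhooks and retries with backoff, can be sketched in a few lines. This is illustrative only: `backoffDelayMs`, `withRetry`, and `makeWebhookHandler` are hypothetical names, the default delays are arbitrary, and the in-memory `Set` stands in for a durable store of processed delivery IDs.

```typescript
// Exponential backoff with a cap: base, 2x base, 4x base, ... up to capMs.
function backoffDelayMs(attempt: number, baseMs = 200, capMs = 5000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Retry an async operation, sleeping between failed attempts.
// Rethrows the last error once maxAttempts is exhausted.
async function withRetry<T>(
  op: () => Promise<T>,
  maxAttempts = 4,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await op();
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts - 1) await sleep(backoffDelayMs(attempt));
    }
  }
  throw lastError;
}

// Idempotent webhook handling: each delivery ID is processed at most once.
// The ID is recorded only after `handle` succeeds, so a failed delivery can be retried.
function makeWebhookHandler(handle: (payload: string) => void) {
  const seen = new Set<string>();
  return (deliveryId: string, payload: string): "processed" | "duplicate" => {
    if (seen.has(deliveryId)) return "duplicate";
    handle(payload);
    seen.add(deliveryId);
    return "processed";
  };
}

const received: string[] = [];
const onWebhook = makeWebhookHandler((payload) => received.push(payload));
onWebhook("evt_1", "invoice.paid"); // "processed"
onWebhook("evt_1", "invoice.paid"); // "duplicate": the redelivery is ignored
```

Production setups often add random jitter to the delay so that many clients do not retry in lockstep; that is left out here to keep the sketch deterministic.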

Next.js · TypeScript · Python · PostgreSQL · Redis · OpenAI · pgvector · Docker · AWS

Contact

Let's work together.

Open to full-time remote roles at early-stage, AI-first startups. Hand me the problem and the constraints - I'll take it from design through deploy and on-call. Fully remote, flexible across US/EU hours.

Schedule: Book a 30-min call

Email: parbhat@parbhat.work

LinkedIn: in/parbhat-kapila

Twitter: @Parbhat03
