Available for select roles

Parbhat Kapila - AI Systems Engineer

Building production AI systems.

Ship to production, then keep it running.

10K+
Emails indexed / VectorMail
12-Stage
AI video pipeline / Cutline
CRM-gated
Data ingestion / Sentinel
Parbhat Kapila / '26

Live products you can use today - all built and run by me, no team behind them. Everything's public and verifiable.

Selected Work

Production AI systems, shipped & operated.

Measurable outcomes, live in production - built, deployed, and maintained independently.

Revenue Intelligence · Permission Layer

Sentinel

A revenue-intelligence backend whose first question is what it's even allowed to read. Email, calendar, and chat are ingested only when a participant matches a contact synced from your CRM - read-only, fail-closed, and PII-blind, with encrypted per-integration tokens.

Before

Every revenue tool runs on a company's most sensitive data - email, calendar, chat. Ingest everything and it's a surveillance liability; gate it by hand and coverage rots within a month.

After

Binds ingestion to a boundary the business already maintains: the CRM contact book. A message is stored only if a participant matches a synced contact - everything else is dropped, fail-closed, without logging an address. Across Gmail, Calendar, and Slack, with read-only HubSpot and Salesforce sync.
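
The gate described above can be sketched in a few lines. This is a minimal illustration, not Sentinel's actual code: the `Message` shape, `normalize`, and `shouldIngest` are hypothetical names, and a plain `Set` stands in for the synced CRM contact book.

```typescript
// Hypothetical types and names; the real Sentinel schema is not shown here.
type Message = { id: string; participants: string[] };

// Normalize an address so comparison ignores case and stray whitespace.
const normalize = (email: string): string => email.trim().toLowerCase();

// Returns true only when at least one participant matches a synced CRM contact.
// Any thrown error yields `false` (fail-closed), and no address is ever logged.
function shouldIngest(message: Message, crmContacts: ReadonlySet<string>): boolean {
  try {
    return message.participants.some((p) => crmContacts.has(normalize(p)));
  } catch {
    return false; // fail closed: drop the message on any error
  }
}

const contacts = new Set(["buyer@example.com"]);
const kept = shouldIngest({ id: "m1", participants: ["Buyer@Example.com", "me@corp.test"] }, contacts);
const dropped = shouldIngest({ id: "m2", participants: ["stranger@example.org"] }, contacts);
```

The point of the design is that the default answer is "no": a message with no matching participant, or one that triggers an error, is never stored.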

CRM
Permission layer
Read-only
Never writes back
Fail-closed
On any error
Encrypted
Tokens at rest

Next.js 16 · TypeScript · PostgreSQL · Prisma · Clerk · Upstash Redis · OpenRouter · Sentry

AI Video Pipeline · 12-Stage Director

CUTLINE

One sentence in, one finished MP4 out - directed by a 12-stage pipeline, not a template engine. It infers audience, goal, tone, and arc; plans 8-12 shots; sources or generates imagery; composes the render. No templates, no creative knobs.

Before

"AI video" tools automate the cut, not the call. Same pan, same stock photo, same captions. A prompt box bolted to a template - generic in, generic out.

After

A director layer makes the editorial calls before a frame renders. One sentence becomes audience, tone, and a shot-by-shot arc. The twelve stages are deterministic, so you can bisect a bad render, swap a provider, or cap cost at any single stage. Not a template engine. A system that directs.
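
A rough sketch of what a pure-function stage pipeline with per-stage cost caps might look like. Everything here is assumed for illustration: the `Plan` and `Stage` shapes, the stage names, and the cost figures are hypothetical, and three stub stages stand in for the twelve.

```typescript
// Hypothetical shapes; Cutline's real stage contracts are not shown here.
type Plan = { prompt: string; audience?: string; tone?: string; shots?: string[] };
type Stage = {
  name: string;
  maxCostUsd: number;
  run: (input: Plan) => { output: Plan; costUsd: number };
};

// Stub stages: each is a pure function from one plan to the next.
const stages: Stage[] = [
  { name: "infer-audience", maxCostUsd: 0.01, run: (p) => ({ output: { ...p, audience: "developers" }, costUsd: 0.002 }) },
  { name: "infer-tone", maxCostUsd: 0.01, run: (p) => ({ output: { ...p, tone: "direct" }, costUsd: 0.002 }) },
  { name: "plan-shots", maxCostUsd: 0.05, run: (p) => ({ output: { ...p, shots: ["hook", "demo", "close"] }, costUsd: 0.01 }) },
];

// Runs stages in order and keeps every intermediate output, so a bad result
// can be traced to the first stage whose snapshot looks wrong.
function runPipeline(input: Plan, pipeline: Stage[]): { result: Plan; trace: { stage: string; output: Plan }[] } {
  const trace: { stage: string; output: Plan }[] = [];
  let current = input;
  for (const stage of pipeline) {
    const { output, costUsd } = stage.run(current);
    if (costUsd > stage.maxCostUsd) throw new Error(`stage ${stage.name} exceeded its cost cap`);
    trace.push({ stage: stage.name, output });
    current = output;
  }
  return { result: current, trace };
}

const { result, trace } = runPipeline({ prompt: "Launch video for a CLI tool" }, stages);
```

Because each stage only reads its input and returns a new value, swapping a provider means replacing one `run` function, and the saved trace gives a fixed point to bisect from.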

12 Stages
Pure-function pipeline
Zero
Creative knobs
3 Modes
Talking character
4-Tier
Image fallback

Next.js 16 · TypeScript · Remotion · BullMQ · Redis · Better Auth · Neon Postgres · Stripe · Google VEO · HeyGen · ElevenLabs

Codebase RAG · Indexing Infrastructure

RepoDoc

Ask an unfamiliar repo questions and get grounded, cited answers. Retrieval runs over what each file means: LLM summaries are embedded, not raw code. Indexing is a durable Postgres lease queue that survives serverless limits, and every token is metered against a per-project budget.

Before

Grep finds strings, not concepts. Naive 'RAG over code' embeds syntax, not meaning - and runs up an unbounded indexing bill.

After

Embeds what each file means, not what it says. The database is the queue - exactly-once, leased, resumable across timeouts. Budgets cap spend mid-index. Infrastructure, not a RAG wrapper.
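
The lease-plus-compare-and-swap idea can be illustrated without a database. In this sketch an in-memory object stands in for a Postgres row, and the `Job` fields, `tryClaim`, and `complete` are hypothetical names rather than RepoDoc's schema.

```typescript
// Hypothetical job shape; an in-memory object stands in for a Postgres row.
type Job = {
  id: string;
  status: "pending" | "done";
  leaseOwner: string | null;
  leaseExpiresAt: number;
  version: number;
};

// Compare-and-swap claim: succeeds only if the row still has the version the
// worker read and is either unleased or its lease has expired.
function tryClaim(job: Job, worker: string, expectedVersion: number, now: number, leaseMs: number): boolean {
  const leaseFree = job.leaseOwner === null || job.leaseExpiresAt <= now;
  if (job.status !== "pending" || job.version !== expectedVersion || !leaseFree) return false;
  job.leaseOwner = worker;
  job.leaseExpiresAt = now + leaseMs;
  job.version += 1;
  return true;
}

// Completion is accepted only from the current lease holder while the lease is live.
function complete(job: Job, worker: string, now: number): boolean {
  if (job.leaseOwner !== worker || job.leaseExpiresAt <= now || job.status !== "pending") return false;
  job.status = "done";
  job.version += 1;
  return true;
}

const job: Job = { id: "file-1", status: "pending", leaseOwner: null, leaseExpiresAt: 0, version: 0 };
const claimA = tryClaim(job, "worker-a", 0, 1000, 30000);  // first claim wins
const claimB = tryClaim(job, "worker-b", 0, 1001, 30000);  // stale version, loses
const resumed = tryClaim(job, "worker-b", 1, 40000, 30000); // lease expired, job is picked up again
const lateA = complete(job, "worker-a", 40001);             // original worker no longer holds the lease
const doneB = complete(job, "worker-b", 40002);
```

In a real table the same check would be a single conditional `UPDATE`, so two workers cannot both win the claim; the expired-lease path is what lets a job resume after a serverless timeout.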

Exactly-once
Lease + CAS jobs
Self-resuming
Survives timeouts
Cost-capped
Per-project budget
Intent-based
Meaning, not syntax

Next.js 16 · TypeScript · PostgreSQL · pgvector · Prisma · OpenRouter · Gemini · Clerk · Upstash Redis · Zod

Semantic Search · Email

VectorMail

Search your whole inbox by meaning - describe a thread and it surfaces, even with none of the original words. Connects Gmail through Aurinko, syncs every thread, and runs semantic search on pgvector, with inbox and embeddings in one database - no separate vector store. Replies compose with full thread context.

Before

Every inbox still searches by keyword. You remember the gist of a thread, not the exact words - so you scroll forever, or never find it.

After

Semantic search across 10k+ threads on pgvector - ask for 'the pricing thread,' get it instantly. Inbox and embeddings in one Postgres, no separate vector store. AI drafts replies with full thread context.
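
To make "search by meaning" concrete, here is a small in-memory ranking by cosine similarity. pgvector's `<=>` operator computes the corresponding cosine distance inside Postgres. The `Thread` shape, the three-dimensional toy embeddings, and the `search` helper are assumptions for illustration only.

```typescript
// Hypothetical thread shape with toy 3-dimensional embeddings.
type Thread = { subject: string; embedding: number[] };

// Cosine similarity: dot product divided by the product of vector lengths.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Returns the k threads whose embeddings are closest in direction to the query.
function search(query: number[], threads: Thread[], k: number): Thread[] {
  return [...threads]
    .sort((x, y) => cosineSimilarity(query, y.embedding) - cosineSimilarity(query, x.embedding))
    .slice(0, k);
}

const threads: Thread[] = [
  { subject: "Q3 pricing proposal", embedding: [0.9, 0.1, 0.0] },
  { subject: "Team offsite logistics", embedding: [0.0, 0.2, 0.9] },
  { subject: "Invoice follow-up", embedding: [0.6, 0.6, 0.1] },
];
const top = search([1, 0, 0], threads, 2);
```

The query vector shares no words with any subject line; ranking depends only on direction in embedding space, which is why a paraphrased description can still surface the right thread.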

Sub-second
10k+ emails
One DB
inbox + vectors
By meaning
not keywords
AI compose
thread context

Next.js 15 · TypeScript · tRPC · Prisma · PostgreSQL · pgvector · Clerk · Aurinko · OpenRouter · Gemini

Stack

The tools, in production.

Core depth: production RAG and vector search at scale. The rest is full-stack because shipping AI means owning the whole pipeline, not just the model.

Frontend (Product UI)
TypeScript · React · Next.js (App Router) · Tailwind CSS · Remotion
Backend & APIs
Node.js · Python · FastAPI · tRPC · Zod · WebSockets
AI Systems (Production)
OpenAI · Claude · Gemini · OpenRouter · RAG pipelines · pgvector
Data & Infrastructure
PostgreSQL · Prisma · Redis · BullMQ · Object Storage (S3)
Observability & Ops
Sentry · OpenTelemetry · Clerk · Better Auth · Stripe
Cloud & Deployment
AWS · Docker · Vercel · CI/CD (GitHub Actions)
Architecture & Practices
Distributed systems · Event-driven design · Multi-tenant SaaS · Cost optimization
Open source: github.com/parbhatkapila4

About

I build AI systems and run them - the whole stack, from schema to monitoring. Four years of it, in production. The ones here are live, and I maintain them.

Most of my work is RAG and LLM infrastructure - retrieval, vector storage, and model routing - tuned for low latency (sub-200ms queries) and 94%+ retrieval accuracy on real workloads.

Looking for a full-time, early-stage role where I own real systems and ship without hand-holding.

01

System Architecture

Multi-tenant SaaS with per-tenant data isolation, auto-scaling infra, and deployments tuned for cost. Cut infrastructure spend ~95% through architecture changes.

02

AI/ML Production

RAG over 10,000+ documents at 94%+ retrieval accuracy, pgvector queries under 200ms. Models picked per task and routed through OpenRouter - GPT-4, Claude, or Gemini - with fallback when a provider degrades or prices spike.
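
Provider fallback of this kind can be sketched as an ordered list with a price ceiling. The `Provider` interface, the stub providers, and the cost numbers below are hypothetical; real calls would go through OpenRouter or a vendor SDK rather than these stand-ins.

```typescript
// Hypothetical provider interface; `call` stands in for a real model request.
type Provider = {
  name: string;
  costPer1kTokens: number;
  call: (prompt: string) => Promise<string>;
};

// Tries providers in order, skipping any above the price ceiling and
// falling through to the next one when a call fails.
async function route(
  prompt: string,
  providers: Provider[],
  maxCostPer1k: number,
): Promise<{ provider: string; text: string }> {
  const errors: string[] = [];
  for (const p of providers) {
    if (p.costPer1kTokens > maxCostPer1k) {
      errors.push(`${p.name}: over price ceiling`);
      continue;
    }
    try {
      return { provider: p.name, text: await p.call(prompt) };
    } catch (err) {
      errors.push(`${p.name}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`all providers failed: ${errors.join("; ")}`);
}

// Stub providers: one degraded, one too expensive, one healthy.
const providers: Provider[] = [
  { name: "primary", costPer1kTokens: 0.03, call: async () => { throw new Error("503 degraded"); } },
  { name: "pricey", costPer1kTokens: 0.5, call: async () => "never reached" },
  { name: "fallback", costPer1kTokens: 0.01, call: async (prompt) => `echo: ${prompt}` },
];
```

Calling `route("hi", providers, 0.1)` skips the degraded and over-budget entries and resolves with the third provider's answer; if every provider is ruled out, the collected reasons surface in one error.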

03

Performance & Optimization

Brought per-document processing from $5.00 to $0.05 with hash-based chunk reuse. Semantic search holds under 200ms at load. 99.9% uptime.
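
Hash-based chunk reuse comes down to keying each chunk's embedding by a hash of its content, so unchanged text is never paid for twice. The sketch below assumes a hypothetical `Embedder` signature and uses an in-memory `Map` in place of a persistent store; the fake embedder exists only to count calls.

```typescript
import { createHash } from "node:crypto";

// Hypothetical embedder signature; `embed` stands in for a paid model call.
type Embedder = (text: string) => number[];

// Content hash used as the cache key.
const sha256 = (text: string): string => createHash("sha256").update(text).digest("hex");

// Embeds only chunks whose content hash has not been seen before.
function embedChunks(
  chunks: string[],
  cache: Map<string, number[]>,
  embed: Embedder,
): { vectors: number[][]; paidCalls: number } {
  let paidCalls = 0;
  const vectors = chunks.map((chunk) => {
    const key = sha256(chunk);
    let vector = cache.get(key);
    if (vector === undefined) {
      vector = embed(chunk); // cache miss: this is the only place money is spent
      cache.set(key, vector);
      paidCalls += 1;
    }
    return vector;
  });
  return { vectors, paidCalls };
}

const cache = new Map<string, number[]>();
const fakeEmbed: Embedder = (text) => [text.length];
const first = embedChunks(["intro", "pricing", "intro"], cache, fakeEmbed);
const second = embedChunks(["intro", "pricing", "faq"], cache, fakeEmbed);
```

In this toy run the first pass pays for two distinct chunks and the second pass pays only for the new one, which is the mechanism behind re-indexing a lightly edited document at a fraction of the original cost.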

04

Ownership

I own the whole thing: decisions, features, deploys, monitoring, and the stuff that breaks after launch. TypeScript, Next.js, Python, PostgreSQL, Redis, AWS, and Vercel.

How I work

  • Async-first - decisions documented, PRs that explain the why, not just the what.
  • Full ownership - I design, deploy, monitor, and fix; root cause over workarounds.
  • Ship continuously without breaking what's already live.
  • Reply within 24 hours to anything real.

Experience

Building & operating, end to end.

May 2022 - Present

Independent / Building for early-stage startups · Remote

Founder & AI Systems Engineer

Full ownership of system design, feature delivery, reliability, and iteration - everything here built, deployed, and maintained by me.

Owned backend services, data stores, AI pipelines, and deployment infrastructure, including authentication, payments, and third-party integrations. Debugged production incidents, performance bottlenecks, and scaling limits while shipping continuously without breaking live systems.

Focus

  • RAG & vector search: Production retrieval on Postgres / pgvector - chunking strategies, persistent embeddings, and context-grounded synthesis to keep answers accurate.
  • Cost-efficient AI: Cut model and processing spend 50-80% with hash-based chunk reuse, embedding caches, and multi-provider routing - quality held constant.
  • Real-time & reliable: Idempotent webhooks, queue-backed jobs, retries with backoff, and graceful degradation when upstream APIs fail.
  • Observability & ops: Health checks, structured logging, Sentry + OpenTelemetry tracing, and self-healing recovery - debugged to root cause, not symptoms.
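
Two of the reliability patterns named above, idempotent webhooks and retries with backoff, fit in a short sketch. The `WebhookEvent` shape, `handleWebhook`, and `backoffDelayMs` are hypothetical names; a `Set` stands in for a unique-keyed table of processed event IDs, and jitter is left out for brevity.

```typescript
// Hypothetical event shape; a Set stands in for a table of processed event IDs.
type WebhookEvent = { id: string; type: string };

const processed = new Set<string>();
let sideEffects = 0;

// Handles an event at most once per ID, so provider redeliveries are harmless.
function handleWebhook(event: WebhookEvent): "processed" | "duplicate" {
  if (processed.has(event.id)) return "duplicate";
  sideEffects += 1; // the real work (for example a database write) would go here
  processed.add(event.id);
  return "processed";
}

// Exponential backoff with a cap: baseMs * 2^attempt, never above capMs.
function backoffDelayMs(attempt: number, baseMs = 200, capMs = 5000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

const outcomes = [
  handleWebhook({ id: "evt_1", type: "payment" }),
  handleWebhook({ id: "evt_1", type: "payment" }), // redelivery of the same event
];
const delays = [0, 1, 2, 3, 4, 5].map((n) => backoffDelayMs(n));
// delays is [200, 400, 800, 1600, 3200, 5000]
```

The cap keeps a long outage from pushing waits into minutes, and the ID check means a retry on the sender's side never doubles a side effect on the receiver's.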

Next.js · TypeScript · Python · PostgreSQL · Redis · OpenAI · pgvector · Docker · AWS

Contact

Let's work
together.

Open to full-time remote roles at early-stage, AI-first startups. Hand me the problem and the constraints - I'll take it from design through deploy and on-call. Fully remote, flexible across US/EU hours.

Book a call