Writing · Case study · 2026 · AI Video Pipeline

CUTLINE: a director, not a template engine

One sentence in, one finished MP4 out - directed by a 12-stage pipeline, not a template engine. The pipeline commits the editorial decisions before it touches a frame.

Live: cutline.cloud

Source: github.com


fig. 00 - results

12 deterministic pipeline stages
1-3 min render per video, off the request path
4-tier per-shot image fallback
0 creative knobs

Section 01

The problem

Short-form video has eaten attention, but the production loop has not compressed.

Script, storyboard, b-roll, cut, caption, render - every step has a tool, but the assembly is still manual, and by the time the idea is on screen it has aged. The "AI video" category mostly automates the cut, not the editorial work that decides what to cut to.

The naive AI version - "type what you want, we'll generate a video" - collapses into template fill-ins. Same Ken Burns pan over the same Unsplash photo, same kinetic-type intro, same captions. The output is generic because the system is generic: it picked a layout, not a narrative. CUTLINE takes the opposite bet.

The product is "describe, receive," not "configure, render."

Section 02

Thesis

Director layer, not template engine. The pipeline commits to editorial decisions before it touches a frame.

01 · Director layer, not template engine
From one sentence it infers audience, goal, tone, complexity, and duration; plans a 3-5 beat narrative arc; breaks that into 8-12 shots with per-shot purpose, motion hint, and text density; writes the script aligned to shot boundaries; sources or generates the imagery; composes the MP4. The user does not pick a template, voice, or layout. The system makes those calls.
02 · One sentence in, no creative knobs
Optional brand kit and uploaded assets enrich the pipeline; they do not steer it. A coffee brand can upload a logo and product photos, drop in two hex colors, and set banned_phrases and required_phrases on a brand_kits row - the director still chooses the shot list and the cuts.
03 · Pipeline over agent
Twelve stages, each a pure function over the previous stage's output. Deterministic stage boundaries beat agent loops for debugging, retries, and per-stage cost control. When a render looks wrong, you bisect by stage. When a provider regresses, you swap one module. When token spend spikes, you isolate the stage and tighten its prompt. An agent loop hides all three.
04 · Worker separate from app
Rendering is CPU-heavy and runs 1-3 minutes per video. Serverless functions time out, and even when they don't, billing-by-execution is the wrong shape for long jobs. The Next.js app handles UI, API, and job enqueue; a separate BullMQ worker runs the pipeline and renders.
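The director's outputs in 01 and the brand-kit rules in 02 can be made concrete with a small validator. This is a minimal TypeScript sketch, not CUTLINE's code: the type names (Intent, Shot, ShotPlan, BrandKit) and the case-insensitive substring matching are assumptions. Only the 3-5 beat and 8-12 shot bounds and the banned_phrases / required_phrases columns come from the text. The brand check only flags script wording; it never touches the shot list.

```typescript
// Hypothetical shapes for the director's output; field names are illustrative.
interface Intent {
  audience: string;
  goal: string;
  tone: string;
  complexity: "low" | "medium" | "high";
  durationSec: number;
}

interface Shot {
  beatIndex: number; // which narrative beat this shot serves
  purpose: string;
  motionHint: string;
  textDensity: "none" | "low" | "high";
}

interface ShotPlan {
  intent: Intent;
  beats: string[]; // e.g. ["hook", "problem", "payoff"]
  shots: Shot[];
}

// Only the two phrase columns of a brand_kits row are modeled here.
interface BrandKit {
  banned_phrases: string[];
  required_phrases: string[];
}

// Returns violations; an empty list means the plan is within the stated bounds.
function validateShotPlan(plan: ShotPlan): string[] {
  const errors: string[] = [];
  if (plan.beats.length < 3 || plan.beats.length > 5) {
    errors.push(`expected 3-5 beats, got ${plan.beats.length}`);
  }
  if (plan.shots.length < 8 || plan.shots.length > 12) {
    errors.push(`expected 8-12 shots, got ${plan.shots.length}`);
  }
  plan.shots.forEach((shot, i) => {
    if (shot.beatIndex < 0 || shot.beatIndex >= plan.beats.length) {
      errors.push(`shot ${i} points at missing beat ${shot.beatIndex}`);
    }
  });
  // Every beat must be covered by at least one shot.
  plan.beats.forEach((_, b) => {
    if (!plan.shots.some((s) => s.beatIndex === b)) errors.push(`beat ${b} has no shots`);
  });
  return errors;
}

// Brand rules constrain the script's wording, not the shot list.
function checkBrandPhrases(script: string, kit: BrandKit): { banned: string[]; missing: string[] } {
  const text = script.toLowerCase();
  return {
    banned: kit.banned_phrases.filter((p) => text.includes(p.toLowerCase())),
    missing: kit.required_phrases.filter((p) => !text.includes(p.toLowerCase())),
  };
}
```

A failed check would send the script stage back for a rewrite rather than asking the user to pick anything, which keeps "no creative knobs" intact.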

Section 03

Architecture

The app enqueues; the worker renders. The app runs on Vercel; the worker runs on a long-running host (Railway / Render / Fly), sharing the same Redis.

FIG. 01

Browser: POST /api/generate (one sentence) ▶ poll job (2s → 15s, 30m cap)

App - Vercel: Next.js API (enqueue) ▶ BullMQ + Redis (job queue)

Worker - long-running host: npm run worker (drains queue) ▶ 12-stage pipeline (render to public/temp) ▶ Vercel Blob (https MP4 URL)

FIG. 01 - In the split deploy the worker's public/temp is not reachable from the Vercel app, so the finished MP4 is uploaded to Vercel Blob and the job result stores the public https URL. The client polls GET /api/generate/[jobId] with 2s → 15s backoff, capped at 30 minutes.
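The client side of that polling loop can be sketched as a capped exponential backoff. The 2s start, 15s ceiling, and 30-minute cap come from the caption; the 1.5x growth factor, the JobStatus shape, and the injectable sleep are assumptions made so the sketch runs without a server.

```typescript
const INITIAL_MS = 2_000; // from the text
const MAX_MS = 15_000; // from the text
const CAP_MS = 30 * 60 * 1_000; // from the text: 30 minutes
const FACTOR = 1.5; // assumption: growth factor per poll

// Hypothetical response shape of GET /api/generate/[jobId].
type JobStatus = { state: "waiting" | "active" | "completed" | "failed"; url?: string };

// Delay after the nth poll (0-based): geometric growth, clamped at MAX_MS.
function pollDelay(attempt: number): number {
  return Math.min(MAX_MS, Math.round(INITIAL_MS * FACTOR ** attempt));
}

// Polls until the job settles or the 30-minute budget is spent.
// fetchStatus would wrap the status GET; sleep is injectable for tests.
async function pollJob(
  fetchStatus: () => Promise<JobStatus>,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<JobStatus> {
  let waited = 0;
  for (let attempt = 0; waited < CAP_MS; attempt++) {
    const status = await fetchStatus();
    if (status.state === "completed" || status.state === "failed") return status;
    const delay = pollDelay(attempt);
    await sleep(delay);
    waited += delay;
  }
  throw new Error("gave up polling after 30 minutes");
}
```

Starting short keeps fast jobs responsive; the ceiling keeps a 3-minute render from generating dozens of status requests.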

Section 04

The twelve stages

Each stage is a pure function with its own POST endpoint - which is what makes the whole thing debuggable, swappable, and testable.

FIG. 02

Direct: intent ▶ narrative ▶ shots ▶ script

Voice & motion: subtitles ▶ TTS ▶ subtitle refine ▶ motion

Visuals & render: asset analysis ▶ visuals ▶ image sourcing ▶ render (finished MP4)

FIG. 02 - Per-stage endpoints (/api/intent, /api/shots, /api/script, /api/images/source, /api/render, …) let you (a) bisect a bad output by replaying one stage with a saved input, (b) swap a provider for one stage, and (c) test the slow stages in isolation. Cancellation is a Redis SET read between every stage.
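Why per-stage boundaries make bisection cheap can be shown in a minimal sketch: if every stage is a pure async function and the runner records each stage's input and output, a bad render can be narrowed down by replaying one stage against its saved input. The Stage and StageRecord types and the in-memory recording are assumptions; in CUTLINE the equivalent replay goes through the per-stage POST endpoints.

```typescript
// A stage is a pure async function from the previous stage's output to its own.
// Stage names and the record shape are illustrative, not CUTLINE's code.
type Stage = { name: string; run: (input: unknown) => Promise<unknown> };

interface StageRecord {
  name: string;
  input: unknown;
  output: unknown;
  ms: number; // wall time, useful for spotting a slow or regressed provider
}

// Runs stages in order and records every boundary so any stage can be replayed later.
async function runPipeline(stages: Stage[], input: unknown): Promise<StageRecord[]> {
  const records: StageRecord[] = [];
  let current = input;
  for (const stage of stages) {
    const start = Date.now();
    const output = await stage.run(current);
    records.push({ name: stage.name, input: current, output, ms: Date.now() - start });
    current = output;
  }
  return records;
}

// Bisection step: re-run one stage against its saved input without touching the others.
async function replayStage(stages: Stage[], record: StageRecord): Promise<unknown> {
  const stage = stages.find((s) => s.name === record.name);
  if (!stage) throw new Error(`unknown stage: ${record.name}`);
  return stage.run(record.input);
}
```

Swapping a provider for one stage then means replacing one entry in the stages array and replaying that stage's saved input.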

Section 05

The talking-character branch

When mode === talking_object, the renderer detours through one of three rendering paths, chosen by talkingObjectStyle and talkingRealMode.

FIG. 03

Cartoon: Google VEO (@google/genai) ▶ LLM-resolved subject (humanoid fallback)

Studio framing: HeyGen (photo avatar) ▶ ElevenLabs (voice)

Cinematic: multi-clip VEO (presenter per chunk) ▶ ffmpeg concat (crossfade) ▶ silence trim (per chunk)

FIG. 03 - Three modes, three different failure semantics. Cartoon and cinematic both call Google VEO; studio framing calls HeyGen with ElevenLabs. Cinematic generates multiple VEO clips under a documentary-style constraint (a different presenter for each chunk), joins them with an ffmpeg concat and crossfade, and trims silence from each chunk.

Section 06

Why this is hard

01 · Three talking-character modes, three failure semantics
VEO's RAI filter returns a deterministic content-safety block on the generated audio - retrying the identical prompt cannot succeed. The orchestrator throws a distinct VeoContentFilteredError, the retry classifier marks it non-retryable, and the chunk loop runs an LLM reword pass that varies the narration (what RAI blocks) while keeping the visual prompt intact. After two failed rewords the job fails with an actionable message and stops burning quota.
02 · HeyGen Photo Avatar quota under at-least-once submissions
Lower-tier HeyGen accounts cap stored Photo Avatars at 3. The upload path keys a SHA-256 cache on the image bytes, so identical inputs short-circuit. On code:401028 (quota full), the orchestrator lists the account's avatars, partitions them into orphans and cached, and bulk-deletes orphans oldest-first in parallel batches (concurrency 10), falling back to LRU eviction of cached avatars.
03 · Idempotency without a job-state table
An X-Idempotency-Key header, paired with an in-process Map and a per-key Promise lock, serializes concurrent submissions with the same key and returns the original { jobId }; keys are retained for 24h. In-memory by design - BullMQ already owns job lifecycle, and duplicating that into Postgres creates two sources of truth.
04 · 12-stage cancellation across a long-running async pipeline
Cancel writes a Redis SET; every stage reads it before starting work. On hit the worker throws and the same cleanup path runs as on success or failure. Cancellation is eventual, not preemptive - latency-to-cancel is bounded by the current stage's duration.
05 · Plan entitlement enforced at three layers
Free / Beginner / Professional / Enterprise with caps 1 / 10 / unlimited / unlimited videos per month. Pro-only features are gated by UI (badges + lock states), the API handler (before BullMQ enqueue), and the DB (user_plan_overrides + a UNIQUE on payments.stripe_checkout_session_id). A tampered request body can't bypass the handler check; a tampered URL can't bypass the DB constraint.
06 · Image sourcing has to never fail
A pipeline that finishes 11 stages and aborts on shot 7 is a wasted job. The per-shot fallback chain is Unsplash → DALL-E 3 → Pexels → placeholder, with the query derived per shot from intent + script via OpenRouter. The placeholder is deliberate: a render with one stock-filler shot beats a failed render every time.

Section 07

Image sourcing has to never fail

Imagery is the least reliable stage, so it has a terminal fallback. Every shot ends up with an image.

FIG. 04

Per-shot fallback chain: Unsplash (first try) ▶ DALL-E 3 (generate) ▶ Pexels (stock) ▶ Placeholder (render still completes)

FIG. 04 - The query for each shot is derived from intent + script via OpenRouter; shouldRetryForImage retries 429/5xx and gives up on other 4xx. The placeholder is the conscious terminal case - a render with one stock-filler frame beats a failed render.
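The chain and its retry rule can be sketched as follows. Only the retry rule attributed to shouldRetryForImage (retry 429 and 5xx, give up on other 4xx) and the provider order come from the text. The ImageProvider and HttpError types, three attempts per provider, the 250 ms backoff base, treating non-HTTP (network) errors as transient, and the placeholder path are all assumptions.

```typescript
// Hypothetical types: a provider resolves to an image URL or throws.
type ImageProvider = { name: string; fetch: (query: string) => Promise<string> };

class HttpError extends Error {
  readonly status: number;
  constructor(status: number) {
    super(`HTTP ${status}`);
    this.status = status;
  }
}

// Retry 429 and 5xx; give up on any other 4xx. Non-HTTP errors are treated as transient (assumption).
function shouldRetryForImage(err: unknown): boolean {
  if (!(err instanceof HttpError)) return true;
  return err.status === 429 || err.status >= 500;
}

const PLACEHOLDER_URL = "/images/placeholder.jpg"; // hypothetical path

// One provider, up to `attempts` tries with exponential backoff (baseMs, 2x, 4x, ...).
async function tryProvider(p: ImageProvider, query: string, attempts: number, baseMs: number): Promise<string | null> {
  for (let i = 0; i < attempts; i++) {
    try {
      return await p.fetch(query);
    } catch (err) {
      if (!shouldRetryForImage(err) || i === attempts - 1) return null;
      await new Promise<void>((resolve) => setTimeout(resolve, baseMs * 2 ** i));
    }
  }
  return null;
}

// Walk the chain in order (e.g. Unsplash, DALL-E 3, Pexels); the placeholder is the terminal case.
async function sourceShotImage(chain: ImageProvider[], query: string, attempts = 3, baseMs = 250): Promise<string> {
  for (const provider of chain) {
    const url = await tryProvider(provider, query, attempts, baseMs);
    if (url) return url;
  }
  return PLACEHOLDER_URL;
}
```

A 404 moves on to the next provider immediately, while a 503 spends its retries first; either way the function always returns a URL.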

Section 08

Design decisions & tradeoffs

Pipeline, not LLM agent

Why: deterministic stage boundaries beat agent loops for debugging, retries, and per-stage cost control.
Tradeoff: less emergent behaviour, more handcrafted prompts per stage. Visibility over magic.

Worker on a long-running host

Why: 1-3 minute renders die in serverless timeouts, and per-execution billing is the wrong shape for long jobs.
Tradeoff: deploy is two services (Vercel app + worker) sharing one Redis.

In-memory idempotency + Redis cancellation

Why: BullMQ already owns job lifecycle; a Postgres job-state table creates two sources of truth.
Tradeoff: the idempotency cache resets when the app restarts, so a retried POST that reuses a key from before the restart, still inside the 24h window, creates a second job. The request ID is carried through logs for support correlation.
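In-process idempotency can be sketched with a Map of in-flight promises. Because Node runs the lookup and the insert synchronously, storing the promise itself is enough to act as the per-key lock: a second request with the same key receives the same promise and therefore the same { jobId }. The function name, the injectable clock, and dropping failed enqueues from the cache are assumptions; a fuller version would also prune expired keys. Because the Map lives in process memory, it is empty after a restart, which is the tradeoff noted above.

```typescript
type JobRef = { jobId: string };

const RETENTION_MS = 24 * 60 * 60 * 1000; // 24h, from the text

// Keyed by the X-Idempotency-Key value.
const inFlight = new Map<string, { result: Promise<JobRef>; expiresAt: number }>();

// Same key within the retention window -> same promise -> same { jobId }.
// Lookup and insert run synchronously, so no other request can interleave between them.
function submitOnce(key: string, enqueue: () => Promise<JobRef>, now: number = Date.now()): Promise<JobRef> {
  const hit = inFlight.get(key);
  if (hit && hit.expiresAt > now) return hit.result;

  const result = enqueue();
  inFlight.set(key, { result, expiresAt: now + RETENTION_MS });
  // Assumption: a failed enqueue should not pin the key for 24h, so drop it.
  result.catch(() => {
    if (inFlight.get(key)?.result === result) inFlight.delete(key);
  });
  return result;
}
```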

Placeholder as terminal image fallback

Why: every shot must have an image; a 99%-success pipeline is operationally worse than one that finishes with a filler frame.
Tradeoff: a deployment with no image API keys produces watermark-looking output. Logs make it loud; the trade is conscious.

user_plan_overrides separate from the auth table

Why: entitlement is a thin map keyed by user_id; identity is Better Auth's job. Coupling them locks billing to one auth provider.
Tradeoff: getUserPlan(userId) reads two tables - trivial query cost for an auth-provider swap that doesn't touch billing.
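The monthly caps (Free 1, Beginner 10, Professional and Enterprise unlimited) and the handler-layer check can be sketched as a pure function. Which plans unlock pro-only features is an assumption (Professional and Enterprise here), as are the function and type names. The plan passed in would come from a server-side lookup such as getUserPlan(userId), never from the request body.

```typescript
type Plan = "free" | "beginner" | "professional" | "enterprise";

// Monthly video caps from the text; Infinity stands in for "unlimited".
const MONTHLY_CAP: Record<Plan, number> = {
  free: 1,
  beginner: 10,
  professional: Infinity,
  enterprise: Infinity,
};

// Assumption: pro-only features unlock on Professional and Enterprise.
const PRO_PLANS: ReadonlySet<Plan> = new Set<Plan>(["professional", "enterprise"]);

type Entitlement = { ok: true } | { ok: false; reason: string };

// Handler-layer check, run before the BullMQ enqueue.
function checkEntitlement(plan: Plan, videosThisMonth: number, wantsProFeature: boolean): Entitlement {
  if (videosThisMonth >= MONTHLY_CAP[plan]) {
    return { ok: false, reason: `monthly limit of ${MONTHLY_CAP[plan]} videos reached` };
  }
  if (wantsProFeature && !PRO_PLANS.has(plan)) {
    return { ok: false, reason: "this feature requires the Professional or Enterprise plan" };
  }
  return { ok: true };
}
```

The UI badges and the DB constraint sit on either side of this check; the handler is the layer a tampered request body actually hits.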

Section 09

Failure modes

VEO content-safety block on chunk N

Detected as VeoContentFilteredError; an LLM reword varies that chunk's narration while preserving meaning, and the chunk regenerates.
After two failed rewords it stops with an actionable message and burns no further VEO quota.
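The reword loop can be sketched like this. VeoContentFilteredError and the two-reword limit come from the text; the Chunk shape, the generate and reword callbacks, the error wording, and rethrowing retryable errors to an outer retry path are assumptions.

```typescript
class VeoContentFilteredError extends Error {
  constructor(message = "VEO safety filter blocked the generated audio") {
    super(message);
    this.name = "VeoContentFilteredError";
  }
}

// Retry classifier: a deterministic content-safety block never succeeds on an identical retry.
function isRetryable(err: unknown): boolean {
  return !(err instanceof VeoContentFilteredError);
}

interface Chunk {
  narration: string; // the part the safety filter judges; reworded on a block
  visualPrompt: string; // kept fixed across rewords
}

const MAX_REWORDS = 2;

// generate (a VEO call returning a clip URL) and reword (an LLM pass) are injected and assumed.
async function generateChunk(
  chunk: Chunk,
  generate: (c: Chunk) => Promise<string>,
  reword: (narration: string) => Promise<string>,
): Promise<string> {
  let current = chunk;
  for (let rewords = 0; ; rewords++) {
    try {
      return await generate(current);
    } catch (err) {
      if (isRetryable(err)) throw err; // transient errors belong to the normal retry path
      if (rewords === MAX_REWORDS) {
        // Stop here: a third reword would just burn more VEO quota.
        throw new Error("Narration was blocked by the safety filter after two rewords; try rephrasing the topic.");
      }
      current = { ...current, narration: await reword(current.narration) };
    }
  }
}
```

In the worst case this makes three VEO calls per chunk: the original narration and two rewords.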

HeyGen 401028 Photo Avatar quota full

Bulk auto-cleanup partitions orphans vs cached, deletes oldest orphans in parallel batches, then retries the upload once.
If retry still hits 401028, the user-facing error is generic; the operator log carries the technical detail.
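The pieces of that cleanup can be sketched with Node's crypto module and plain arrays. The SHA-256 key over image bytes, the orphan/cached split, oldest-first ordering, concurrency of 10, and the LRU fallback come from the text. The Avatar shape, the lastUsed map, and the function names are hypothetical, and HeyGen's actual list and delete endpoints are not modeled.

```typescript
import { createHash } from "node:crypto";

// Cache key for an uploaded photo: identical bytes give an identical key, so no re-upload.
function avatarCacheKey(imageBytes: Uint8Array): string {
  return createHash("sha256").update(imageBytes).digest("hex");
}

// Hypothetical listing shape; HeyGen's real response fields may differ.
interface Avatar {
  id: string;
  createdAt: number; // epoch ms
}

// Split the account into avatars our cache references and orphans nothing references, oldest first.
function planCleanup(account: Avatar[], cachedIds: ReadonlySet<string>): { orphans: Avatar[]; cached: Avatar[] } {
  const oldestFirst = [...account].sort((a, b) => a.createdAt - b.createdAt);
  return {
    orphans: oldestFirst.filter((a) => !cachedIds.has(a.id)),
    cached: oldestFirst.filter((a) => cachedIds.has(a.id)),
  };
}

// Delete in parallel batches; one failed delete does not abort the rest of its batch.
async function deleteInBatches(ids: string[], remove: (id: string) => Promise<void>, concurrency = 10): Promise<void> {
  for (let i = 0; i < ids.length; i += concurrency) {
    await Promise.allSettled(ids.slice(i, i + concurrency).map((id) => remove(id)));
  }
}

// Backup when there are no orphans: evict the least recently used cached avatar.
function lruVictim(cached: Avatar[], lastUsed: ReadonlyMap<string, number>): Avatar | undefined {
  return [...cached].sort((a, b) => (lastUsed.get(a.id) ?? 0) - (lastUsed.get(b.id) ?? 0))[0];
}
```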

Image provider 5xx / 429

shouldRetryForImage retries transient failures with exponential backoff, then falls through Unsplash → DALL-E → Pexels → placeholder.
Render completes.

Worker process killed mid-render

Per-job temp dir is deleted on success, failure, and cancel via one shared cleanup path.
An orphan-sweep job runs every 60 minutes as a backstop; final MP4 retention is separate (default 24h).
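The cancellation check and the shared cleanup path can be sketched together: the runner consults a cancel flag before each stage and releases resources in a finally block, so success, failure, and cancel all take the same exit. In production the flag lives in Redis; here a CancelStore interface backed by an in-memory Set stands in, and CancelledError and runCancellable are hypothetical names. The orphan sweep is not modeled.

```typescript
// Stand-in for the Redis-backed flag.
interface CancelStore {
  isCancelled(jobId: string): Promise<boolean>;
}

class CancelledError extends Error {
  constructor(jobId: string) {
    super(`job ${jobId} was cancelled`);
    this.name = "CancelledError";
  }
}

// Checks the flag before every stage; the finally block is the one cleanup path
// shared by success, failure, and cancel (e.g. deleting the per-job temp dir).
async function runCancellable(
  jobId: string,
  stages: Array<(input: unknown) => Promise<unknown>>,
  input: unknown,
  store: CancelStore,
  cleanup: () => Promise<void>,
): Promise<unknown> {
  let current = input;
  try {
    for (const stage of stages) {
      if (await store.isCancelled(jobId)) throw new CancelledError(jobId);
      current = await stage(current);
    }
    return current;
  } finally {
    await cleanup();
  }
}
```

This also shows why cancellation is eventual rather than preemptive: a stage that is already running finishes before the next check sees the flag.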

Stripe webhook replay / out-of-order

payments.stripe_checkout_session_id has a UNIQUE constraint; verifyAndUpgrade checks status before writing user_plan_overrides.
Replays log already-processed and return 200. Double-credit is structurally impossible.
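The replay behaviour can be sketched in memory, with a Set standing in for the UNIQUE constraint on payments.stripe_checkout_session_id and a Map for user_plan_overrides. In Postgres the real guard is the constraint itself (for example INSERT ... ON CONFLICT (stripe_checkout_session_id) DO NOTHING, then checking whether a row was inserted). The function name, the session shape, and returning 200 with an "ignored" outcome for incomplete sessions are assumptions; Stripe Checkout Sessions do report a status of "open", "complete", or "expired".

```typescript
type Outcome = "upgraded" | "already-processed" | "ignored";

interface CheckoutSession {
  id: string; // Checkout Session ID (cs_...)
  status: string; // "open" | "complete" | "expired" in Stripe
  userId: string; // hypothetical: e.g. carried in client_reference_id or metadata
  plan: string;
}

const processedSessions = new Set<string>(); // plays the UNIQUE constraint
const planOverrides = new Map<string, string>(); // plays user_plan_overrides: user_id -> plan

// Every path returns 200 so Stripe stops redelivering; only the first delivery writes.
function verifyAndUpgradeSketch(session: CheckoutSession): { httpStatus: 200; outcome: Outcome } {
  if (session.status !== "complete") return { httpStatus: 200, outcome: "ignored" };
  if (processedSessions.has(session.id)) return { httpStatus: 200, outcome: "already-processed" };
  processedSessions.add(session.id);
  planOverrides.set(session.userId, session.plan);
  return { httpStatus: 200, outcome: "upgraded" };
}
```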

Section 10

Security model

Browser hardening

CSP with frame-ancestors none, object-src none, base-uri self, form-action self, upgrade-insecure-requests.
HSTS (2y, includeSubDomains, preload), X-Frame-Options DENY, nosniff, Referrer-Policy strict-origin-when-cross-origin - on every response.
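The header set above can be written as a plain config object. Next.js applies custom response headers through an async headers() function in next.config; the "/:path*" source pattern and the exact directive spelling are assumptions about how this deployment renders them. The directive list mirrors the text only; the real policy may contain more (such as script-src), which the text doesn't show.

```typescript
// CSP directives exactly as listed in the text.
const csp = [
  "frame-ancestors 'none'",
  "object-src 'none'",
  "base-uri 'self'",
  "form-action 'self'",
  "upgrade-insecure-requests",
].join("; ");

const securityHeaders: Array<{ key: string; value: string }> = [
  { key: "Content-Security-Policy", value: csp },
  // 2 years in seconds: 2 * 365 * 86,400 = 63,072,000
  { key: "Strict-Transport-Security", value: "max-age=63072000; includeSubDomains; preload" },
  { key: "X-Frame-Options", value: "DENY" },
  { key: "X-Content-Type-Options", value: "nosniff" },
  { key: "Referrer-Policy", value: "strict-origin-when-cross-origin" },
];

// Shape of the headers() option in next.config; applies the set to every route.
const nextConfig = {
  async headers() {
    return [{ source: "/:path*", headers: securityHeaders }];
  },
};
```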

API keys

Format clk_<48 hex> from crypto.randomBytes(24).
Stored as SHA-256 of the full key, with a 16-char prefix kept as the indexed lookup column. Plaintext returned exactly once at creation.
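Key creation and verification can be sketched with Node's crypto module. The clk_ prefix, 24 random bytes (48 hex characters), SHA-256 of the full key, and the 16-character lookup prefix come from the text; the function names and the constant-time comparison on verify are assumptions.

```typescript
import { createHash, randomBytes, timingSafeEqual } from "node:crypto";

function createApiKey(): { plaintext: string; prefix: string; hash: string } {
  const plaintext = `clk_${randomBytes(24).toString("hex")}`; // 24 bytes -> 48 hex chars
  return {
    plaintext, // returned to the user once, never stored
    prefix: plaintext.slice(0, 16), // indexed lookup column
    hash: createHash("sha256").update(plaintext).digest("hex"), // stored
  };
}

// After finding the row by prefix, compare full-key hashes in constant time.
function verifyApiKey(presented: string, storedHash: string): boolean {
  const candidate = createHash("sha256").update(presented).digest();
  const stored = Buffer.from(storedHash, "hex");
  // timingSafeEqual throws on length mismatch, so check length first.
  return stored.length === candidate.length && timingSafeEqual(candidate, stored);
}
```

Storing only the hash means a leaked database row cannot be replayed as a key; the prefix narrows the lookup without revealing enough to authenticate.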

Stripe webhook HMAC

stripe.webhooks.constructEvent runs before any business logic.
Missing or invalid signature → 400.

SSRF guard on callbackUrl

Validates scheme (http/https only), rejects localhost/127.0.0.1 in production unless explicitly allowed.
Fire-and-forget, 5-second timeout, no retries.
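The guard and the delivery can be sketched with the WHATWG URL parser, fetch, and AbortSignal.timeout (all available in Node 18+). The function names and the allowLocal option (standing in for "unless explicitly allowed") are hypothetical.

```typescript
type CallbackCheck = { ok: true; url: URL } | { ok: false; reason: string };

function validateCallbackUrl(raw: string, opts: { production: boolean; allowLocal?: boolean }): CallbackCheck {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return { ok: false, reason: "not a valid URL" };
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    return { ok: false, reason: "scheme must be http or https" };
  }
  const host = url.hostname;
  if (opts.production && !opts.allowLocal && (host === "localhost" || host === "127.0.0.1")) {
    return { ok: false, reason: "loopback callbacks are not allowed in production" };
  }
  return { ok: true, url };
}

// Fire-and-forget: a 5-second timeout, no retries, and failures are swallowed on purpose.
function notifyCallback(url: URL, body: unknown): void {
  fetch(url, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify(body),
    signal: AbortSignal.timeout(5_000),
  }).catch(() => {
    /* best effort: delivery failures never affect the job */
  });
}
```

As described, the check covers loopback names only; private ranges such as 10.0.0.0/8, the 169.254.169.254 metadata address, and hostnames that resolve to them would pass this sketch.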

Prompt-injection rejection

validateGenerateInput rejects topics matching the injection-pattern set.
Returns a field-level VALIDATION_FAILED.

Rate limiting

Redis-backed sliding window per client.
generate 5/h, upload 20/h, status 60/min.
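A sliding-window limiter with these budgets can be sketched as a sliding log of timestamps per client and route. A Redis version commonly keeps one sorted set per key and uses ZREMRANGEBYSCORE to drop old entries, ZCARD to count, and ZADD to record; an in-memory Map stands in here so the sketch runs on its own. The function name, the key format, and the injectable clock are assumptions.

```typescript
// Budgets from the text.
const LIMITS = {
  generate: { max: 5, windowMs: 60 * 60 * 1000 },
  upload: { max: 20, windowMs: 60 * 60 * 1000 },
  status: { max: 60, windowMs: 60 * 1000 },
} as const;

type Route = keyof typeof LIMITS;

const requestLog = new Map<string, number[]>(); // `${route}:${clientId}` -> request timestamps

function allowRequest(clientId: string, route: Route, now: number = Date.now()): boolean {
  const { max, windowMs } = LIMITS[route];
  const key = `${route}:${clientId}`;
  // Keep only requests inside the trailing window.
  const recent = (requestLog.get(key) ?? []).filter((t) => t > now - windowMs);
  if (recent.length >= max) {
    requestLog.set(key, recent); // rejected requests are not recorded
    return false;
  }
  recent.push(now);
  requestLog.set(key, recent);
  return true;
}
```

Unlike a fixed window, a client cannot double its budget by bursting at the end of one hour and the start of the next.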

Section 11

Testing

The pipeline's correctness lives in pure functions, so that's where the tests concentrate - with one path that exercises the real generate endpoint end to end.

01 · Unit coverage on the load-bearing logic
67 Vitest cases across 11 files - 65 of them deterministic unit tests - cover input validation, the retry classifier, the pipeline orchestrator, and the generate route. The same stage boundaries that make the pipeline debuggable are what make it testable.
02 · Opt-in integration against a live stack
Two integration tests drive the real POST /api/generate and poll job status against a running worker and Redis; they self-skip (describe.skipIf) when Redis isn't configured, so they run only when the dependency is actually present.
03 · The deliberate tradeoff
No end-to-end browser suite and no CI gate yet - the integration tests are run locally when the worker stack is up, and the unit layer is where regressions get caught first.
