Showspring
How Showspring got built. The product lives at showspring.com — this page is the engineering side of the story, using The Doodle Cast as the worked example. Every pipeline step, every model, every decision we made along the way.
This full episode of The Doodle Cast was produced end-to-end by Showspring — from idea to YouTube publish.
Build log
v1.0 → v2.2
AI video tools generate clips. We produce episodes.
Tools like Veo, Kling, Higgsfield, and Firefly are remarkable at generating individual video clips. But producing a complete YouTube episode — with narrative structure, multi-character dialogue, consistent visuals, sound design, and music — still takes dozens of hours of manual stitching, editing, and rendering. Showspring eliminates that gap entirely.
From idea to YouTube in 10 steps
The first thing Showspring shipped: a 10-step episode pipeline that turns The Doodle Cast from a manual production into a daily-publishable show. Each step is a route in the app, not a separate tool.
Brainstorm → script → voice readout → location mapping → scene gen → video gen → trim → audio mix → render → publish. End-to-end on the VPS, no editor in the loop for the happy path.
Frame-accurate timeline accumulation across 30+ clips per episode. Concatenating float-second durations accumulates rounding drift, and on an 8-minute episode the audio walks off the picture by the back half. The pipeline now walks the timeline as integer frames and every consumer reads from the same canonical manifest.
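A minimal sketch of the integer-frame idea; the manifest shape here (clips with durationFrames at a fixed fps) is illustrative, not the actual contract:

```js
// Sketch: walk the timeline as a single integer-frame cursor instead of
// summing float seconds. Manifest shape is illustrative.
function buildTimeline(manifest) {
  let cursorFrames = 0; // one integer accumulator, no rounding to carry forward
  return manifest.clips.map((clip) => {
    const entry = {
      id: clip.id,
      startFrame: cursorFrames,
      endFrame: cursorFrames + clip.durationFrames,
      startSeconds: cursorFrames / manifest.fps, // seconds derived at the edge, never accumulated
    };
    cursorFrames += clip.durationFrames;
    return entry;
  });
}

const manifest = {
  fps: 24,
  clips: Array.from({ length: 36 }, (_, i) => ({ id: `clip-${i}`, durationFrames: 200 })),
};
console.log(buildTimeline(manifest).at(-1).endFrame); // 7200 frames, exact, clip after clip
```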
YouTube publish had no idempotency. A retry after a network blip would post the same episode twice. Caught when an early episode hit the public feed in duplicate within 90 seconds; the publish step now keys off a content hash before posting.
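Roughly what the idempotency guard looks like; publishedHashes and uploadToYouTube are hypothetical stand-ins for the real store and uploader:

```js
const crypto = require('node:crypto');
const fs = require('node:fs');

// Hash the rendered file, not the request: a retry of the same render is a
// no-op, a genuinely new render is not.
async function sha256File(path) {
  const hash = crypto.createHash('sha256');
  for await (const chunk of fs.createReadStream(path)) hash.update(chunk);
  return hash.digest('hex');
}

// `publishedHashes` and `uploadToYouTube` are hypothetical stand-ins.
async function publishOnce(mp4Path, metadata, { publishedHashes, uploadToYouTube }) {
  const hash = await sha256File(mp4Path);
  if (await publishedHashes.has(hash)) {
    return { skipped: true, hash };          // the network-blip retry case
  }
  const videoId = await uploadToYouTube(mp4Path, metadata);
  await publishedHashes.add(hash, videoId);  // recorded only after a confirmed upload
  return { skipped: false, hash, videoId };
}
```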
The pipeline runs daily. Adding a new step inside this surface is a one-day lift. Adding a NEW production mode (Shorts, Podcast, Guest preview) costs multiple weeks — the cross-step contracts are tighter than they look. That tradeoff became visible in v1.1.
Creative Director
Three AI personas independently brainstorm episode concepts, then debate their merits. A judge AI (Grok 4) evaluates each pitch with live web search for topical relevance and selects a winner — or you bring your own idea and let the panel validate it. v1.2 adds a Research & Debate mode where Grok 4 conducts deep web research to build fact-heavy, current scripts.
Script Writer
Choose your LLM — Qwen3, Gemma3, Gemini, or Claude — and generate a full episode script with structured clips, character dialogue, scene descriptions, and image prompts. The writer is trained on the show bible: a living knowledge base that evolves with every episode, ensuring character consistency and avoiding repeated plotlines.
Voice Readout
Every character speaks in a distinct synthesized voice. The narrator delivers a documentary cadence; Rusty speaks with deep, measured authority; Oreo is excitable and fast. Play through the full episode readout to check pacing, dialogue flow, and story structure before committing to visual production.
Location Mapping
AI extracts every location from the script and maps them to clips. Build a reusable location library with reference images, visual descriptions, and default prompts. Locations carry their visual identity across episodes — the studio always looks like the studio.
Scene Generation
Generate photorealistic images for each clip, informed by character reference sheets, location images, scene references, and scene descriptions. Every generation considers the visual context — character appearance, location lighting, camera angle — to maintain consistency across 30+ scenes. Start and end images for each clip enable smooth I2V video generation. Full image history with undo, AI-assisted editing, and Google Flow mode for iPad/PC-sourced photos.
Video Generation
Transform scene images into motion using cloud models like Google Veo or local open-source models (WAN 2.2, LTX 2.3) on an RTX 5090. The DaVinci Resolve-style timeline shows every clip with start/end frames, status badges, and a composite episode preview player that sequences all completed clips in real time.
Trim Editor
Fine-tune every clip with frame-accurate trim points. Set in/out markers, adjust clip durations, and preview the result instantly. The trimmed timeline carries forward to the audio mix and final render.
Audio Mix
A full multi-track audio editor with four lanes: background video audio, voice dialogue, sound effects, and music. Each track has independent volume control with keyframe automation. Generate SFX and music from text descriptions, position them on the timeline, and fine-tune the mix — all inside the browser.
Render Engine
One button, full episode render. The engine trims each clip, applies the complete audio mix with ffmpeg filter graphs, concatenates everything (including the outro), and encodes the final MP4. When the RTX 5090 is online, encoding runs on NVENC for speed; otherwise, VPS CPU fallback handles it. Output goes straight to Google Drive.
Publish to YouTube
Generate multiple AI thumbnails for A/B testing, write metadata with Claude, set tags and categories, then publish directly to YouTube — with real-time upload progress. The show bible automatically evolves after each published episode, learning what works for future content.
Configurable LLM models per step
v1.1 made every creative stage independently model-pickable. Episode Ideas can run on Gemini Flash, Episode Scripts on Claude, Shorts Ideas on a local Qwen3, all in the same project. Persisted to user preferences and surfaced in a dropdown next to each step.
Per-step model picker on every creative surface. Local pool (Qwen3:32B / Qwen3:8B / Gemma3:27B / Gemma3:12B) plus cloud (Gemini, Claude). v2.1 extended this with a routing bridge that runs admin Gemini and Claude calls through local tooling when the GPU is online.
Mid-episode model swaps. The same project might brainstorm on cloud and script on local; the show-bible context has to load identically into both. The bridge layer normalizes input and output across providers so calling code never branches on model identity.
Quality dropped briefly when Idea Lab defaulted to Qwen 8B and the show-bible context spilled out of the window. Output went generic. The llama-swap router now picks the largest model that fits the prompt + still fits in VRAM with whatever else is warm.
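The selection rule, sketched with illustrative VRAM figures (this is not llama-swap's actual API, just the decision it encodes):

```js
// Sketch of the routing rule only; not llama-swap's actual API. VRAM figures are illustrative.
const MODELS = [
  { name: 'qwen3:32b',  vramGB: 21, ctx: 32768 },
  { name: 'gemma3:27b', vramGB: 18, ctx: 32768 },
  { name: 'gemma3:12b', vramGB: 9,  ctx: 32768 },
  { name: 'qwen3:8b',   vramGB: 7,  ctx: 32768 },
];

function pickModel(promptTokens, freeVramGB, warmModels) {
  // Largest model that (a) fits the prompt in its context window and
  // (b) is already warm, or fits in whatever VRAM is left beside warm models.
  const fits = MODELS
    .filter((m) => promptTokens <= m.ctx)
    .filter((m) => warmModels.includes(m.name) || m.vramGB <= freeVramGB)
    .sort((a, b) => b.vramGB - a.vramGB);
  return fits[0] ?? null; // null means fall back to a cloud model
}

console.log(pickModel(24_000, 10, ['qwen3:8b'])); // gemma3:12b (the 32B model doesn't fit right now)
```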
Cost per episode dropped substantially once brainstorming defaulted to local. Quality stayed where it was. The "live routing indicator" in the topnav was added later in v2.1 because it was no longer obvious from the bill which engine had run which step.
Episode & Shorts Ideas
Choose from Gemini 2.5 Flash, Qwen3:32B, Qwen3:8B, Gemma3:27B, Gemma3:12B, or Claude. Local models run on the RTX 5090; cloud models deliver higher quality for production batches. The model selector appears directly in the Idea Lab.
Episode & Shorts Scripts
Script generation supports the same model selection. Each model brings a different narrative style — Gemini excels at concise dialogue, Claude at long-form structure, and local models at rapid iteration.
Local Routing Bridge
When the local GPU is online, an LLM bridge routes admin Gemini and Claude calls through local tooling rather than the cloud APIs. Multimodal calls (image-grounded analysis like the guest-submission photo pass) route the same way. Calls fall through to the standard cloud route when the bridge is unavailable, so behavior is identical from the caller’s perspective.
Live Routing Indicator
A compact status indicator in the topnav shows whether LLM calls are routing local or cloud at a glance. Hover and a popover breaks down the actual provider and model per active callsite, so it is obvious at any moment which engine is doing which job.
9:16 vertical batches alongside the episode pipeline
v1.1 added a parallel pipeline for vertical short-form content. Eight shorts at once from a single theme, each with unique characters, AI-generated images and video, voice synthesis, and music — then published across five platforms with scheduled auto-publish.
Batch creation, per-batch scheduling, multi-platform publish (YouTube Shorts, TikTok, Reels, Facebook, X). Reused the v1.0 publish surface so the calendar and analytics work didn’t need to be re-implemented.
Per-clip image-to-video timing alignment. Each clip is independently rendered then concatenated, and clips with slightly off frame counts caused cumulative drift across the batch. Same fix as v1.0: integer-frame accumulation everywhere.
TikTok upload failed silently for two days because the token-rotation cron didn’t fire on weekends. Errors landed in a swallowed try/catch. Added a watchdog that pings each platform’s upload endpoint hourly and emails on failure.
Shorts now ship more reliably than episodes (smaller surface area, fewer moving parts). Sharing the publish/calendar surface across episode + shorts saved roughly two weeks of duplication that v1.0 had warned about.
Idea Lab
Generate up to 8 short ideas at once from the show’s character pool. Choose the AI model (Gemini Flash, Llama, Gemma, Qwen, or Claude), assign characters per clip or let the AI decide, and feed in a YouTube research report for topical relevance. Each idea gets a title, hook, concept, and character assignment — all in one batch generation.
Script Writer
Generate scripts for all clips in batch. Each script includes a scene description (for the still image), character description, video action prompt (for I2V animation), and 8–12 word dialogue. Rules enforce no new objects mid-scene and no scene changes — keeping each short visually coherent for image-to-video generation.
Start Images
Generate the starting frame for each short using AI image models. Choose between Gemini, Z-Turbo, SDXL, or Qwen — each producing photorealistic 9:16 vertical images informed by character reference sheets and scene descriptions. Full image history with undo and per-clip regeneration.
Video Generation
Transform each starting image into an 8-second animated clip using image-to-video models. Google Veo for production quality, or WAN 2.2 / LTX on the local RTX 5090 for free iterations. The video action prompt from the script drives the animation — subtle movements, camera pans, character expressions.
Voice Studio
Replace audio with character voices using ElevenLabs. Each short gets a multi-track audio view: original video audio, per-character voice tracks, and AI-generated background music. Batch-replace all voices with one click, or fine-tune individual clips. Music generation creates custom background tracks that match each short’s mood.
Render Engine
Mix video, voice, and music into final MP4s using ffmpeg. Each clip gets independent volume control for original audio, voice, and music tracks. Render all 8 shorts in one click or selectively re-render individual clips. Output uploads to Google Drive automatically and is available for immediate download.
Publish & Schedule
Publish shorts to YouTube, Facebook, Instagram Reels, TikTok, and X (Twitter) from a single dashboard. Each platform has its own tab with OAuth authentication, AI-generated metadata, and publish controls. Schedule posts with a visual calendar, get AI-recommended posting times based on channel analytics, and let the auto-publisher fire at the right moment. Missed schedules trigger email alerts.
Audio-only feed using the same characters
v1.2 added an audio-only podcast pipeline using the same characters and show bible as the video pipeline. Multi-voice dialogue, AI cover art, and direct publishing — a complementary audio feed alongside the YouTube channel.
7-step audio pipeline. Same script writer as video, different render path: per-character ElevenLabs voice synthesis, mix, podcast cover art generation, RSS publish.
Per-speaker volume balancing. Each character voice has different inherent loudness from ElevenLabs; concatenating without normalizing produced “listen on headphones” hot spots. ffmpeg loudnorm now runs per-clip before mix, with a final pass after.
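A sketch of the per-clip pass; the loudness targets and paths are illustrative, not the production settings:

```js
const { execFile } = require('node:child_process');
const { promisify } = require('node:util');
const run = promisify(execFile);

// Sketch: bring each character clip to a common loudness target before the
// mix, then run one more pass over the combined track. Targets are illustrative.
async function loudnormClip(inPath, outPath) {
  await run('ffmpeg', [
    '-y', '-i', inPath,
    '-af', 'loudnorm=I=-16:TP=-1.5:LRA=11',
    '-ar', '48000',   // loudnorm resamples internally; pin the output rate
    outPath,
  ]);
}

async function normalizeEpisode(clipPaths) {
  for (const [i, clip] of clipPaths.entries()) {
    await loudnormClip(clip, `normalized/${String(i).padStart(3, '0')}.wav`);
  }
  // ...concatenate, then a final loudnorm pass over the full mix.
}
```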
RSS feed had per-episode artwork URLs that were absolute to the dev domain. First publish to Apple Podcasts threw 404s on every cover. Now resolved through a CDN-aliased path that’s identical across environments.
Quickly became the format for material that doesn’t need visuals (interviews, monologues, behind-the-scenes). The surgical isolation of “video paths” vs “audio paths” was easier than expected because v1.0 had already separated them at the manifest level.
Podcast Idea Lab
Three modes: AI-recommended pitches with Grok scoring, bring your own idea, or Research & Debate with live web research. Scripts include multi-character dialogue with ElevenLabs voice directions and configurable target length (2–120 minutes).
Multi-Character Script
Full conversation scripts with character dialogue, word count tracking, and duration estimates. Choose your LLM (Grok 4, Gemini, Claude, or local models) and generate scripts grounded in the show bible for character consistency.
Script Preview & Publish
Full episode preview with character avatars, color-coded dialogue, and a sidebar listing all podcast episodes. Review the complete conversation flow before committing to voice generation and audio production.
Multi-Voice Dialogue
ElevenLabs Text-to-Dialogue API generates natural conversations between multiple characters in a single audio stream. Each character maintains their unique voice profile with proper conversational pacing and turn-taking.
Audio Production
Auto-generated intro/outro music, configurable silence gaps, act-based script batching for long episodes (15+ minutes), and target duration control. The pipeline produces publish-ready audio without manual editing.
AI Cover Art
Generate up to 4 cover art variants per episode using Gemini, informed by character reference images and episode context. Pick the best one or regenerate.
7-Step Pipeline
Pitch → Script → Voices → Cover Art → Audio Mix → Preview → Download & Publish. Each step builds on the last with full state persistence across reloads.
Manual generation in flow.google.com without losing automation
v1.2 added the Google Flow workflow: any video step that wants manual generation in flow.google.com can stage assets to Drive, generate externally, and upload the MP4 back. Now used by Shorts, the Episode Pipeline, and Guests — the manual-in-the-loop backbone for v2.1.
Three-step Flow surface: server bundles start frame + prompt to Drive, operator generates manually, uploads MP4 back. Plus a dual-source image watcher (Drive polling + browser File System Access API) so iPad drawings and local PC folders feed the same approval queue.
Round-tripping arbitrary frame counts. Flow output isn’t always exactly 8.000s; the trim editor has to clip to integer-frame boundaries without re-encoding. -c:v copy with a -t flag computed from the manifest’s frame count works.
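Sketched below; fps and paths are illustrative:

```js
const { execFile } = require('node:child_process');
const { promisify } = require('node:util');
const run = promisify(execFile);

// Sketch of the trick described above: clamp a Flow-generated MP4 to an
// integer-frame duration without re-encoding.
async function trimToManifestFrames(inPath, outPath, durationFrames, fps = 24) {
  const seconds = (durationFrames / fps).toFixed(6); // e.g. 192 frames @ 24fps -> 8.000000
  await run('ffmpeg', [
    '-y', '-i', inPath,
    '-t', seconds,
    '-c', 'copy',   // no re-encode; the cut lands on the packet boundary
    outPath,
  ]);
}
```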
Drive polling stopped seeing new files because the OAuth token silently expired and the refresh path swallowed the failure. Two days of operator confusion before a watchdog caught it. Refresh path now logs and the topnav shows a Drive-disconnected indicator.
Became the v2.1 backbone for Guest Submissions and Shorts-from-Podcast: any time the right move is “a human generates this externally,” the Flow surface absorbs that without breaking the rest of the pipeline.
Dual Source Monitoring
Google Drive polling (every 10s) for iPad-sourced images plus browser-based local folder watching (File System Access API) for PC files. Both feed into the same approval queue with LED status indicators.
Approval Workflow
Every detected image queues for visual approval: side-by-side comparison of current vs. new image, with options to set as start image, end image, or discard. A visual clip picker grid allows reassigning to any scene.
Start & End Images
Each scene now supports separate start and end frames for image-to-video generation. Swap, edit, or AI-modify either image independently. The video engine uses both to produce smoother motion between key frames.
Drive Export
Export character references, location images, and scene references to Google Drive per clip — creating organized folders for external AI tools or team collaboration.
Grok 4 deep-research scripts for fact-heavy episodes
v1.2 added a deep-research path on the Creative Director step: three brainstorming personas debate, a judge with live web search picks the winner. For news-cycle episodes (the Gazette format) that need current information rather than show-bible-only content.
Three-persona debate (skeptic, optimist, story-finder) in parallel, judge model with live web search picks the winning angle. Selectable per episode.
Latency. Grok’s deep-research mode runs 60 seconds per call; three-way debate plus a research-grounded judge can take 6 minutes on news-heavy material. The UI streams each model’s progress so the operator can feel forward motion.
Grok kept hitting a 60s timeout in the bridge layer. Episodes silently fell back to a non-research path and the operator only noticed because output went generic. Bumped to 240s and added a UI banner when fallback fires.
Used selectively rather than by default. Most evergreen episodes don’t need it; news-cycle Gazette episodes lean on it heavily. The cost-quality tradeoff is per-episode.
Live Web Research
Grok 4 searches the web in real-time for the episode topic, pulling current facts, statistics, and developments. The research context is injected directly into script generation for factual accuracy.
Debate-Style Dialogue
Characters engage in informed discussion with real data points. The show bible context ensures characters stay in-character while discussing factual content, producing educational yet entertaining scripts.
Fire Hydrant Gazette — late-night news comedy as a segment template
v1.3 introduced an extensible segment template system. Each template defines format rules, comedy mechanics, joke formulas, and visual style; the AI script writer reads the template alongside the show bible. Fire Hydrant Gazette was the first template; more followed.
Segment template system, character-positioned OTS graphics with live-preview sliders, intro videos per segment type, Discord auto-announce on episode publish.
4:3 OTS graphic composition over a 16:9 video frame. Anchor points had to stay tied to the character’s screen position even when the camera moved. Solved with a per-segment fixed-anchor map keyed to the character’s on-screen position rather than the absolute frame.
Cache-hash invalidation didn’t include the intro-video flag, so toggling it off didn’t re-render. The previous frame composition kept playing. Cache key now includes intro presence.
The Gazette format alone justified the v1.3 effort. Discord auto-announce was 2-day work and saved minutes per publish. The template system later grew to host more segment types without further plumbing.
📰 Segment Templates
Extensible segment type system. Each template (e.g. The Fire Hydrant Gazette) has its own format rules, comedy mechanics, 12 joke formulas distilled from classic news-desk comedy, voice profiles, guest correspondent arcs, and a dedicated comedy bible layer. Templates are code-canonical and upsert on boot.
🎬 OTS News Graphics
Character-positioned over-the-shoulder graphics, matching real news-desk broadcasts. Rusty sits left → OTS on the right. Oreo sits right → OTS on the left. Configurable X/Y/Width per segment type with live-preview sliders. 4:3 landscape ratio. Composited in the final render via ffmpeg overlay filter on both VPS and GPU paths.
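A sketch of how that composition could be expressed; coordinates and sizes are illustrative, not the shipped anchor values:

```js
// Sketch: composite the 4:3 topic graphic over the 16:9 frame with ffmpeg's
// overlay filter. The anchor map keys off which side the anchor character
// sits, not absolute per-shot pixels. Values are illustrative.
const OTS_ANCHORS = {
  rusty: { x: 1150, y: 90, width: 640 },  // Rusty sits left, graphic on the right
  oreo:  { x: 130,  y: 90, width: 640 },  // Oreo sits right, graphic on the left
};

function buildOverlayFilter(anchorCharacter) {
  const { x, y, width } = OTS_ANCHORS[anchorCharacter];
  const height = Math.round(width * 3 / 4);   // keep the 4:3 ratio
  return [
    `[1:v]scale=${width}:${height}[ots]`,     // input 1 = topic image
    `[0:v][ots]overlay=${x}:${y}[outv]`,      // input 0 = desk footage
  ].join(';');
}

// ffmpeg -i desk.mp4 -i topic.png -filter_complex "<this string>" -map "[outv]" ...
console.log(buildOverlayFilter('rusty'));
```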
🎧 Audience Reaction Track
Comedy segments get per-template audience reactions — 9 types (laughter, big_laugh, giggle, applause, ooh, aww, groan, cheer, signoff) tuned to an SNL Weekend Update dry-desk profile. Dedicated Audience Plan step, density dial, multi-take stem library, lane-packed timeline, signoff-into-outro bleed. Full deep dive in the workflow article.
🎤 Per-Segment Intro Clips
Each segment type can have its own intro video (uploaded in Settings, toggle on/off). Appears in the timeline with a cyan border, included in HLS, VPS fallback, and DaVinci export render pipelines. Cache hash includes intro for invalidation.
📡 Discord Auto-Announcements
New episodes, shorts, gazette articles, and podcast episodes are automatically posted to the fan Discord server via an internal HTTP announce API. Non-blocking hooks — publish failures never block the response. 6 distribution channels now: YouTube, Facebook, Instagram, TikTok, X/Twitter, and Discord.
🖼 Scene Image Enhancements
Gallery picker on empty scene slots. Drag-to-copy images between clips in the timeline (chain-draggable). Per-clip topic image toggle (show/hide OTS graphic). Image history with undo. Topic images visible across scene thumbnails, video timeline, and start/end frame previews.
🎭 Comedy-First Script Instructions
Rewritten comedy bible based on research from professional late-night comedy writers. Kill-your-first-thought rule, write-30-keep-5 method, punchlines-pivot-away principle, factual setups, and tight two-line joke structure. Static camera instruction for news desk realism.
📷 Real Web Photo Picker
Brave Image Search API for sourcing real news photos as OTS graphics. Choose between AI-reimagined versions or real photos as-is. Photo credit metadata captured for on-screen attribution. Replaced DDG scraper (ToS violation).
Studio 8H in a database — the audience reactions pipeline
The Fire Hydrant Gazette is a news-desk comedy segment. News-desk comedy needs an audience — but not a sitcom laugh-track audience. The reactions layer is modeled on the SNL Weekend Update dry-desk feel: the audience sits silent while the anchor delivers, then lands one clean wave in the post-speech silence.
Script-writer gate that decides when reactions are even appropriate. Multi-model planner that picks reaction TYPE (laugh, oh, hush, applause). Stem library with multiple takes per type. 9th “signoff” reaction that bleeds into the outro. Audio editor with lane-packed timelines so the operator can manually retime when needed.
Knowing when to react. The script doesn’t say “audience laughs here.” A small classifier reads each clip’s dialogue + tone and infers reaction type with timestamp. The non-trivial decision is silence: knowing when NOT to react is harder than picking which reaction.
Stem library kept hitting the same take in a row, which is the tell of a fake audience. Picker now tracks recently-used takes per type and avoids repeats within a window.
The audience layer makes a mediocre joke land. It also makes a flat one ROT, in slow motion. Operators learned to disable reactions on weak material rather than try to mask it with the audience.
🎪 Weekend Update profile
Reactions land in post-speech silence only — never mid-speech, never during word-gaps. One reaction per clip maximum. Target mix: laughter ~45%, groan ~15%, ooh ~15%, applause ~10%, big_laugh ~8%. Desk lines, not sitcom stings. Locked in the prompt as hard rules.
🎸 9 reaction types
laughter · big_laugh · giggle · applause · ooh · aww · groan · cheer · signoff. Duration clamped 0.3–6s at insert time. Extended types (chuckle, snort, gasp, mmm) kept in reserve for texture, not mid-speech spam.
🎱 Density dial
Sparse (25–35% of comedy clips), medium (50–65%), dense (75–90%). Persisted per-episode in audio settings. Controls SHARE of clips that get a reaction — not stacking depth. Stacking is forbidden.
💻 Multi-take stem library
Each reaction type has N flavor descriptors (laughter: 8, applause: 6, bed: 6). Same cue deterministically picks the same variant via seed hash, but different rows spread across the variant pool. Kills the canned laugh-track feel.
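A sketch of the deterministic pick; the stem counts match the text above, but the hashing scheme itself is illustrative:

```js
const crypto = require('node:crypto');

// Sketch: the same cue always resolves to the same take, while different
// rows spread across the variant pool.
const STEM_VARIANTS = { laughter: 8, applause: 6, bed: 6 };

function pickStem(reactionType, clipId, episodeId) {
  const pool = STEM_VARIANTS[reactionType] ?? 4;
  const seed = crypto.createHash('md5')
    .update(`${episodeId}:${clipId}:${reactionType}`)
    .digest();
  const variant = seed.readUInt32BE(0) % pool;   // stable for a given cue
  return `${reactionType}_${variant + 1}.wav`;
}

console.log(pickStem('laughter', 'clip-07', 'ep-142')); // same inputs, same take
```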
🔊 Natural tail bleed & pitch roll
Non-bed reactions play 2s past clip boundary so tails feel present. ±3% playbackRate roll per play for pitch+speed variety. Bed tone exempt from both — a continuous low-gain studio room tone loops under the whole episode at ~0.12 gain.
🎧 Audience Plan step
Dedicated button (separate from auto-script-gen). Transcribes the episode on the local GPU host, then asks an LLM — a local model first, Claude or Gemini as fallbacks — to place reactions on the trimmed timeline. Additive: preserves existing non-bed rows.
📥 Signoff into outro
A 9th reaction type that carries the goodbye past the final clip into the outro logo. 18s dedicated slot, broadcast-loud sustained applause+whoop+cheer blend. Outro re-encoded via ffmpeg amix with 0.2s fade-in + 2s fade-out at t=4s.
🧠 Studio 8H acoustics
Every stem prompt locked to a shared acoustic descriptor: ~285-seat NBC Studio 8H, woven linen ceilings, intimate dry acoustics, close broadcast house mic. No reverb tail. No individual voices popping. TV-behaved collective reactions only.
📈 Lane-packed timeline
Overlapping reactions split into extra visual lanes in the audio editor (greedy interval-scheduling). Lane 0 keeps mute/volume controls; extra lanes get a “↓ Audience N” label. Same pattern now applied to SFX and music tracks.
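The greedy packing, sketched (row shape is illustrative):

```js
// Sketch: sort rows by start time, place each in the first lane whose previous
// item has already ended, open a new lane otherwise.
function packLanes(rows) {
  const lanes = []; // lanes[i] = end time of the last row placed in lane i
  return [...rows]
    .sort((a, b) => a.start - b.start)
    .map((row) => {
      let lane = lanes.findIndex((laneEnd) => laneEnd <= row.start);
      if (lane === -1) {
        lane = lanes.length;
        lanes.push(row.end);
      } else {
        lanes[lane] = row.end;
      }
      return { ...row, lane }; // lane 0 keeps the controls, extras get the "Audience N" label
    });
}

console.log(packLanes([
  { start: 2.0, end: 5.5 },
  { start: 4.0, end: 6.0 },   // overlaps the first row, lands in lane 1
  { start: 6.5, end: 9.0 },   // lane 0 is free again
]));
```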
Further reading: The full workflow — from script-writer cue generation through the Audience Plan LLM chain, stem catalog, render pipeline integration, and the outro-bleed trick — is written up in the Audience Reactions Workflow article on overdigital.ai.
Anti-slop hardening, retention feedback, and locking what’s already live
By April the pipeline had 24 steps. The next round wasn't new pipelines, it was hardening the existing ones. YouTube was actively penalizing detected AI content. Scheduled and published clips were occasionally getting edited or regenerated by accident. v1.5 fixed both axes: channel-defense and editing-safety.
Anti-slop hardening across every image-gen call. Burned-in TikTok-style large-bold captions. Top-level calendar with retention drill-down. Lock-for-editing on scheduled and published clips. Opt-in scheduler with confirm-on-publish.
The retention feedback loop. YouTube analytics aren’t real-time — there’s a 48-hour lag. The loop pulls retention curves into the calendar drill-down popover and uses them to flag scenes for re-cut. Required a separate analytics ingest plus popover positioning that survives screen edges.
A retention curve clipped through the popover boundary and bled onto the rest of the calendar. Cosmetic, but it kept slipping back across two refactors before the popover got a hard z-index seal and clip-path containment.
“Anti-slop” became a recurring lever. Every new feature now gets the question: does this make slop more or less likely? Sometimes the answer is to NOT ship it.
🛡️ Universal anti-slop hardening
Every image-generation call — characters, locations, podcast covers, topic images, scene references, saved-idea portraits — runs through the same shared anti-slop instruction set. Forbids the visual signatures YouTube’s detector pattern-matches on (uniform shading, plastic skin, neon-rim glow, generic-fantasy lighting). Real-photo character references preferred over generated ones where possible.
📢 Burned-in captions
ASS subtitle file built from clip dialogue with timing pulled from the trimmed audio track. Baked into the render via ffmpeg’s subtitles filter (forces re-encode — can’t pass through with stream copy). Captions survive every cross-platform repost; no more relying on YouTube’s auto-captions.
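A sketch of the burn-in step; the style values and encoder flags are illustrative, not the production look:

```js
const { execFile } = require('node:child_process');
const { promisify } = require('node:util');
const fs = require('node:fs');
const run = promisify(execFile);

// Sketch: write an .ass file from the trimmed dialogue timing, then bake it
// into the render with ffmpeg's subtitles filter.
const ASS_HEADER = `[Script Info]
ScriptType: v4.00+
PlayResX: 1920
PlayResY: 1080

[V4+ Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, Encoding
Style: Caption,Arial,96,&H00FFFFFF,&H00FFFFFF,&H00000000,&H64000000,1,0,0,0,100,100,0,0,1,4,0,2,80,80,120,1

[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
`;

const assTime = (s) =>
  `${Math.floor(s / 3600)}:${String(Math.floor((s % 3600) / 60)).padStart(2, '0')}:` +
  (s % 60).toFixed(2).padStart(5, '0');

function writeAss(lines, path) {
  const events = lines.map((l) =>
    `Dialogue: 0,${assTime(l.start)},${assTime(l.end)},Caption,,0,0,0,,${l.text}`);
  fs.writeFileSync(path, ASS_HEADER + events.join('\n') + '\n');
}

async function burnCaptions(inPath, assPath, outPath) {
  // subtitles= forces a re-encode; burned-in text can't ride a stream copy.
  await run('ffmpeg', ['-y', '-i', inPath, '-vf', `subtitles=${assPath}`, '-c:a', 'copy', outPath]);
}
```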
📊 Retention feedback loop
Per-second YouTube retention curves (the 100-bucket audienceWatchRatio array) pulled from the YouTube Analytics API and stored locally. A channel-aggregation service rolls the curves into a prompt block that gets injected into the next round’s idea-generation and script-generation calls (“avg drop at 4s — tighten dialogue at the second beat”).
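A sketch of the roll-up; the drop threshold and wording are illustrative:

```js
// Sketch: roll several episodes' 100-bucket audienceWatchRatio curves into a
// short prompt block for the next idea/script round.
function retentionPromptBlock(curves, episodeSeconds) {
  const buckets = 100;
  // Average the curves bucket-by-bucket across recent episodes.
  const avg = Array.from({ length: buckets }, (_, i) =>
    curves.reduce((sum, c) => sum + c[i], 0) / curves.length);

  // Earliest bucket where retention falls below 60% of the opening value.
  const dropIdx = avg.findIndex((v) => v < avg[0] * 0.6);
  const dropSeconds = dropIdx === -1 ? null : Math.round((dropIdx / buckets) * episodeSeconds);

  return [
    `Average retention at 25%/50%/75%: ${[25, 50, 75]
      .map((p) => `${Math.round(avg[p] * 100)}%`).join(' / ')}.`,
    dropSeconds === null
      ? 'No sharp early drop detected.'
      : `Sharp drop around ${dropSeconds}s, tighten dialogue before that beat.`,
  ].join(' ');
}
```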
🗐️ Static-pose start frames
Start-frame prompts now lock the character pose so the end frame can move. Kills the “swimming character” look where image-to-video models interpolate between two slightly-different stances. Theme propagation also runs through idea-generation, script-generation, and script-regeneration so the chosen target theme actually shapes every stage instead of just the first.
📅 Top-level calendar
Month-grid view of every scheduled and published short and episode across all platforms. Drag-drop to reschedule, drill-down per-clip with retention curve SVG, hover preview with player controls, fullscreen video. Reachable from every page so the calendar icon is always one click away. Cancel/unschedule from inside the drill-down without bouncing out to the publish step.
🔒 Lock for editing
Once a clip is scheduled or published it’s locked. UI gates and server-side guards refuse mutations on schedule, metadata, platforms, schedule-all, and image regeneration unless the batch is explicitly unlocked. The unlock flow is single-session and explicit, so you can’t accidentally re-render last week’s short.
📅 YouTube native scheduler (opt-in)
Two publish modes coexist. Default: the in-app auto-publisher uploads at the scheduled time as a public video. Opt-in: per-clip “Schedule on YT” or batch “Schedule All on YouTube” uploads as private with a future publishAt, so YouTube owns the queue server-side and the clip shows in YT Studio’s scheduled list.
🎤 Voice summary field
A new tight 1-2 sentence voice description per character — used in image-to-video prompts and the export instructions. Falls back to the longer speech-style guidance when not set, so existing characters keep working. Long speech-style guidance no longer dilutes a focused video prompt.
⏳ Fire-and-poll for long jobs
Drive exports, batch retention pulls, and other long-running operations switched from synchronous requests to a fire-and-poll job pattern. Cloudflare’s free tier kills synchronous requests past ~100s, breaking the original flow. The job pattern starts work, returns a job id, and the UI polls until done.
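The pattern, sketched with Express-style handlers (exportEpisodeToDrive is a hypothetical worker; the real job store survives restarts):

```js
const express = require('express');
const crypto = require('node:crypto');
const app = express();
app.use(express.json());

// In-memory job store for the sketch only.
const jobs = new Map();

function startJob(work) {
  const id = crypto.randomUUID();
  jobs.set(id, { status: 'running' });
  work()
    .then((result) => jobs.set(id, { status: 'done', result }))
    .catch((err) => jobs.set(id, { status: 'failed', error: String(err) }));
  return id;
}

// Fire: respond immediately, well under the ~100s proxy cutoff.
app.post('/api/drive-export', (req, res) => {
  res.json({ jobId: startJob(() => exportEpisodeToDrive(req.body.episodeId)) }); // hypothetical worker
});

// Poll: the UI calls this until status is 'done' or 'failed'.
app.get('/api/jobs/:id', (req, res) => {
  res.json(jobs.get(req.params.id) ?? { status: 'unknown' });
});

app.listen(3000);
```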
Why this matters: Most AI-generated YouTube content is bleeding traffic to YT’s detector while authors keep regenerating into already-live clips. v1.5 closes both wounds at once — the channel defends itself against the AI-slop signal, and the editor surface defends already-published work from accidental rewrites. The retention feedback loop turns each published short into a training signal for the next batch, so the channel gets sharper every round.
DaVinci Resolve as the human-in-the-loop layer
Showspring renders broadcast-quality MP4s end-to-end on the VPS. But sometimes a producer wants to nudge a single beat, or the AI’s timing instinct is 90% right and a human editor needs to push two clips fifteen frames left. v2.0 ships the entire timeline into DaVinci Resolve through a five-level integration ladder, all consuming a single canonical JSON contract and a portable OTIO bundle.
Five-level integration ladder (OTIO bundle → Lua importer → Python tray daemon → outbound WebSocket → Workflow Integration panel) on a single canonical JSON manifest. Frame-accurate timeline accumulation. ProRes 4444 stills. Markers carrying opaque Showspring identifiers for round-trip.
Volume duality. OTIO has no native audio-levels schema. Volumes ship on two channels: per-clip OTIO metadata embedded in the timeline, plus a volumes.json sidecar. Daemon applies via Resolve’s undocumented TimelineItem.SetProperty("Volume", dB) with the sidecar as recovery when SetProperty no-ops on certain Studio builds.
Windows Defender flagged the first daemon as a remote-control toolkit because PowerShell tray-icon code matched a heuristic. Rewrote in Python with pystray. Then Python’s stdlib HTTP server hung on Content-Length mismatches; switched to a leaner ASGI runner.
The OTIO contract is now load-bearing across the whole product. Round-trip via markers is infrastructure-ready — the next step is reading editor changes back into the database. Markers are stamped, Resolve’s GetMarkerByCustomData() works, the wire is up.
📄 Single manifest contract
Every consumer reads the same versioned JSON manifest — clips, durations in frames, kind, tracks, ordered items, volume keyframes. Never references absolute paths, so the same bundle unpacks anywhere. Stamps opaque Showspring row identifiers into Resolve markers for the future round-trip flow.
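An illustrative shape; the field names here are assumptions drawn from the description above, not the actual contract:

```js
// Illustrative manifest shape only. Real field names may differ.
const manifest = {
  version: 3,
  fps: 24,
  clips: [
    {
      id: 'clip_000',                 // opaque Showspring row identifier, also stamped into markers
      kind: 'scene',                  // scene | intro | outro | overlay
      startFrame: 0,
      durationFrames: 192,
      media: 'media/clip_000.mp4',    // always relative, never an absolute path
    },
  ],
  tracks: [
    { kind: 'voice', items: [{ clipId: 'clip_000', offsetFrames: 0 }] },
    { kind: 'music', items: [{ clipId: 'clip_000', offsetFrames: 0 }] },
  ],
  volumes: {
    clip_000: { voice: [{ frame: 0, dB: 0 }, { frame: 48, dB: -6 }] }, // keyframes
  },
};
```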
🍾 Five levels of integration
Level 0 server-side OTIO bundle (NLE-agnostic). Level 1 Lua importer (free Resolve). Level 2 local Python daemon bound to loopback only. Level 2.5 same daemon, outbound WebSocket back to Showspring. Level 3 Workflow Integration panel inside Resolve. Each level is a complete shipping path on its own.
📦 Portable OTIO bundle
Built entirely on the VPS. Frame-accurate accumulation walks the timeline as integer frames so an 8-minute episode doesn’t drift. ProRes 4444 stills for image overlays. A volumes.json sidecar so non-Resolve NLEs can recover audio levels. Image-sequence detection defeated by giving overlays alphabetical-only IDs.
🔒 Two delivery channels
Authenticated download for the user’s browser session. An unguessable one-time token for the local daemon (no session cookie possible). Token minted at build time and discarded on re-export.
💻 Python tray daemon
Long-running Windows tray process. pystray for the icon (replaced PowerShell after Defender heuristics flagged the script as a remote-control toolkit). Polls a local health endpoint. Three states: ok / warn (Resolve unreachable) / down. Owns the outbound WebSocket too.
🎯 Volume duality
OTIO has no native audio-levels schema. Volumes ship on two channels: per-clip OTIO metadata embedded in the timeline, plus a volumes.json sidecar in the bundle root. Daemon applies via Resolve’s undocumented TimelineItem.SetProperty("Volume", dB), with the sidecar as the recovery path when SetProperty no-ops on certain Studio builds.
🔗 Round-trip via markers
Every clip in the bundle ships with a Resolve marker carrying an opaque Showspring identifier in its custom_data field. Resolve’s GetMarkerByCustomData() call lets a future flow walk an editor-modified timeline, recover the IDs, diff against the original, and write timing changes back to Showspring’s database. Infrastructure shipped; the round-trip flow is next.
🛠 Resolve API survival kit
recordFrame is absolute (offset by GetStartFrame()). CreateEmptyTimeline collides on name. DeleteTimelines is a no-op (use DeleteClips). SaveProject() mandatory or work evaporates. 1V/1A default needs AddTrack before high trackIndex. Integration code wraps each gotcha so the upstream call sites stay clean.
Further reading: The full story — the manifest contract, the tempfile-trap that broke the first daemon, the Content-Length hang in Python’s stdlib HTTP server, the Defender false-positive that forced the rewrite, and the “don’t kill the tray” self-kill bug — is written up in the Building the DaVinci Resolve Integration deep-dive on overdigital.ai.
From a pitch email to an approvable preview
Running a recurring cast means fielding pitch emails and DMs from fans whose pets want in on the show. The Guests inbox is the admin tool that turns those one-off messages into a structured production workflow without a separate spreadsheet or a back-and-forth thread.
Two-pass LLM analysis (text first → characters and observations; photos second → personality summary with explicit additions-beyond-text delta + image descriptions + episode-idea seeds). ElevenLabs voice finder. Per-dog Google Flow video workflow. Speech-to-Speech voice replacement. Approvable public share link.
Voice-replace artifact compounding. Initial implementation read source audio from the most recent video, ran Speech-to-Speech, muxed the result back. Re-running compounded artifacts because each pass read its own previous output. Now reads from an immutable raw upload on every run.
The voice finder kept returning “no matches” on perfectly reasonable voice-direction prose because the LLM-extracted filters were over-restrictive. Added a progressive relaxation chain that successively widens until candidates appear; gender selector keeps the operator’s constraint hard.
Validates the v1.0 prediction: a NEW production mode is multiple weeks of work even when most components exist. Two-pass analysis schema, Drive folder bundling, ElevenLabs library cloning, Speech-to-Speech mux — each existed in fragments, none integrated.
📝 Two-pass character extraction
The first pass reads the submitter’s raw text and extracts a per-dog character profile: name, breed, three-to-five-sentence personality, voice direction, and a granular observation list grounded in the submitter’s own wording. The second pass — multimodal — reads the reference photos, validates and enriches each profile, and assigns each photo to the right dog in a multi-pet submission. The two passes write into the same record independently, so photos can land after the initial analysis without losing anything.
🔹 Personality delta from photos
The photo pass produces a tight personality summary in the same shape as the text-only pass, plus an explicit “additions beyond the text” list — observations the photos surface that the text summary did not already cover. An empty additions list is meaningful: photos confirmed but did not add. The pass is forbidden from restating the text back, so the operator can scan the delta in seconds rather than reading two summaries side by side.
🎤 Voice finder + library clone
Click “Find voice” on a dog and the app maps the dog’s voice direction onto the available ElevenLabs voices. An LLM converts the prose into structured search filters (gender, age, accent, descriptives), then a progressive-relaxation chain widens the filter set until candidates appear. Up to six cards render with identity chips, a description, and an audio preview; picking one clones the voice into the operator’s account and writes the new voice ID back to the dog record in a single round-trip.
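A sketch of the relaxation chain; searchVoices is a hypothetical wrapper around the ElevenLabs voice-library search, and the filter names and relaxation order are illustrative. Gender stays a hard constraint throughout:

```js
// Sketch of the progressive-relaxation chain. `searchVoices` is hypothetical.
async function findVoiceCandidates(filters, searchVoices) {
  const relaxationOrder = ['descriptives', 'accent', 'age']; // widened in this order; gender never dropped
  let current = { ...filters };

  for (let round = 0; round <= relaxationOrder.length; round++) {
    const candidates = await searchVoices(current);
    if (candidates.length > 0) return candidates.slice(0, 6);
    const dropKey = relaxationOrder[round];
    if (!dropKey) break;
    delete current[dropKey];   // widen: drop the most restrictive remaining filter
  }
  return []; // genuinely nothing; surface "no matches" to the operator
}

// Usage sketch, with filters the LLM extracted from the dog's voice direction:
// findVoiceCandidates({ gender: 'male', age: 'old', accent: 'american',
//                       descriptives: ['gravelly', 'warm'] }, searchVoices);
```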
🎞 Google Flow video workflow
The same three-step Google Flow workflow already used in the Shorts pipeline: prepare assets in Drive, generate the clip in Google Flow, upload the rendered MP4 back to Showspring. State machine in the dog card walks the operator through Step 1 / Step 2 / Step 3 with a status pill so the workflow never feels open-ended.
🔈 Speech-to-Speech voice replacement
Once a Flow video is uploaded, the operator clicks “Replace voice”: the original audio is extracted, sent through ElevenLabs Speech-to-Speech with the dog’s voice ID, and muxed back into the video. Source audio is read from the immutable original on every run, so re-replacing never compounds artifacts across passes — a regression caught and fixed during build-out.
🔗 Approvable share link
When the draft looks right, the operator approves the submission. Approval mints a public share link the submitter can open without logging in — they see their dog’s portrait, hear the voice hello, and watch the six-second video preview. Revoking approval takes the link down immediately. Promoting the submission seeds an episodes row with the concept and makes the character visible in the main character pool.
Long-form podcast in, vertical highlights out
The standard Shorts pipeline generates original animated clips. This surface does the opposite: take an existing long-form podcast episode, find the highlights inside the conversation, and produce 9:16 vertical shorts ready to publish — without leaving Showspring.
Long-form podcast in, 9:16 vertical highlights out. Local ML pipeline does diarization, transcription, audio-visual active-speaker detection, and LLM highlight picking. Inline speaker assignment, configurable caption styling, lands in the standard publish pipeline.
Active-speaker detection that doesn’t lose the speaker on cuts. The first iteration used a largest-face heuristic and dropped the speaker every time the camera switched. The audio-visual model now correlates lip movement with the audio track frame-by-frame; largest-face stays as the fallback.
Render queue silently wedged on long jobs because a redispatch loop kept re-queuing the same job ID. Caught when an episode sat at 0% for two hours. Pipeline now keys idempotency on a content hash, not the request ID.
The Shorts publish pipeline absorbed the new clip type without re-implementing scheduling, calendar, or analytics. The v1.1 prediction (shared publish surface saves weeks) keeps paying out.
🎧 YouTube URL or rendered episode
Paste a YouTube URL, or pick from a list of episodes Showspring has already rendered. In the second case the file is fetched directly inside the operator’s tenancy rather than re-downloaded from YouTube — faster, cheaper, and sidesteps rate limits on long episodes.
🤖 Local ML pipeline
A local ML pipeline does the heavy lifting: speaker diarization, word-level transcription, audio-visual active-speaker detection, and an LLM highlight picker that selects the most quotable, self-contained segments. Each clip is cropped to 9:16 with the camera following the active speaker through the segment. The pipeline is documented in detail on the PodSplit architecture page.
👤 Inline speaker assignment
When the analysis completes, Showspring presents a speaker-assignment panel inside the page itself — no jump to a separate tool. Each detected voice gets a thumbnail pulled from a high-confidence speaking moment, and the operator maps each one to a character from the episode’s cast or types a custom name for guests.
🎨 Caption styling
Caption appearance is configurable before render: font (Inter, Roboto, Bebas Neue, Impact, and friends), size, dialogue color, speaker-label color, outline. Defaults to the large bold style that performs on TikTok and Reels. Click “Save names and render” and the caption settings travel with the render request.
💾 Resumable render
Returning to the page with the same episode pre-loaded resumes the speaker panel or render progress view exactly where it was, so a half-completed render survives a browser tab close or a network blip without losing state.
📥 Standard publish pipeline
Finished clips land in the same Shorts batch system used for original animated shorts — same scheduling calendar, same cross-platform publish flow, same analytics surface. The shorts pipeline does not care whether a clip was scripted from scratch or carved out of a podcast.
Three models, five dimensions, scored over time
v2.1’s third surface: a live three-model strategy dashboard that produces a fresh report on demand for any active project. Where the v1.5 retention loop tells the operator how the LAST episode performed, the Strategy Advisor scores the channel holistically and tracks the score over time.
Three model legs returning calibrated 0–10 scores plus prose across five dimensions (growth, perception, traction, engagement, monetization). Radar chart for divergence, sparklines per dimension, per-dimension accordion grouping all three takes. Persisted as a historical trend.
Adaptive local model selection. The strategy call wants the largest reasoning model available, but the GPU is shared with active pipelines. Router checks which models are already resident in VRAM and picks the highest-priority warm model so the leg fires without a cold-load delay.
Reports landed inconsistent because two of the three legs would frequently time out on the first run after a cold start. Bumped Gemini timeout to 240s and added a streaming progress indicator so the operator can see the third leg arriving rather than assuming it failed.
Operator looks at the radar before greenlighting an episode batch — if engagement scores have been sliding three reports running, the next batch gets re-pitched rather than published. Replaces the static AI scoreboard the article used to host.
📊 Three perspectives, one brief
Three models analyze the same brief covering growth trajectory, online perception, traction, audience engagement, and monetization. Each returns prose commentary alongside a calibrated 0–10 score per dimension. Reports are persisted, so every new run extends a historical trend rather than replacing the last.
🎯 Radar chart for divergence
The top tier of the dashboard plots all five dimensions for each model as overlapping polygons on a single canvas. Disagreement between models on a given dimension — one bullish, two cautious — surfaces immediately, before the operator has to read a single paragraph.
📈 Sparklines per dimension
The middle tier renders a compact sparkline per dimension, showing the average score across the most recent reports with a delta arrow on the latest run. A monetization score that has dropped four runs in a row is visible in three seconds.
📖 Per-dimension accordion
The bottom tier is an accordion grouped by dimension rather than by model. Expand a dimension and all three models’ full commentary appears side-by-side, each with its score and a one-line headline. Useful when the radar has flagged a divergence and the operator wants to read why.
⚡ Adaptive local model selection
The local reasoning leg checks which models are already resident in GPU memory and selects the highest-priority warm model automatically. The leg fires without a cold-load delay even when the GPU is mid-task on another pipeline. The column header in the report names the model that actually ran — no guessing about which engine produced which take.
🌝 Live, on-demand
Trigger a strategy report at any time on any active project. A report typically completes in under two minutes for the cloud legs; the local leg returns first and the dashboard fills in as each leg lands. The same data feeds the article’s static AI scoreboard, which captures a representative run for visitors who don’t have access to the live dashboard.
Two new segments, three-tier bible, and a canon that accumulates
v1.3 introduced segment templates with the Fire Hydrant Gazette — a desk-show format that lives across episodes. v2.2 extends the segment model to narrative arcs: Adventures (a Back-to-the-Future-style time/world hop in a recurring vehicle) and Pack Tour (Rusty and Oreo at famous landmarks, with parallel wonder/food tracks). Each is a single story per episode, but they share canon — the Bonewagon, the WHEN dial, the mispronunciation gag — that the LLM honors across every episode of that segment.
Three-tier bible: project show bible plus segment-specific bible plus a new segment_canon table with locked baseline rows (the Bonewagon, the parallel-tracks rule) and accumulated rows (eras / locations already visited). Dual-pitch flow extended to narrative segments — four destination + tonal-angle themes from Gemini Pro, eight pitches from Gemini and Claude in parallel, scored by Claude and Grok on five narrative dimensions. Operator picks a winning premise; a separate spine-build step generates the beat-by-beat content from that exact pitch.
Themes were prescriptive enough that two writers given the same theme converged on the same pitch — if the theme already named the genre, cast roles, and central event, there was nowhere to diverge. Rewrote the themes prompt so angle is tone/shape only, no cast roles or specific events, and gave each pitch writer an explicit “the other writer is drafting from the same theme — find an interpretation a careful reader of the theme would NOT predict” instruction. Operator now sees real choices instead of duplicates.
Pitch validators silently rejected valid Gemini Pro responses because the schema was strict on field names. Gemini sometimes returned summary or premise instead of concept under longer prompts. Cards stayed on “Writing…” while judges landed verdicts on a half-empty grid. Tolerant validator now accepts any concept-shaped field and normalizes; both attempts failing logs the actual JSON keys present so the next regression is one-pm2-grep away.
Operator-tagged characters are now hard cast across pitch and script — pitch validator runs a content-coverage pass (case-insensitive name search across concept, characterRoles, arc beats) and surfaces gaps; script writer post-pass enumerates clip character_ids and warns when a tagged character lands with zero speaking lines. Tagged cast no longer drops out between step 1 and the rendered episode.
🚚 Adventures — the Bonewagon
A 1987 Subaru wagon parked behind the Pack’s house with a brass WHEN dial on the dashboard. Glows under the hood when armed, two-bark-then-silence engagement signal, one round trip per episode. The vehicle is canon — same Bonewagon, same dial, same rules in every Adventures episode. Departure to a different era / world / dimension; return with a small consequence that ripples back home. Operator pre-selects a ride-along guest or runs as a two-hander.
🌏 Pack Tour — parallel tracks
Rusty and Oreo at a real-world landmark. Wonder Track A (Rusty, reverent, slow) and Food Track B (Oreo, chaotic, fast) run in parallel through the middle of the episode with at least three cross-cuts, then converge in the climax. Both tracks must pay off — Rusty’s history fact AND Oreo’s vendor friend matter in the resolution. Local NPCs are real, accents are not jokes, the place is treated with respect.
📚 Three-tier bible
Tier 1 is the project show bible. Tier 2 is segment-specific (Gazette news-desk rules vs Adventures narrative rules). Tier 3 is the new segment_canon table — persistent facts across every episode of a segment. Locked baseline rows (the WHEN dial, the mispronunciation gag) seed from a source file on every boot; accumulated rows (visited eras and landmarks) get written when the operator approves a script. The LLM sees an “ALREADY VISITED — do not repeat” block automatically.
🎪 Pitch grid for narrative
Same dual-pitch infrastructure as the recommend / I-have-an-idea flows, tuned for narrative segments. Themes (4) propose destination + tonal angle. Each writer (Gemini Pro and Claude Opus) drafts a full premise pitch per theme — eight pitches total. Judges score on spine clarity, segment-canon fit, visual potential, freshness vs visited canon, and family-safe. Operator clicks a winning pitch; a separate spine-build step generates beats with the picked premise pinned as the seed.
✅ Hard cast coverage
Operator-tagged characters at step 1 are mandatory cast in pitches and scripts. Pitch prompt requires every tagged character in characters, characterRoles, and at least one of the arc beats. Validator runs a content-coverage pass and surfaces gaps to the UI. Script prompt adds a CAST COVERAGE rule when characters are tagged; post-script check warns if any tagged ID lands with zero speaking clips. Tagged cast survives end-to-end.
🔁 Regenerate works
The segment-topics regenerate endpoint was a no-op for both Gazette and the new narrative segments — deleted rows, flipped status, never re-triggered generation. Refactored the segment topic-gen pipeline into module-scope helpers behind a single dispatcher so create and regenerate route through the same code. Narrative regenerate with a picked pitch re-rolls the spine from the same pitch; without a pick, re-runs the full dual flow.
Local CLI runs four-wide and every operator-session call respects it
The hybrid bridge that routes admin Claude and Gemini calls through the local 5090 CLIs grew up. It went from single-flight (one CLI invocation per provider at a time) to four-wide parallel, and the codebase stopped silently leaking operator-session calls to the cloud APIs when the local route was available.
Bridge concurrency upgrade: asyncio.Lock() → asyncio.Semaphore(4) for both Claude and Gemini CLI lanes. Four parallel CLI invocations per provider. Eleven previously cloud-routed call sites added to the bridge callsite map: narrative pitch / themes / verdicts, podcast cover description, music description, locations fallback, YouTube publish metadata, thumbnail prompts, approve-topics title, shorts-publish chunk fallback, and others. Public idea-share endpoints (anonymous reviewer sessions) intentionally stay cloud-routed.
A four-theme dual-pitch fanout sent four parallel Claude calls into a bridge that serialized them through a single global lock. Each call took ~100s on Claude CLI, so the operator waited ~8 minutes for the pitch grid to fill. Diagnostic was hard because the Gemini side returned in seconds — only Claude cards stayed on “Writing…” long enough to make the architecture limit visible.
Audit found 11 unregistered Claude/Gemini call sites burning Anthropic credits per operator action because their callsite strings weren’t in CALLSITE_MODEL_MAP. The bridge’s decide() returned { route: 'cloud' } for any unmapped callsite even when the bridge was healthy and the operator was admin-eligible. New callsites for narrative segments were cloud-routed by default until they were registered — the calls then choked on a bridge-only model name (gemini-3-pro-preview) on the public API and returned null, which the strict validator silently rejected.
Dual-pitch wallclock dropped from ~9 minutes to ~3 minutes on a 4-theme fanout. Anthropic credit consumption on operator-session work dropped to near-zero outside of bridge-down periods — the cloud path is now strictly a fallback, not a silent default. Five apps share the bridge (Showspring, PodSplit, WillWin, Create Studio, Family Chat), so the parallelism upgrade and the routing audit both compound across the stack.
⚡ Four-wide CLI parallelism
Each provider lane (Claude, Gemini) now allows up to four concurrent CLI subprocess calls. Promise.all-style fanouts in the dual-pitch flow now run truly in parallel instead of serializing through the bridge. CLI subprocess overhead is tiny on the 5090; the bottleneck moved from the bridge to the CLIs themselves, which is where it should always have been.
🔗 Callsite-mapped routing
Every operator-session Claude/Gemini call now passes a bridge: { req, callsite } option, and every callsite is registered in CALLSITE_MODEL_MAP with the right local model. New callsites added to the map flow through the bridge automatically. Public/anonymous endpoints intentionally stay cloud-routed for safety — bridge tokens shouldn’t answer to unauthenticated requesters.
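Roughly the shape of the decision; the callsite names and model mappings here are illustrative:

```js
// Sketch of the routing decision. Map contents are illustrative; the point is
// that an unmapped callsite no longer slips to the cloud without a trace.
const CALLSITE_MODEL_MAP = {
  'narrative.pitch':          'qwen3:32b',
  'narrative.themes':         'gemma3:27b',
  'publish.youtube-metadata': 'qwen3:8b',
};

function decide({ callsite, bridgeHealthy, userEligible }) {
  const localModel = CALLSITE_MODEL_MAP[callsite];
  if (!localModel) {
    console.warn(`[bridge] unmapped callsite: ${callsite}`); // diagnostic, not silent
    return { route: 'cloud' };
  }
  if (!bridgeHealthy || !userEligible) {
    return { route: 'cloud', failedGate: bridgeHealthy ? 'userEligible' : 'healthOk' };
  }
  return { route: 'local', model: localModel, bridgeOwned: true };
}
```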
🛡 No silent cloud fallback
When the bridge owns a callsite and the local CLI fails, helpers return null instead of cloud-falling-back. Silent cloud fallback was burning Anthropic credits at ~$0.26/call on transient CLI errors. The flag bridgeOwned: true tells the helper to surface the failure to the caller rather than retry on cloud.
📋 Diagnostic-first failure modes
Pitch validators that reject malformed JSON now log the actual keys present in the parsed payload, not just “parse failed”. Bridge gate failures log which gate failed (healthOk vs userEligible). Cast coverage gaps log the missing character names. Every silent rejection class that bit during this version became a one-grep-from-pm2-logs diagnosis instead of a debugging session.
📊 Live queue depth in /health
Bridge /health endpoint reports current queue depth per provider. The four-wide semaphore makes “is the operator’s episode actually moving” visible at a glance — claude: 0 in queue means the CLIs are idle; claude: 4 means the operator’s fanout is fully in flight; claude: >4 means a second operator action is queued behind the first. Fed straight into incident-response when something looks slow.
💰 Operator-session vs public
Bridge eligibility check is per-request. Operator sessions (admin / owner) get bridge-routed; public reviewer sessions on idea-share endpoints get cloud-routed regardless of the callsite map. Expensive public runs stay on the metered cloud path by design — but the operator’s own actions stay on the local CLIs where they belong.
Everything a studio needs, built in
Beyond the 10-step pipeline, the Creator includes a full suite of persistent production tools that carry knowledge across episodes.
Character Manager
Define characters with role, personality, visual description, and speech style. Upload reference images for consistent AI generation. Assign ElevenLabs voice profiles with preview playback. Characters persist across all episodes and inform every AI generation.
Episode Library
A complete production dashboard showing every episode across all stages of development — from draft ideas to published videos. Filter by status, search by title, and jump directly into any production step.
Media Library
Centralized asset management for every image and video across all episodes. Browse by model, date, or episode. Drag-and-drop upload, crop, rotate, and adjust — all with full undo support.
How it all connects
A hybrid architecture where cloud APIs deliver the highest-quality generation (Veo, Gemini, ElevenLabs) while a local GPU provides open-source alternatives and hardware encoding. The VPS orchestrates everything, and each AI agent is purpose-built for its stage of the pipeline.
Technology deep dive
Every component was built from scratch — no video editing frameworks, no SaaS dependencies, no drag-and-drop website builders. Pure Node.js, vanilla JavaScript, and ffmpeg.
Hybrid Cloud + Local Architecture
Cloud APIs (Google Veo, Gemini) deliver the highest-quality video and image generation, while a local RTX 5090 (32 GB VRAM) provides open-source alternatives and handles NVENC encoding. The system is designed to scale with new cloud models as they become available.
ffmpeg Compositing Engine
Each clip is assembled with complex filter graphs: per-stream volume with keyframe expressions, 4-input amix with explicit weights, sample rate normalization, pad/trim alignment, and cfr frame timing — all generated dynamically per clip.
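A sketch of what one such graph builder can look like — the track names, fade expression, weights, and argument layout are illustrative, not the production builder:

```js
function buildClipFilterGraph({ duration, fps = 30, weights = [1, 0.5, 0.35, 0.25] }) {
  // Inputs: [0] video, [1..4] dialogue / music / sfx / ambience audio stems.
  const tracks = ['dialogue', 'music', 'sfx', 'ambience'];
  const audio = tracks.map((name, i) =>
    // Per-stream volume keyframe expression, resample, then pad/trim to the clip length.
    `[${i + 1}:a]volume='min(t/0.5,1)':eval=frame,` +
    `aresample=48000,apad,atrim=end=${duration}[${name}]`
  );
  const mix =
    `[${tracks.join('][')}]` +
    `amix=inputs=4:duration=first:weights='${weights.join(' ')}'[aout]`;
  const video = `[0:v]fps=${fps},format=yuv420p[vout]`;   // constant frame rate
  return [...audio, mix, video].join(';');
}

// Handed to ffmpeg roughly as:
//   ffmpeg -i clip.mp4 -i dialogue.wav -i music.wav -i sfx.wav -i ambience.wav \
//     -filter_complex "<graph>" -map "[vout]" -map "[aout]" -c:v h264_nvenc out.mp4
```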
Multi-LLM Orchestration
A pool of local models (Qwen3:32B, Qwen3:8B, Gemma3:27B, Gemma3:12B) runs on the RTX 5090 via a llama-swap router that selects whichever model is already resident in VRAM — large-model reasoning does not stall on a cold-load when smaller models are warm. Cloud APIs (Gemini, Grok, Claude) provide additional perspectives. A judge model synthesizes competing outputs into a final creative decision.
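The selection logic reduces to something like the sketch below; the /status endpoint and its response fields are assumptions for illustration, not llama-swap's actual API:

```js
const CANDIDATES = ['qwen3:32b', 'gemma3:27b', 'gemma3:12b', 'qwen3:8b'];

async function pickModel(routerUrl) {
  const res = await fetch(`${routerUrl}/status`);
  const { loaded = [] } = await res.json();   // models currently resident in VRAM
  // Prefer the most capable candidate that is already warm; cold-load only as a last resort.
  return CANDIDATES.find((m) => loaded.includes(m)) ?? CANDIDATES[0];
}
```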
ComfyUI Workflows
Local open-source video models run via ComfyUI on the RTX 5090: WAN 2.2 for image-to-video and LTX 2.3 for longer clips. These complement cloud models like Veo, giving creators the choice between speed, cost, and quality depending on the scene.
Intelligent Cache System
A 5 GB LRU cache on the VPS holds active assets. Google Drive provides permanent storage. Cache eviction never deletes files that haven't been backed up. A scheduled cleanup job runs every 6 hours, backing up unbacked assets before evicting.
Show Bible System
A living knowledge base that grows with every episode. Tracks character arcs, running gags, location details, dialogue patterns, and YouTube analytics. Automatically condensed for local models via Qwen 8B to fit within smaller context windows.
Multi-Platform Social Engine
OAuth 2.0 flows for YouTube, Facebook, Instagram, TikTok, and X/Twitter. Each platform has dedicated publish functions handling format requirements, API quirks, and token refresh. An auto-publisher polls every 60 seconds to fire scheduled posts.
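The auto-publisher loop is conceptually this simple — the table name, query helper, and publish functions below are placeholders:

```js
const PUBLISHERS = {
  youtube: publishYouTube, facebook: publishFacebook, instagram: publishInstagram,
  tiktok: publishTikTok, x: publishX,
};

setInterval(async () => {
  const due = await db.all(
    `SELECT * FROM scheduled_posts WHERE status = 'pending' AND publish_at <= ?`,
    [Date.now()]
  );
  for (const post of due) {
    try {
      await PUBLISHERS[post.platform](post);   // per-platform format handling + token refresh
      await db.run(`UPDATE scheduled_posts SET status = 'published' WHERE id = ?`, [post.id]);
    } catch (err) {
      await db.run(`UPDATE scheduled_posts SET status = 'failed' WHERE id = ?`, [post.id]);
    }
  }
}, 60_000);
```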
Cross-Platform Analytics
Aggregates views, likes, comments, and shares from all five platforms into a unified dashboard. Daily snapshots build 30-day trend charts. YouTube research reports analyze channel performance, competitor positioning, and optimal posting schedules.
Resend Email Notifications
Branded HTML email alerts fire after every auto-publish: platform badge, clip title, direct link, and dashboard CTA. Missed-schedule alerts notify when a scheduled post fails or is overdue.
The AI model stack, one pipeline
No single model can do everything. The Creator orchestrates specialized models for each phase of production — local where possible, cloud where necessary. Per-step model selection lets each creative stage use a different LLM. v1.2 adds Grok 4 with live web search; v2.1 adds an audio-visual active-speaker detection model and Speech-to-Speech voice replacement.
llama.cpp (Qwen3:32B / Qwen3:8B / Gemma3:27B / Gemma3:12B)
Local LLMs for brainstorming, script writing, location extraction, and channel-strategy reasoning. Zero API costs, unlimited iterations. Served on the RTX 5090 by llama.cpp behind a llama-swap router that picks whichever model is already resident in VRAM — large-model reasoning never blocks on a cold-load cycle when smaller models are warm from another task.
Gemini + Veo (Google Cloud)
The primary production engine for video (Veo), images, and script generation. Cloud models deliver the highest quality and are the default choice for published episodes.
Grok 4 (xAI)
The creative director’s judge and the Research & Debate engine. Evaluates pitches, conducts live web research, and generates fact-heavy scripts with real-time data.
Claude (Anthropic)
Alternative script writer for episodes that need a different narrative style. Strong at long-form structure and character consistency.
ElevenLabs (TTS + Speech-to-Speech)
Voice synthesis for 7+ recurring characters, each with a unique voice profile. v2.1 also wires ElevenLabs Speech-to-Speech for voice replacement on Google Flow video output — the timing and inflection of the source voice are preserved while the speaker identity is swapped to the character’s ElevenLabs voice. Also generates sound effects and music tracks from text descriptions.
WAN 2.2 / LTX 2.3 (Local GPU)
Open-source video models running on the RTX 5090 via ComfyUI. A cost-effective local alternative for drafts, iterations, and experimentation before committing to cloud renders.
Z-Image-Turbo / SDXL
Fast image generation for scene creation. Sub-2-second generation via ComfyUI with 4-step sampling. Produces photorealistic starting frames.
Gemini 2.5 Flash
Fast, cost-effective model for shorts idea generation, metadata, and schedule recommendations. Serves as the default fallback when local models are unavailable.
Faster-Whisper (Transcription)
Word-level timestamp transcription for accurate chapter generation and subtitle creation. Runs locally on the RTX 5090 for zero-cost transcription.
LR-ASD (Active Speaker Detection)
A lightweight audio-visual active-speaker model. Runs on the local GPU during the podcast-to-shorts pipeline to identify which face in each frame belongs to the currently-speaking voice. The result drives the per-frame crop window in the 9:16 vertical export, so the camera follows the active speaker rather than the largest face on screen. A largest-face fallback remains when the model is unavailable.
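The crop logic amounts to the following sketch — field names, the smoothing constant, and the fallback shape are illustrative:

```js
function cropWindowForFrame(frame, { srcW, srcH, prevCenter }) {
  const cropW = Math.round(srcH * 9 / 16);            // 9:16 window at full source height
  // Follow the detected active speaker; fall back to the largest face when
  // the model is unavailable or finds nothing.
  const target = frame.activeSpeaker ?? frame.largestFace;
  const cx = target ? target.x + target.w / 2 : srcW / 2;
  // Exponential smoothing so the crop glides between frames instead of jumping.
  const center = prevCenter == null ? cx : prevCenter + 0.2 * (cx - prevCenter);
  const x = Math.min(Math.max(Math.round(center - cropW / 2), 0), srcW - cropW);
  return { x, y: 0, w: cropW, h: srcH, center };
}
```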
YouTube + Social APIs
Publishes to YouTube, Facebook, Instagram, TikTok, and X via OAuth. Tracks cross-platform analytics and feeds performance data back into the show bible.
Deep technical breakdown
Every design decision in Showspring had a real constraint behind it. This section walks through the shape of the implementation at a high level: module topology, data model, rate-limit strategy, the dynamic render engine, the hermetic test pipeline, the hybrid cloud + local GPU routing, and the non-destructive cache policy. The goal is to show why the system looks the way it does, not to hand out a runbook.
1. Modular Composition Root
The main entry point is a thin bootstrap: environment validation, middleware stack, layered rate limiters, session setup, and router registration. It holds essentially no business logic. All production behavior lives in 22 domain routers and 11 shared services underneath. Each router owns its validation, data access, and error responses; services are pure functional units callable from any router.
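In outline, the entry point reads like this — assuming an Express-style app; router paths and helper names are illustrative:

```js
const express = require('express');
const app = express();

validateEnv();                                    // fail fast on missing configuration
app.use(express.json());
app.use(sessionMiddleware());
app.use(EXPENSIVE_PATHS, expensiveLimiter);       // cost-class limiters, see next section
app.use('/api', globalLimiter);

// 22 domain routers; each owns its validation, data access, and error responses.
app.use('/api/episodes', require('./routes/episodes'));
app.use('/api/render',   require('./routes/render'));
app.use('/api/publish',  require('./routes/publish'));
// ...19 more

app.listen(process.env.PORT);
```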
2. Layered Rate Limiting
Rate limiting is layered by cost class. A default limiter covers general API traffic. A much tighter limiter applies to cost-sensitive endpoints (LLM calls, image generation, and render dispatch) matched by both literal path and regex patterns. The tightest cap applies to authentication and OAuth callback flows to resist brute-force and credential-stuffing attempts. Static-asset bulk-fetch paths are excluded from API rate limiting. The expensive limiter runs before the global limiter so the global counter still sees every request.
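Structurally it looks like the sketch below; every number and path here is a placeholder, since the real thresholds and endpoint lists are internal configuration (as noted next):

```js
const rateLimit = require('express-rate-limit');

const authLimiter      = rateLimit({ windowMs: 15 * 60_000, max: 10 });  // placeholder values
const expensiveLimiter = rateLimit({ windowMs: 60_000, max: 20 });       // placeholder values
const globalLimiter    = rateLimit({ windowMs: 60_000, max: 300 });      // placeholder values

app.use(['/auth/login', '/auth/callback'], authLimiter);   // tightest: auth + OAuth flows
// Expensive limiter mounted before the global one, so cost-sensitive requests
// are counted by both limiters.
app.use(['/api/llm', /^\/api\/(images|render)\//], expensiveLimiter);
app.use('/api', globalLimiter);   // static-asset paths live outside /api and skip this
```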
Specific thresholds and endpoint lists are intentionally not published — tuning parameters that affect throttling behavior are treated as internal configuration.
3. Normalized Data Model
The persistence layer is a normalized relational schema grouped into five concerns. The database engine supports concurrent reads during long-running writes (render jobs, image generation), and foreign-key relationships keep referential integrity explicit. Every table is exercised by the integration test suite against a fresh ephemeral database per run.
4. Dynamic Render Engine
Render is not a wrapper around a preset. Every episode, short, and podcast clip is composited by building a filter graph at runtime from the clip's tracks, volume automation, and timing metadata. Four audio streams are mixed per clip with explicit per-stream weights and fade/duck automation, then paired with the video stream and handed to a hardware-accelerated encoder.
(Render pipeline diagram: 4× audio in → per-stream volume/fade → amix → sample-rate normalize → cfr frame rate → H.264 out.)
Each volume envelope, fade, duck, and weight is generated per clip from the clip's metadata in the database, not hand-authored. The same engine handles episodes, shorts, and podcast clips via a shared filter builder, which is why a 5-second short and a 12-minute episode both render through the same code path.
5. Hermetic Integration Test Pipeline
The test suite is fully hermetic: it boots the real server binary against an ephemeral database, stubs every external AI and platform API, and exercises each router end-to-end over real HTTP. 113 integration cases run on every CI build. Because the runner uses the production code paths, regressions in rate limiting, middleware, and business logic all surface before merge.
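One case looks roughly like this — the boot helper, stub names, and the specific assertion are illustrative, not lifted from the suite:

```js
const { test } = require('node:test');
const assert = require('node:assert');

test('render dispatch rejects an unauthenticated request', async () => {
  const server = await bootServer({
    databaseUrl: makeEphemeralDb(),   // fresh ephemeral database per run
    stubs: { gemini: fakeGemini, elevenlabs: fakeTts, youtube: fakeYouTube },
  });
  try {
    // Real HTTP against the real server binary, external APIs stubbed.
    const res = await fetch(`${server.url}/api/render/dispatch`, { method: 'POST' });
    assert.strictEqual(res.status, 401);
  } finally {
    await server.close();
  }
});
```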
6. Hybrid Cloud + Local GPU Topology
The orchestrator delegates GPU-bound work (video generation, image synthesis, encoding, transcription, adaptive streaming prep) to a local GPU host over a private tunnel, while cost-sensitive and highest-quality work flows to cloud AI APIs. A health poller keeps cloud fallbacks warm, so any local outage degrades gracefully rather than blocking the pipeline.
Cloud
· Frontier LLM inference
· Production image generation
· Voice synthesis
· Platform publishing APIs
Local GPU
· Local LLM inference for drafts
· Fast iterative image generation
· Hardware-accelerated encoding
· Transcription & HLS segmentation
Routing is per-job, not per-user. The same episode can fan out cloud video + local draft image + cloud voice + local encoding based on cost, quality, and availability.
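A decision function in that spirit might look like this — the job kinds and rules are illustrative, not the production router:

```js
function routeJob(job, { localGpuHealthy }) {
  if (!localGpuHealthy) return 'cloud';   // degrade gracefully on a local outage
  switch (job.kind) {
    case 'video':
    case 'image':      return job.quality === 'draft' ? 'local' : 'cloud'; // cloud for finals
    case 'encode':
    case 'transcribe': return 'local';    // NVENC / Faster-Whisper on the local GPU
    case 'voice':
    case 'publish':    return 'cloud';    // ElevenLabs / platform APIs
    default:           return 'cloud';
  }
}
```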
7. Non-Destructive Cache Policy
Cache eviction normally deletes whatever is coldest. That is unacceptable here — a render that has not yet been archived cannot be recreated without re-running the whole GPU pipeline. Eviction therefore follows a strict backup-first order: archive to permanent storage, verify the copy, then reclaim local space. A crash mid-cycle never loses data.
If the remote archive is unreachable during a cleanup cycle, the cache simply grows a little past its cap until the next cycle — which is far cheaper than losing a not-yet-backed-up render.
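The eviction order reduces to a sketch like this — the storage client and helper names are illustrative:

```js
async function cleanupCycle(cache, { capBytes }) {
  for (const asset of cache.coldestFirst()) {
    if (cache.sizeBytes() <= capBytes) break;
    if (!asset.backedUp) {
      try {
        const remoteId = await archiveToDrive(asset);       // 1. copy to permanent storage
        await verifyRemoteCopy(remoteId, asset.checksum);   // 2. verify the copy
        await markBackedUp(asset.id, remoteId);
      } catch {
        continue; // archive unreachable: skip eviction, let the cache run past its cap
      }
    }
    await cache.evict(asset.id);                            // 3. only then reclaim local space
  }
}
```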
Why This Matters
None of this is required to “make an AI video.” It is required to run one in production, every day, against rate-limited third-party APIs, on a shared VPS behind a reverse proxy, with a cache that cannot afford to lose files, with a test suite that has to be deterministic because it runs on every change. These are the details that separate a weekend prototype from a production studio shipping AI-generated content on a real schedule.
See it in action
Watch the episodes produced entirely by Showspring on YouTube.
Visit The Doodle Cast ↗