video-assembly
Assemble final video from audio, images, and subtitles
When & Why to Use This Skill
The Video Assembly skill builds Full HD (1920x1080) videos from mastered audio, scene images, and VTT subtitles. It uses FFmpeg and Python scripts to map images to script sections, add cross-fade transitions, and keep subtitles in sync with the audio, producing YouTube-ready MP4 files with little manual work.
Use Cases
- Automated YouTube Channel Management: Efficiently transform audio-heavy content, such as guided meditations or podcasts, into visually engaging videos with synchronized captions.
- AI-Driven Content Creation: Combine AI-generated images (from Stable Diffusion or Midjourney) with synthesized voiceovers to produce high-quality video content at scale.
- Educational and Instructional Media: Rapidly assemble lecture slides and audio recordings into accessible video formats with embedded subtitles for better learner engagement.
- Social Media Asset Production: Generate polished video clips from static assets and VTT files to maintain a consistent and professional visual presence across digital platforms.
| name | Video Assembly |
|---|---|
| tier | 3 |
| load_policy | task-specific |
| description | Assemble final video from audio, images, and subtitles |
| version | 1.0.0 |
| parent_skill | production-operations |
Video Assembly Skill
The Visual Wrapper for the Audio Journey
This skill handles assembling the final video from mastered audio, scene images, and VTT subtitles.
Purpose
Create YouTube-ready video files that complement the hypnotic audio experience.
Video Standards
| Parameter | Standard |
|---|---|
| Resolution | 1920x1080 (Full HD) |
| Aspect Ratio | 16:9 |
| Codec | H.264 |
| Frame Rate | 24 fps (cinematic) |
| Audio Codec | AAC 320kbps |
| Container | MP4 |
Input Requirements
| Input | File | Required |
|---|---|---|
| Master Audio | {session}_MASTER.mp3 | Yes |
| Scene Images | images/uploaded/*.png | Yes |
| Subtitles | output/subtitles.vtt | Yes |
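A pre-flight check for the three required inputs can save a failed render. The sketch below is a minimal, hypothetical helper (not part of the skill's scripts) and assumes the master audio lives under output/, as in the manual FFmpeg command; adjust the paths if your session layout differs.

```python
from pathlib import Path


def missing_inputs(session_dir: str, session: str) -> list[str]:
    """Return the required inputs that are missing from a session folder.

    Assumed layout (hypothetical, adjust to your repo):
      output/{session}_MASTER.mp3, images/uploaded/*.png, output/subtitles.vtt
    """
    root = Path(session_dir)
    missing = []

    master = root / "output" / f"{session}_MASTER.mp3"
    if not master.is_file():
        missing.append(str(master))

    # At least one PNG must be present; glob on a missing folder yields nothing.
    uploaded = root / "images" / "uploaded"
    if not any(uploaded.glob("*.png")):
        missing.append(str(uploaded / "*.png"))

    vtt = root / "output" / "subtitles.vtt"
    if not vtt.is_file():
        missing.append(str(vtt))

    return missing
```

An empty list means all three inputs were found; otherwise the returned paths show what still needs to be produced.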
Scene Image Generation
Primary Method: Stable Diffusion (Default)
```bash
python3 scripts/core/generate_scene_images.py sessions/{session}/
```
Alternative: Midjourney Prompts
```bash
python3 scripts/core/generate_scene_images.py sessions/{session}/ --midjourney-only
```
Alternative: Stock Images
```bash
python3 scripts/core/generate_scene_images.py sessions/{session}/ --method stock
```
Image Specifications
| Property | Requirement |
|---|---|
| Resolution | 1920x1080 minimum |
| Format | PNG or JPEG |
| Aspect Ratio | 16:9 |
| Naming | scene_01.png, scene_02.png, etc. |
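The image specifications can be checked without third-party libraries for PNG files, because a PNG stores its width and height at fixed offsets in the IHDR chunk. This is a minimal sketch that only handles PNG (JPEG would need a different parser or a library such as Pillow); the function names are hypothetical.

```python
import re
import struct
from pathlib import Path

SCENE_NAME = re.compile(r"^scene_(\d{2})\.png$")
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(path: Path) -> tuple[int, int]:
    """Read (width, height) from a PNG's IHDR chunk using only the stdlib."""
    with open(path, "rb") as fh:
        header = fh.read(24)
    # Layout: 8-byte signature, 4-byte chunk length, b"IHDR", width, height.
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError(f"{path} is not a PNG file")
    width, height = struct.unpack(">II", header[16:24])
    return width, height


def check_scene_image(path: Path) -> list[str]:
    """Return problems found against the Image Specifications table."""
    problems = []
    if not SCENE_NAME.match(path.name):
        problems.append("name should look like scene_01.png")
    width, height = png_size(path)
    if width < 1920 or height < 1080:
        problems.append(f"{width}x{height} is below 1920x1080")
    if width * 9 != height * 16:  # exact 16:9 test using integers
        problems.append(f"{width}x{height} is not 16:9")
    return problems
```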
VTT Subtitle Generation
```bash
python3 scripts/ai/vtt_generator.py sessions/{session}
```
VTT Format
```text
WEBVTT
Kind: captions
Language: en

1
00:00:00.000 --> 00:00:05.500
Welcome to this healing journey.

2
00:00:06.000 --> 00:00:12.000
Find a comfortable position and
allow your eyes to close.
```
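To inspect or validate a generated file, the cue blocks in this format can be read with a small parser. This is a minimal sketch, not the parser used by vtt_generator.py: it assumes blank-line-separated blocks, skips the header, and ignores styling, regions, and NOTE blocks.

```python
import re
from dataclasses import dataclass

# Start and end timestamps; any cue settings after the end time are ignored.
TIMING = re.compile(r"^(\S+)\s+-->\s+(\S+)")


@dataclass
class Cue:
    start: float       # seconds
    end: float         # seconds
    lines: list[str]   # caption text, one entry per displayed line


def to_seconds(stamp: str) -> float:
    """Convert 'HH:MM:SS.mmm' or 'MM:SS.mmm' to seconds."""
    parts = stamp.split(":")
    hours = int(parts[-3]) if len(parts) == 3 else 0
    return hours * 3600 + int(parts[-2]) * 60 + float(parts[-1])


def parse_vtt(text: str) -> list[Cue]:
    """Collect cues from blank-line-separated blocks that contain '-->'."""
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = TIMING.match(line)
            if match:
                # Everything after the timing line is the caption text.
                cues.append(Cue(to_seconds(match.group(1)),
                                to_seconds(match.group(2)),
                                lines[i + 1:]))
                break
    return cues
```

On the sample above this yields two cues, the first spanning 0.0 to 5.5 seconds; the optional cue numbers ("1", "2") are skipped because they precede the timing line.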
Subtitle Guidelines
| Guideline | Value |
|---|---|
| Max lines per caption | 2 |
| Max characters per line | ~80 |
| Min duration | 1.5 seconds |
| Max duration | 7 seconds |
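The guidelines translate directly into per-caption checks. The sketch below is a hypothetical helper whose defaults are the table's values; since the character limit is given as approximate (~80), treat a hit on it as a warning rather than a hard failure.

```python
def guideline_issues(start: float, end: float, lines: list[str],
                     max_lines: int = 2, max_chars: int = 80,
                     min_dur: float = 1.5, max_dur: float = 7.0) -> list[str]:
    """Return guideline violations for one caption (times in seconds)."""
    issues = []
    if len(lines) > max_lines:
        issues.append(f"{len(lines)} lines (max {max_lines})")
    for number, line in enumerate(lines, start=1):
        if len(line) > max_chars:
            issues.append(f"line {number} has {len(line)} chars (max ~{max_chars})")
    duration = end - start
    if duration < min_dur:
        issues.append(f"{duration:.2f}s is shorter than {min_dur}s")
    if duration > max_dur:
        issues.append(f"{duration:.2f}s is longer than {max_dur}s")
    return issues
```

For example, the first sample cue (0.0 to 5.5 s, one short line) passes, while a cue lasting 7.5 s is reported once for exceeding the maximum duration.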
Video Assembly Command
```bash
python3 scripts/core/assemble_session_video.py sessions/{session}/
```
This automatically:
- Sequences images based on script sections
- Adds cross-fade transitions
- Syncs subtitles to audio
- Outputs to output/video/session_final.mp4
Manual FFmpeg Assembly
For custom control:
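```bash
# Create video from images with audio.
# -framerate 1/10 shows each image for 10 seconds. -shortest stops at the end
# of the shorter stream, so the images must cover the full audio length.
# -pattern_type glob may be unavailable in Windows builds of FFmpeg.
ffmpeg -y \
  -framerate 1/10 \
  -pattern_type glob -i 'images/uploaded/*.png' \
  -i output/{session}_MASTER.mp3 \
  -vf "scale=1920:1080:force_original_aspect_ratio=decrease,pad=1920:1080:(ow-iw)/2:(oh-ih)/2" \
  -c:v libx264 -r 24 -pix_fmt yuv420p \
  -c:a aac -b:a 320k \
  -shortest \
  output/video/session_final.mp4
```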
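When driving the manual FFmpeg assembly from Python, passing the command as an argument list avoids shell quoting issues (the glob pattern reaches FFmpeg unexpanded). This is a minimal sketch, not the code in assemble_session_video.py; the scale/pad filter that letterboxes larger images to 1920x1080 is an assumption added to enforce the Video Standards.

```python
import shlex
import subprocess

# Assumed normalization: fit inside 1920x1080, then pad (letterbox) to exactly 1920x1080.
SCALE_PAD = ("scale=1920:1080:force_original_aspect_ratio=decrease,"
             "pad=1920:1080:(ow-iw)/2:(oh-ih)/2")


def build_ffmpeg_cmd(session: str, seconds_per_image: int = 10) -> list[str]:
    """Return the slideshow command as an argv list (no shell involved)."""
    return [
        "ffmpeg", "-y",
        "-framerate", f"1/{seconds_per_image}",
        # One argv item, so the shell never expands the glob.
        "-pattern_type", "glob", "-i", "images/uploaded/*.png",
        "-i", f"output/{session}_MASTER.mp3",
        "-vf", SCALE_PAD,
        "-c:v", "libx264", "-r", "24", "-pix_fmt", "yuv420p",
        "-c:a", "aac", "-b:a", "320k",
        "-shortest",
        "output/video/session_final.mp4",
    ]


def assemble(session: str, seconds_per_image: int = 10,
             dry_run: bool = True) -> list[str]:
    """Print the command; run FFmpeg only when dry_run is False."""
    cmd = build_ffmpeg_cmd(session, seconds_per_image)
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd
```

With eight scene images and a 30-minute master, `seconds_per_image=225` (1800 / 8) gives every image equal screen time; uneven section lengths need per-image durations instead.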
Image-to-Section Mapping
Images should correspond to script sections:
| Image | Section | Timing |
|---|---|---|
| scene_01.png | Pre-Talk | 0:00-3:00 |
| scene_02.png | Induction | 3:00-8:00 |
| scene_03.png | Deepening | 8:00-12:00 |
| scene_04.png | Journey Start | 12:00-17:00 |
| scene_05.png | Journey Core | 17:00-22:00 |
| scene_06.png | Helm/Deepest | 22:00-25:00 |
| scene_07.png | Integration | 25:00-28:00 |
| scene_08.png | Emergence | 28:00-30:00 |
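The mapping table can drive per-image display times directly. A minimal sketch, not the implementation in assemble_session_video.py, is to write an FFmpeg concat-demuxer list from the section timings; the SECTIONS structure and the list filename are hypothetical.

```python
SECTIONS = [
    ("scene_01.png", "Pre-Talk", "0:00", "3:00"),
    ("scene_02.png", "Induction", "3:00", "8:00"),
    ("scene_03.png", "Deepening", "8:00", "12:00"),
    ("scene_04.png", "Journey Start", "12:00", "17:00"),
    ("scene_05.png", "Journey Core", "17:00", "22:00"),
    ("scene_06.png", "Helm/Deepest", "22:00", "25:00"),
    ("scene_07.png", "Integration", "25:00", "28:00"),
    ("scene_08.png", "Emergence", "28:00", "30:00"),
]


def mmss_to_seconds(mmss: str) -> int:
    """Convert 'M:SS' or 'MM:SS' to whole seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def concat_list(sections: list[tuple[str, str, str, str]]) -> str:
    """Build an ffconcat file giving each image its section's duration."""
    lines = ["ffconcat version 1.0"]
    for image, _section, start, end in sections:
        lines.append(f"file '{image}'")
        lines.append(f"duration {mmss_to_seconds(end) - mmss_to_seconds(start)}")
    # Known image-slideshow quirk: repeat the last file so its duration is honored.
    lines.append(f"file '{sections[-1][0]}'")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    from pathlib import Path
    Path("images/uploaded/scenes.ffconcat").write_text(concat_list(SECTIONS))
```

The list could then replace the glob input, e.g. `ffmpeg -f concat -i images/uploaded/scenes.ffconcat -i output/{session}_MASTER.mp3 -c:v libx264 -r 24 -pix_fmt yuv420p -c:a aac -b:a 320k -shortest output/video/session_final.mp4`. File names inside the list resolve relative to the list's own directory, which is why it is written next to the images.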
Transition Effects
| Transition | Duration | Use For |
|---|---|---|
| Cross-dissolve | 2-3 seconds | Section transitions |
| Fade from black | 3 seconds | Opening |
| Fade to black | 3 seconds | Closing |
Output Files
| File | Location | Purpose |
|---|---|---|
| Final video | output/video/session_final.mp4 | Direct use |
| YouTube copy | output/youtube_package/final_video.mp4 | Upload ready |
Video Overlay Generation
Generate supporting graphics:
```bash
python3 scripts/core/generate_video_images.py sessions/{session}/ --all
```
Creates:
- title_card.png - Video intro screen
- sections/section_*.png - Chapter transitions
- outro.png - End screen
- social_preview.png - Social sharing
Quality Verification
After assembly:
```bash
# Check video properties
ffprobe -v error -show_format -show_streams output/video/session_final.mp4

# Play with VLC to verify sync
vlc output/video/session_final.mp4
```
Quality Checklist
- Resolution is 1920x1080
- Frame rate is 24 fps
- Audio syncs with subtitles
- Transitions are smooth
- No visible artifacts
- Duration matches audio
- File size reasonable (<2GB for 30 min)
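Several checklist items can be verified from ffprobe's JSON output; audio/subtitle sync, transition smoothness, and artifacts still need a manual viewing. This is a minimal sketch with hypothetical function names; it reads "<2GB" as 2 GiB and allows a 1-second duration tolerance, both of which are assumptions.

```python
import json
import subprocess
from fractions import Fraction

MAX_BYTES = 2 * 1024**3  # "<2GB for 30 min", read here as 2 GiB


def probe(path: str) -> dict:
    """Run ffprobe and return its JSON output as a dict."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-print_format", "json",
         "-show_format", "-show_streams", path],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


def checklist_issues(info: dict, audio_seconds: float,
                     tolerance: float = 1.0) -> list[str]:
    """Check resolution, frame rate, duration, and size against the checklist."""
    video = next((s for s in info.get("streams", [])
                  if s.get("codec_type") == "video"), None)
    if video is None:
        return ["no video stream"]
    issues = []
    if (video.get("width"), video.get("height")) != (1920, 1080):
        issues.append(f"resolution is {video.get('width')}x{video.get('height')}")
    # ffprobe reports frame rates as fractions such as "24/1".
    if Fraction(video.get("r_frame_rate", "0/1")) != 24:
        issues.append(f"frame rate is {video.get('r_frame_rate')}")
    duration = float(info["format"]["duration"])  # ffprobe gives a string
    if abs(duration - audio_seconds) > tolerance:
        issues.append(f"duration {duration:.1f}s vs audio {audio_seconds:.1f}s")
    if int(info["format"]["size"]) >= MAX_BYTES:
        issues.append("file is 2 GiB or larger")
    return issues
```

A typical call is `checklist_issues(probe("output/video/session_final.mp4"), float(probe(master_path)["format"]["duration"]))`, where `master_path` points at the session's master MP3.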
Troubleshooting
| Issue | Cause | Solution |
|---|---|---|
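| Video shorter or longer than audio | Image sequence and audio have different lengths | Give the images enough display time to cover the audio; -shortest trims output to the shorter stream |
| Video won't play, or colors look wrong, in some players | Without -pix_fmt, libx264 keeps a 4:4:4 format (yuv444p) for RGB image input | Use -pix_fmt yuv420p |
| Green frames | Image format issue | Convert images to PNG |
| Subtitle timing off | VTT generated for a different audio duration | Regenerate VTT with actual audio duration |
| File too large | Quality setting higher than needed | Raise -crf (libx264 defaults to 23; higher values give smaller files) |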
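Regenerating the VTT against the real audio duration is the fix listed above. If regeneration is not possible and the drift is strictly proportional (every cue off by the same percentage), a stopgap is to rescale all timestamps by actual / assumed duration. The sketch below is a hypothetical helper that does only that; it will not correct drift caused by inserted or removed pauses.

```python
import re

# Matches HH:MM:SS.mmm or MM:SS.mmm timestamps.
STAMP = re.compile(r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})")


def _seconds(match: re.Match) -> float:
    hours = int(match.group(1) or 0)
    return (hours * 3600 + int(match.group(2)) * 60
            + int(match.group(3)) + int(match.group(4)) / 1000)


def _format(seconds: float) -> str:
    """Render seconds as HH:MM:SS.mmm, rounding to the nearest millisecond."""
    ms = round(seconds * 1000)
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def rescale_vtt(text: str, assumed: float, actual: float) -> str:
    """Scale every cue timing by actual / assumed; other lines pass through."""
    factor = actual / assumed
    out = []
    for line in text.splitlines():
        if "-->" in line:  # only timing lines are rewritten
            line = STAMP.sub(lambda m: _format(_seconds(m) * factor), line)
        out.append(line)
    return "\n".join(out) + "\n"
```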
Integration with Pipeline
Before (dependencies):
- Audio mastered ({session}_MASTER.mp3)
- Scene images ready (images/uploaded/)
- VTT subtitles generated (output/subtitles.vtt)
After (next steps):
- YouTube packaging
Related Resources
- Skill: tier3-production/audio-mixing/ (input)
- Skill: tier3-production/youtube-packaging/ (next step)
- Doc: docs/STOCK_IMAGE_SOP.md
- Script: scripts/core/assemble_session_video.py