video-assembly

by randysalars

Assemble final video from audio, images, and subtitles


When & Why to Use This Skill

The Video Assembly skill automates the creation of professional Full HD videos by seamlessly integrating mastered audio, scene images, and VTT subtitles. It streamlines the post-production process using FFmpeg and Python scripts to handle cinematic transitions, precise subtitle synchronization, and section-based image mapping, delivering YouTube-ready MP4 files with minimal manual effort.

Use Cases

  • Automated YouTube Channel Management: Efficiently transform audio-heavy content, such as guided meditations or podcasts, into visually engaging videos with synchronized captions.
  • AI-Driven Content Creation: Combine AI-generated images (from Stable Diffusion or Midjourney) with synthesized voiceovers to produce high-quality video content at scale.
  • Educational and Instructional Media: Rapidly assemble lecture slides and audio recordings into accessible video formats with embedded subtitles for better learner engagement.
  • Social Media Asset Production: Generate polished video clips from static assets and VTT files to maintain a consistent and professional visual presence across digital platforms.
name: Video Assembly
tier: 3
load_policy: task-specific
description: Assemble final video from audio, images, and subtitles
version: 1.0.0
parent_skill: production-operations

Video Assembly Skill

The Visual Wrapper for the Audio Journey

This skill handles assembling the final video from mastered audio, scene images, and VTT subtitles.


Purpose

Create YouTube-ready video files that complement the hypnotic audio experience.


Video Standards

| Parameter | Standard |
| --- | --- |
| Resolution | 1920x1080 (Full HD) |
| Aspect Ratio | 16:9 |
| Codec | H.264 |
| Frame Rate | 24 fps (cinematic) |
| Audio Codec | AAC 320kbps |
| Container | MP4 |

Input Requirements

| Input | File | Required |
| --- | --- | --- |
| Master Audio | {session}_MASTER.mp3 | Yes |
| Scene Images | images/uploaded/*.png | Yes |
| Subtitles | output/subtitles.vtt | Yes |
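Before invoking the assembly script, a minimal pre-flight check can confirm these inputs exist. The sketch below is hypothetical (not one of the shipped scripts) and assumes the master audio sits under output/ inside the session directory, as in the FFmpeg example further down.

# check_inputs.py - hypothetical pre-flight check for the inputs listed above
import sys
from pathlib import Path

def check_inputs(session_dir: str) -> bool:
    session = Path(session_dir)
    required = {
        "master audio": session / "output" / f"{session.name}_MASTER.mp3",  # assumed location
        "subtitles": session / "output" / "subtitles.vtt",
    }
    ok = True
    for label, path in required.items():
        if not path.exists():
            print(f"MISSING {label}: {path}")
            ok = False
    if not sorted((session / "images" / "uploaded").glob("*.png")):
        print("MISSING scene images in images/uploaded/")
        ok = False
    return ok

if __name__ == "__main__":
    sys.exit(0 if check_inputs(sys.argv[1]) else 1)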

Scene Image Generation

Primary Method: Stable Diffusion (Default)

python3 scripts/core/generate_scene_images.py sessions/{session}/

Alternative: Midjourney Prompts

python3 scripts/core/generate_scene_images.py sessions/{session}/ --midjourney-only

Alternative: Stock Images

python3 scripts/core/generate_scene_images.py sessions/{session}/ --method stock

Image Specifications

| Property | Requirement |
| --- | --- |
| Resolution | 1920x1080 minimum |
| Format | PNG or JPEG |
| Aspect Ratio | 16:9 |
| Naming | scene_01.png, scene_02.png, etc. |
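These specifications can be checked mechanically before assembly. The sketch below uses Pillow, which is not listed as a project dependency, so treat the whole helper as an assumption rather than part of the toolkit.

# validate_images.py - hypothetical helper; assumes Pillow is installed
import re
from pathlib import Path
from PIL import Image

NAME_RE = re.compile(r"scene_\d{2}\.(png|jpe?g)$")

def validate_images(image_dir: str) -> list[str]:
    problems = []
    for path in sorted(Path(image_dir).iterdir()):
        if not NAME_RE.match(path.name):
            problems.append(f"{path.name}: does not follow scene_NN.png naming")
            continue
        with Image.open(path) as img:
            w, h = img.size
        if w < 1920 or h < 1080:
            problems.append(f"{path.name}: {w}x{h} is below the 1920x1080 minimum")
        if abs(w / h - 16 / 9) > 0.01:
            problems.append(f"{path.name}: {w}x{h} is not 16:9")
    return problems

Run it against images/uploaded/ and resolve anything it reports before assembling.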

VTT Subtitle Generation

python3 scripts/ai/vtt_generator.py sessions/{session}

VTT Format

WEBVTT
Kind: captions
Language: en

1
00:00:00.000 --> 00:00:05.500
Welcome to this healing journey.

2
00:00:06.000 --> 00:00:12.000
Find a comfortable position and
allow your eyes to close.

Subtitle Guidelines

| Guideline | Value |
| --- | --- |
| Max lines per caption | 2 |
| Max characters per line | ~80 |
| Min duration | 1.5 seconds |
| Max duration | 7 seconds |
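These limits can be enforced with a quick pass over the generated file. The sketch below parses cue timings with a regex rather than a dedicated VTT library, so it assumes well-formed output like the example above.

# check_vtt.py - sketch; assumes cue blocks look like the WEBVTT example above
import re
from pathlib import Path

CUE_RE = re.compile(r"(\d{2}):(\d{2}):(\d{2})\.(\d{3}) --> (\d{2}):(\d{2}):(\d{2})\.(\d{3})")

def to_seconds(h, m, s, ms):
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000

def check_vtt(path: str) -> None:
    for block in Path(path).read_text(encoding="utf-8").split("\n\n"):
        lines = [l for l in block.strip().splitlines() if l.strip()]
        cue_idx = next((i for i, l in enumerate(lines) if CUE_RE.search(l)), None)
        if cue_idx is None:
            continue  # WEBVTT header or metadata block
        m = CUE_RE.search(lines[cue_idx])
        start, end = to_seconds(*m.groups()[:4]), to_seconds(*m.groups()[4:])
        text = lines[cue_idx + 1:]
        label = lines[cue_idx]
        if not 1.5 <= end - start <= 7.0:
            print(f"{label}: duration {end - start:.1f}s outside 1.5-7s")
        if len(text) > 2:
            print(f"{label}: {len(text)} caption lines (max 2)")
        if any(len(l) > 80 for l in text):
            print(f"{label}: line longer than ~80 characters")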

Video Assembly Command

python3 scripts/core/assemble_session_video.py sessions/{session}/

This automatically:

  • Sequences images based on script sections
  • Adds cross-fade transitions
  • Syncs subtitles to audio
  • Outputs to output/video/session_final.mp4
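When several sessions are produced in a batch, the three documented scripts can be chained with a thin wrapper. This is a sketch built around the commands shown in this document, not an official entry point, and it assumes each script exits nonzero on failure.

# run_pipeline.py - hypothetical wrapper chaining the documented scripts
import subprocess
import sys

STEPS = [
    ["python3", "scripts/core/generate_scene_images.py"],
    ["python3", "scripts/ai/vtt_generator.py"],
    ["python3", "scripts/core/assemble_session_video.py"],
]

def run_pipeline(session_dir: str) -> None:
    for step in STEPS:
        cmd = step + [session_dir]
        print("running:", " ".join(cmd))
        subprocess.run(cmd, check=True)  # stop at the first failing step

if __name__ == "__main__":
    run_pipeline(sys.argv[1])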

Manual FFmpeg Assembly

For custom control:

# Create video from images with audio
ffmpeg -y \
  -framerate 1/10 \
  -pattern_type glob -i 'images/uploaded/*.png' \
  -i output/{session}_MASTER.mp3 \
  -c:v libx264 -r 24 -pix_fmt yuv420p \
  -c:a aac -b:a 320k \
  -shortest \
  output/video/session_final.mp4
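One caveat with the fixed -framerate 1/10: combined with -shortest, the output ends when the image sequence runs out (8 images x 10 s = 80 s), not when the audio does. A sketch that derives the per-image duration from the audio length via ffprobe keeps the slideshow spanning the full session; ffmpeg and ffprobe on PATH are assumed.

# slideshow_framerate.py - sketch; spreads the images evenly across the audio
import glob
import subprocess
import sys

def audio_duration(path: str) -> float:
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-show_entries", "format=duration",
         "-of", "default=noprint_wrappers=1:nokey=1", path],
        capture_output=True, text=True, check=True)
    return float(out.stdout.strip())

def build_command(session: str) -> list[str]:
    images = glob.glob("images/uploaded/*.png")
    seconds_per_image = audio_duration(f"output/{session}_MASTER.mp3") / len(images)
    return [
        "ffmpeg", "-y",
        "-framerate", f"{1 / seconds_per_image:.6f}",  # e.g. ~0.004444 for 8 images over 30 min
        "-pattern_type", "glob", "-i", "images/uploaded/*.png",
        "-i", f"output/{session}_MASTER.mp3",
        "-c:v", "libx264", "-r", "24", "-pix_fmt", "yuv420p",
        "-c:a", "aac", "-b:a", "320k",
        "-shortest",
        "output/video/session_final.mp4",
    ]

if __name__ == "__main__":
    subprocess.run(build_command(sys.argv[1]), check=True)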

Image-to-Section Mapping

Images should correspond to script sections:

| Image | Section | Timing |
| --- | --- | --- |
| scene_01.png | Pre-Talk | 0:00-3:00 |
| scene_02.png | Induction | 3:00-8:00 |
| scene_03.png | Deepening | 8:00-12:00 |
| scene_04.png | Journey Start | 12:00-17:00 |
| scene_05.png | Journey Core | 17:00-22:00 |
| scene_06.png | Helm/Deepest | 22:00-25:00 |
| scene_07.png | Integration | 25:00-28:00 |
| scene_08.png | Emergence | 28:00-30:00 |
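The assembly script performs this mapping automatically, but the same idea can be expressed with FFmpeg's concat demuxer: list each image with its section duration, then encode once against the master audio. The durations below are hard-coded from the table above; treat the whole thing as an illustrative sketch.

# build_concat_list.py - sketch of section-based image mapping via the concat demuxer
from pathlib import Path

# (image, duration in seconds) taken from the mapping table above
SECTIONS = [
    ("scene_01.png", 180), ("scene_02.png", 300), ("scene_03.png", 240),
    ("scene_04.png", 300), ("scene_05.png", 300), ("scene_06.png", 180),
    ("scene_07.png", 180), ("scene_08.png", 120),
]

def write_concat_list(image_dir: str, out_path: str = "concat_list.txt") -> None:
    lines = []
    for name, duration in SECTIONS:
        lines.append(f"file '{Path(image_dir) / name}'")
        lines.append(f"duration {duration}")
    # concat demuxer quirk: repeat the last file so its duration is honored
    lines.append(f"file '{Path(image_dir) / SECTIONS[-1][0]}'")
    Path(out_path).write_text("\n".join(lines) + "\n")

# Then, roughly:
#   ffmpeg -y -f concat -safe 0 -i concat_list.txt -i output/{session}_MASTER.mp3 \
#     -c:v libx264 -r 24 -pix_fmt yuv420p -c:a aac -b:a 320k -shortest \
#     output/video/session_final.mp4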

Transition Effects

| Transition | Duration | Use For |
| --- | --- | --- |
| Cross-dissolve | 2-3 seconds | Section transitions |
| Fade from black | 3 seconds | Opening |
| Fade to black | 3 seconds | Closing |
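The assembly script adds these transitions itself; for a hand-built cross-dissolve between two stills, FFmpeg's xfade filter produces the same effect. The sketch below loops each image into a short clip and fades between them; the hold and fade durations are illustrative.

# crossfade_pair.py - sketch of a 2 s cross-dissolve between two stills using xfade
import subprocess

def crossfade(img_a: str, img_b: str, out: str, hold: float = 10.0, fade: float = 2.0) -> None:
    # The fade starts `hold - fade` seconds into the first clip.
    subprocess.run([
        "ffmpeg", "-y",
        "-loop", "1", "-framerate", "24", "-t", str(hold), "-i", img_a,
        "-loop", "1", "-framerate", "24", "-t", str(hold), "-i", img_b,
        "-filter_complex",
        f"[0:v][1:v]xfade=transition=fade:duration={fade}:offset={hold - fade},format=yuv420p",
        "-c:v", "libx264", "-r", "24",
        out,
    ], check=True)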

Output Files

| File | Location | Purpose |
| --- | --- | --- |
| Final video | output/video/session_final.mp4 | Direct use |
| YouTube copy | output/youtube_package/final_video.mp4 | Upload ready |

Video Overlay Generation

Generate supporting graphics:

python3 scripts/core/generate_video_images.py sessions/{session}/ --all

Creates:

  • title_card.png - Video intro screen
  • sections/section_*.png - Chapter transitions
  • outro.png - End screen
  • social_preview.png - Social sharing

Quality Verification

After assembly:

# Check video properties
ffprobe -v error -show_format -show_streams output/video/session_final.mp4

# Play with VLC to verify sync
vlc output/video/session_final.mp4

Quality Checklist

  • Resolution is 1920x1080
  • Frame rate is 24 fps
  • Audio syncs with subtitles
  • Transitions are smooth
  • No visible artifacts
  • Duration matches audio
  • File size reasonable (<2GB for 30 min)
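Most of this checklist can be automated by parsing ffprobe's JSON output. The sketch below asserts the standards from the table at the top of this document; field names follow ffprobe's standard JSON schema.

# verify_video.py - sketch; checks the assembled file against the video standards
import json
import subprocess
from pathlib import Path

def probe(path: str) -> dict:
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-print_format", "json",
         "-show_format", "-show_streams", path],
        capture_output=True, text=True, check=True)
    return json.loads(out.stdout)

def verify(path: str) -> None:
    info = probe(path)
    video = next(s for s in info["streams"] if s["codec_type"] == "video")
    audio = next(s for s in info["streams"] if s["codec_type"] == "audio")
    assert (video["width"], video["height"]) == (1920, 1080), "resolution is not 1920x1080"
    assert video["r_frame_rate"] == "24/1", "frame rate is not 24 fps"
    assert video["codec_name"] == "h264" and audio["codec_name"] == "aac", "unexpected codecs"
    assert Path(path).stat().st_size < 2 * 1024**3, "file larger than 2 GB"
    print(f"OK: {float(info['format']['duration']):.1f} s, all checks passed")

Audio/subtitle sync and visual artifacts still need the manual VLC pass; this only covers the measurable items.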

Troubleshooting

| Issue | Cause | Solution |
| --- | --- | --- |
| Audio/video desync | Different stream durations | Use the -shortest flag |
| Pixelated video | Wrong pixel format | Use -pix_fmt yuv420p |
| Green frames | Image format issue | Convert images to PNG |
| Subtitle timing off | VTT not scaled | Regenerate the VTT against the actual audio duration |
| File too large | Bitrate too high | Use -crf 23 for a smaller file |

Integration with Pipeline

Before (dependencies):

  • Audio mastered ({session}_MASTER.mp3)
  • Scene images ready (images/uploaded/)
  • VTT subtitles generated (output/subtitles.vtt)

After (next steps):

  • YouTube packaging

Related Resources

  • Skill: tier3-production/audio-mixing/ (input)
  • Skill: tier3-production/youtube-packaging/ (next step)
  • Doc: docs/STOCK_IMAGE_SOP.md
  • Script: scripts/core/assemble_session_video.py