subtitle-service
Subtitle parsing, generation, and translation service guidelines for Gemini-Subtitle-Pro. Use when working with SRT/ASS/VTT parsing, subtitle generation, translation pipeline, glossary management, speaker identification, and Gemini API integration. Covers the complete transcription → translation workflow.
When & Why to Use This Skill
This Claude skill defines patterns for the end-to-end subtitle workflow in Gemini-Subtitle-Pro: parsing and generating SRT, ASS, and VTT files, and running the AI-driven translation pipeline. The pipeline uses the Gemini API for translation and refinement, together with glossary management and speaker identification, to produce context-aware localized subtitles from a transcription.
Use Cases
- Automating the full transcription and translation pipeline for video content using Whisper and Gemini API integration.
- Parsing, validating, and converting subtitle files between different formats (SRT, ASS, VTT) with high precision in timestamp handling.
- Implementing glossary-based translation to maintain terminology consistency across large-scale video localization projects.
- Enhancing subtitle accuracy through AI-powered refinement, speaker identification, and automated Voice Activity Detection (VAD) segmentation.
- Developing robust subtitle processing services with built-in error handling and parallel processing for high-concurrency translation tasks.
Subtitle Service Guidelines
Purpose
Establish patterns for subtitle processing services in Gemini-Subtitle-Pro, covering parsing, generation, translation, and the AI pipeline.
When to Use This Skill
Automatically activates when working on:
- SRT/ASS/VTT parsing and generation
- Translation pipeline
- Glossary management
- Speaker identification
- Gemini API integration for refinement
- Transcription workflow
Quick Start
Subtitle Processing Checklist
- Parser: Use appropriate parser for format (SRT, ASS, VTT)
- Types: Use the SubtitleEntry interface consistently
- Validation: Validate timestamps and text content
- Error Handling: Handle parsing errors gracefully
- i18n: Support multiple languages in output
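The validation item in the checklist can be sketched as a function that collects problems instead of throwing, so a caller can decide whether to repair or reject a file. This is a minimal sketch, not the project's actual validator: `ValidationIssue` and `validateEntries` are hypothetical names, and the interface simply mirrors the SubtitleEntry shape listed under Core Data Types.

```typescript
// Mirrors the SubtitleEntry shape from Core Data Types (times in milliseconds).
interface SubtitleEntry {
  index: number;
  startTime: number;
  endTime: number;
  text: string;
  translatedText?: string;
  speaker?: string;
}

// Hypothetical result type: one record per problem found.
interface ValidationIssue {
  index: number;
  message: string;
}

// Hypothetical helper: checks text content, timestamp sanity, and cue ordering.
function validateEntries(entries: SubtitleEntry[]): ValidationIssue[] {
  const issues: ValidationIssue[] = [];
  let previousStart = 0;
  for (const entry of entries) {
    const { index, startTime, endTime, text } = entry;
    if (text.trim() === '') {
      issues.push({ index, message: 'text is empty' });
    }
    if (!Number.isFinite(startTime) || !Number.isFinite(endTime)) {
      issues.push({ index, message: 'timestamps must be finite numbers' });
      continue; // ordering checks are meaningless without valid numbers
    }
    if (startTime < 0) {
      issues.push({ index, message: 'startTime is negative' });
    }
    if (endTime <= startTime) {
      issues.push({ index, message: 'endTime must be after startTime' });
    }
    if (startTime < previousStart) {
      issues.push({ index, message: 'cue starts before the previous cue' });
    }
    previousStart = startTime;
  }
  return issues;
}
```

Returning a list rather than throwing on the first problem keeps parsing tolerant of partially damaged files, which fits the checklist's advice to handle errors gracefully.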
Architecture Overview
Pipeline Flow
Audio/Video Input
↓
Transcription (Whisper)
↓
Segmentation (VAD)
↓
Glossary Extraction
↓
Speaker Identification
↓
Translation/Refinement (Gemini)
↓
Subtitle Output (SRT/ASS/VTT)
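One way to read the flow above is as an ordered list of async stages, each taking the pipeline state and returning an updated copy. The sketch below is illustrative only: `Stage` and `runPipeline` are hypothetical names, and nothing here reflects how the real code under src/services/generation/pipeline/ is organized.

```typescript
// Hypothetical stage shape: a named async transformation of some state S.
interface Stage<S> {
  name: string;
  run: (state: S) => Promise<S>;
}

// Runs stages strictly in order and reports which stage failed.
async function runPipeline<S>(stages: Stage<S>[], initial: S): Promise<S> {
  let state = initial;
  for (const stage of stages) {
    try {
      state = await stage.run(state);
    } catch (err) {
      const reason = err instanceof Error ? err.message : String(err);
      throw new Error('Stage "' + stage.name + '" failed: ' + reason);
    }
  }
  return state;
}
```

With this shape, transcription, segmentation, glossary extraction, speaker identification, and translation would each be one `Stage`, and a failure message names the stage that needs attention.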
Key Services
| Service | Location | Purpose |
|---|---|---|
| Subtitle Parser | src/services/subtitle/ | Parse SRT/ASS/VTT |
| Generation Pipeline | src/services/generation/pipeline/ | Orchestrate workflow |
| Gemini API | src/services/api/gemini/ | Translation & refinement |
| Audio Processing | src/services/audio/ | VAD, sampling |
Core Data Types
SubtitleEntry
interface SubtitleEntry {
  index: number;
  startTime: number; // milliseconds
  endTime: number; // milliseconds
  text: string;
  translatedText?: string;
  speaker?: string;
}
Timestamp Utilities
// Parse SRT timestamp: "00:01:23,456" → 83456
function parseSrtTimestamp(timestamp: string): number;
// Format to SRT: 83456 → "00:01:23,456"
function formatSrtTimestamp(ms: number): string;
// Parse ASS timestamp: "0:01:23.45" → 83450
function parseAssTimestamp(timestamp: string): number;
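The signatures above only state the contract. A minimal sketch of bodies that satisfy the documented examples might look like the following; the regular expressions and error messages are assumptions rather than the project's actual code. Note that ASS timestamps carry centiseconds, which is why "0:01:23.45" maps to 83450.

```typescript
// Assumed patterns: SRT uses HH:MM:SS,mmm; ASS uses H:MM:SS.cc (centiseconds).
const SRT_TIME = /^(\d{2,}):(\d{2}):(\d{2}),(\d{3})$/;
const ASS_TIME = /^(\d+):(\d{2}):(\d{2})\.(\d{2})$/;

// Left-pads a non-negative integer with zeros to the given width.
function pad(value: number, width: number): string {
  let s = String(value);
  while (s.length < width) s = '0' + s;
  return s;
}

function parseSrtTimestamp(timestamp: string): number {
  const m = SRT_TIME.exec(timestamp.trim());
  if (!m) throw new Error('Invalid SRT timestamp: ' + timestamp);
  const hours = Number(m[1]);
  const minutes = Number(m[2]);
  const seconds = Number(m[3]);
  return ((hours * 60 + minutes) * 60 + seconds) * 1000 + Number(m[4]);
}

function formatSrtTimestamp(ms: number): string {
  const total = Math.max(0, Math.round(ms)); // clamp negatives, drop fractions
  const hours = Math.floor(total / 3600000);
  const minutes = Math.floor((total % 3600000) / 60000);
  const seconds = Math.floor((total % 60000) / 1000);
  return pad(hours, 2) + ':' + pad(minutes, 2) + ':' + pad(seconds, 2) + ',' + pad(total % 1000, 3);
}

function parseAssTimestamp(timestamp: string): number {
  const m = ASS_TIME.exec(timestamp.trim());
  if (!m) throw new Error('Invalid ASS timestamp: ' + timestamp);
  const hours = Number(m[1]);
  const minutes = Number(m[2]);
  const seconds = Number(m[3]);
  // Centiseconds are converted to milliseconds by multiplying by 10.
  return ((hours * 60 + minutes) * 60 + seconds) * 1000 + Number(m[4]) * 10;
}
```

Throwing on malformed input keeps bad timestamps from silently becoming NaN; callers that need tolerant parsing can catch the error and skip the cue.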
Parsing Patterns
SRT Parser
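export function parseSrt(content: string): SubtitleEntry[] {
  // Normalize CRLF/CR line endings so Windows-authored files split correctly
  const blocks = content.replace(/\r\n?/g, '\n').trim().split(/\n{2,}/);
  const entries: SubtitleEntry[] = [];
  for (const block of blocks) {
    const lines = block.split('\n');
    // Line 0 is the cue number, line 1 the timestamps, the rest is text
    const timestampLine = lines[1];
    if (!timestampLine || !timestampLine.includes('-->')) continue; // skip malformed blocks
    const [start, end] = timestampLine.split('-->').map((part) => part.trim());
    entries.push({
      index: entries.length, // 0-based position among successfully parsed cues
      startTime: parseSrtTimestamp(start),
      endTime: parseSrtTimestamp(end),
      text: lines.slice(2).join('\n'),
    });
  }
  return entries;
}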
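Generation is the inverse of parsing. The sketch below serializes entries to SRT and WebVTT text; `toSrt`, `toVtt`, `formatTimestamp`, and the `preferTranslation` flag are hypothetical names chosen for illustration, and choosing `translatedText` over `text` when both exist is an assumption about the desired output. The two formats differ in the millisecond separator (comma for SRT, period for VTT), the cue numbers (SRT only here), and the mandatory WEBVTT header line.

```typescript
// Mirrors the SubtitleEntry shape from Core Data Types (times in milliseconds).
interface SubtitleEntry {
  index: number;
  startTime: number;
  endTime: number;
  text: string;
  translatedText?: string;
  speaker?: string;
}

function pad(value: number, width: number): string {
  let s = String(value);
  while (s.length < width) s = '0' + s;
  return s;
}

// HH:MM:SS<sep>mmm, where sep is ',' for SRT and '.' for WebVTT.
function formatTimestamp(ms: number, sep: ',' | '.'): string {
  const total = Math.max(0, Math.round(ms));
  const h = Math.floor(total / 3600000);
  const m = Math.floor((total % 3600000) / 60000);
  const s = Math.floor((total % 60000) / 1000);
  return pad(h, 2) + ':' + pad(m, 2) + ':' + pad(s, 2) + sep + pad(total % 1000, 3);
}

// Assumption: output the translation when present and requested.
function cueText(entry: SubtitleEntry, preferTranslation: boolean): string {
  return preferTranslation && entry.translatedText !== undefined
    ? entry.translatedText
    : entry.text;
}

function toSrt(entries: SubtitleEntry[], preferTranslation = true): string {
  const cues = entries.map((e, i) =>
    [
      String(i + 1), // SRT cue numbers start at 1
      formatTimestamp(e.startTime, ',') + ' --> ' + formatTimestamp(e.endTime, ','),
      cueText(e, preferTranslation),
    ].join('\n'),
  );
  return cues.join('\n\n') + '\n';
}

function toVtt(entries: SubtitleEntry[], preferTranslation = true): string {
  const cues = entries.map((e) =>
    formatTimestamp(e.startTime, '.') + ' --> ' + formatTimestamp(e.endTime, '.') +
    '\n' + cueText(e, preferTranslation),
  );
  return ['WEBVTT'].concat(cues).join('\n\n') + '\n';
}
```

Renumbering cues from the array position rather than trusting `entry.index` keeps the SRT output sequential even after entries were filtered or merged upstream.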
Format Detection
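export function detectSubtitleFormat(content: string): 'srt' | 'ass' | 'vtt' {
  // Strip a UTF-8 BOM and leading whitespace so the header checks still match
  const head = content.replace(/^\uFEFF/, '').trimStart();
  if (head.startsWith('WEBVTT')) return 'vtt';
  if (head.includes('[Script Info]')) return 'ass';
  return 'srt'; // fallback when neither a VTT nor an ASS marker is found
}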
Translation Pipeline
Concurrency Model
// Dual semaphores for resource management
const transcriptionSemaphore = new Semaphore(
  isLocal ? 1 : 5 // Local: 1, Cloud: 5
);
const refinementSemaphore = new Semaphore(5); // Gemini Flash

// Process chunks in parallel
await mapInParallel(chunks, async (chunk) => {
  await transcriptionSemaphore.acquire();
  try {
    const transcription = await transcribe(chunk);
    // ...
  } finally {
    transcriptionSemaphore.release();
  }
});
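The snippet above relies on `Semaphore` and `mapInParallel`, whose implementations are not shown in this document. The following is a minimal sketch of what such helpers could look like, assuming a counting semaphore with a FIFO wait queue and a `mapInParallel` that starts every task immediately and leaves throttling to the semaphore. The project's real helpers may differ.

```typescript
// Assumed implementation: counting semaphore with a FIFO queue of waiters.
class Semaphore {
  private available: number;
  private waiters: Array<() => void> = [];

  constructor(permits: number) {
    this.available = permits;
  }

  async acquire(): Promise<void> {
    if (this.available > 0) {
      this.available--;
      return;
    }
    // No permit free: wait until release() hands one over.
    await new Promise<void>((resolve) => this.waiters.push(resolve));
  }

  release(): void {
    const next = this.waiters.shift();
    if (next) {
      next(); // pass the permit directly to the longest-waiting caller
    } else {
      this.available++;
    }
  }
}

// Assumed implementation: start all tasks at once, preserve input order in results.
async function mapInParallel<T, R>(
  items: T[],
  fn: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  return Promise.all(items.map((item, index) => fn(item, index)));
}
```

Calling `acquire()` before the `try` block, as the original snippet does, matters: if acquisition itself failed, the `finally` clause would otherwise release a permit that was never held.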
Resource Files
For detailed guidelines, see the resources directory:
- parsing-patterns.md - Subtitle format parsing
- pipeline-guide.md - Translation pipeline patterns
- gemini-integration.md - Gemini API usage