sense
sense - Diagrammatic Video Extraction with Subtitle Alignment
When & Why to Use This Skill
Sense is a Claude skill for diagrammatic video extraction and subtitle alignment. It transforms unstructured video lectures into structured, queryable knowledge bases by combining subtitle parsing, OCR of diagrams and equations (converted to LaTeX), and automated indexing in DuckDB. It is aimed at researchers and students who need to turn visual educational content into a searchable, digital format.
Use Cases
- Academic Research: Automatically extract complex LaTeX equations and diagrams from recorded technical seminars to build a searchable research repository.
- Study Guide Generation: Convert video lectures into structured notes with timestamped transcripts and perfectly formatted mathematical notation for exam preparation.
- Technical Video Indexing: Create a queryable database of corporate training or software walkthroughs, allowing users to jump to specific visual content or spoken topics instantly.
- Knowledge Management: Transform massive video archives into a structured knowledge base using GF(3)-balanced skill mapping to categorize content by technical difficulty and topic.
| name | sense |
|---|---|
| description | sense - Diagrammatic Video Extraction with Subtitle Alignment |
| version | 1.0.0 |
sense - Diagrammatic Video Extraction with Subtitle Alignment
Trit: 0 (ERGODIC - Coordinator)
Extract structured knowledge from video lectures via subtitle parsing, diagram/equation OCR, and GF(3)-balanced skill mapping.
Overview
sense transforms video lectures into indexed, queryable knowledge:
┌─────────────────────────────────────────────────────────────────┐
│ VIDEO INPUT │
│ • Lecture recording (.mkv, .mp4) │
│ • Subtitles (.vtt, .srt, auto-generated) │
│ • Slides/diagrams (extracted frames) │
└──────────────────────────┬──────────────────────────────────────┘
│
┌───────────────┼───────────────┐
│ │ │
┌──────▼──────┐ ┌──────▼──────┐ ┌──────▼──────┐
│ SUBTITLE │ │ DIAGRAM │ │ SKILL │
│ PARSER │ │ EXTRACTOR │ │ MAPPER │
│ (-1 BLUE) │ │ (0 GREEN) │ │ (+1 RED) │
└──────┬──────┘ └──────┬──────┘ └──────┬──────┘
│ │ │
│ Mathpix OCR │
│ frame → LaTeX │
│ │
┌──────▼───────────────▼───────────────▼──────┐
│ DuckDB INDEX │
│ • Timestamped transcript │
│ • Extracted equations (LaTeX) │
│ • Skill mappings with GF(3) trits │
│ • Queryable views │
└──────────────────────────────────────────────┘
Triadic Structure
| Role | Component | Trit | Function |
|---|---|---|---|
| Validator | Subtitle Parser | -1 | Parse VTT/SRT, segment by timestamp |
| Coordinator | Diagram Extractor | 0 | OCR frames → LaTeX via Mathpix |
| Generator | Skill Mapper | +1 | Assign skills with GF(3) balance |
GF(3) Conservation: (-1) + (0) + (+1) = 0 ✓
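A one-line sanity check of the law above (a minimal sketch, using the component trits from the table):
TRIAD = { 'subtitle-parser' => -1, 'diagram-extractor' => 0, 'skill-mapper' => +1 }
raise 'GF(3) violated' unless TRIAD.values.sum % 3 == 0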
Components
1. Subtitle Parser (-1)
Parses WebVTT subtitle files (SRT can be converted to VTT first) into structured segments:
require 'webvtt'
class SubtitleParser
def initialize(vtt_path)
@vtt = WebVTT.read(vtt_path)
end
def segments
# Memoize: cues are parsed once and reused by by_slide below
@segments ||= @vtt.cues.map do |cue|
{
start: cue.start.total_seconds,
end: cue.end.total_seconds,
text: cue.text.gsub(/<[^>]*>/, '').strip,
duration: cue.end.total_seconds - cue.start.total_seconds
}
end
end
def by_slide(slide_timestamps)
# Group subtitles by slide boundaries
slide_timestamps.map.with_index do |ts, i|
next_ts = slide_timestamps[i + 1] || Float::INFINITY
{
slide: i,
timestamp: ts,
text: segments.select { |s| s[:start] >= ts && s[:start] < next_ts }
.map { |s| s[:text] }.join(' ')
}
end
end
end
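A usage sketch (assumes the VTT file from the Usage section below exists locally):
parser = SubtitleParser.new('reference/videos/bumpus_ct2021.en.vtt')
parser.segments.first(3).each do |seg|
puts format('%7.2f-%7.2f  %s', seg[:start], seg[:end], seg[:text])
end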
2. Diagram Extractor (0)
Extracts frames at key timestamps and OCRs equations/diagrams:
require 'mathpix'
require 'base64'
class DiagramExtractor
MATHPIX_APP_ID = ENV['MATHPIX_APP_ID']
MATHPIX_APP_KEY = ENV['MATHPIX_APP_KEY']
def initialize(video_path)
@video = video_path
end
def extract_frame(timestamp, output_path)
# Use ffmpeg to extract a single frame; array form avoids shell-quoting issues
system('ffmpeg', '-y', '-ss', timestamp.to_s, '-i', @video, '-vframes', '1', '-q:v', '2', output_path)
output_path
end
def ocr_frame(image_path)
# Send frame to Mathpix for LaTeX extraction
# (strict_encode64 + binread: binary-safe, no newlines in the data URI)
response = Mathpix.process(
src: "data:image/png;base64,#{Base64.strict_encode64(File.binread(image_path))}",
formats: ['latex_styled', 'text'],
data_options: { include_asciimath: true }
)
{
latex: response['latex_styled'],
text: response['text'],
confidence: response['confidence'],
has_diagram: response['is_printed'] || response['is_handwritten']
}
end
def extract_all(timestamps)
timestamps.map.with_index do |ts, i|
frame_path = "/tmp/frame_#{i}_#{ts.to_i}.png"
extract_frame(ts, frame_path)
result = ocr_frame(frame_path)
result.merge(timestamp: ts, slide_num: i)
end
end
end
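A usage sketch (requires ffmpeg on PATH and MATHPIX_APP_ID / MATHPIX_APP_KEY in the environment; the timestamps are illustrative):
extractor = DiagramExtractor.new('reference/videos/bumpus_ct2021.mkv')
extractor.extract_all([12.5, 98.0, 240.3]).each do |r|
puts "slide #{r[:slide_num]} @ #{r[:timestamp]}s  latex=#{r[:latex].to_s[0, 60]}"
end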
3. Skill Mapper (+1)
Maps extracted content to skills with GF(3) conservation:
class SkillMapper
SKILL_KEYWORDS = {
'acsets' => %w[acset c-set schema functor category],
'sheaf-cohomology' => %w[sheaf cohomology local global section],
'structured-decomp' => %w[tree decomposition treewidth bag],
'kan-extensions' => %w[kan extension adjoint limit colimit],
'polynomial' => %w[polynomial poly interface arena],
'temporal-coalgebra' => %w[temporal time varying dynamic coalgebra],
'operad-compose' => %w[operad wiring diagram composition],
}
SKILL_TRITS = {
'acsets' => 0, 'sheaf-cohomology' => -1, 'structured-decomp' => -1,
'kan-extensions' => 0, 'polynomial' => 0, 'temporal-coalgebra' => -1,
'operad-compose' => +1, 'oapply-colimit' => +1, 'gay-mcp' => +1,
}
def map_content(text, latex)
combined = "#{text} #{latex}".downcase
skills = SKILL_KEYWORDS.select do |skill, keywords|
keywords.any? { |kw| combined.include?(kw) }
end.keys
# Ensure GF(3) balance
balance_skills(skills)
end
def balance_skills(skills)
trit_sum = skills.sum { |s| SKILL_TRITS[s] || 0 }
# Ruby's % is non-negative, so trit_sum % 3 ∈ {0, 1, 2}.
# sum ≡ 1 needs a -1 skill; sum ≡ 2 (≡ -1 mod 3) needs a +1 skill.
needed = { 1 => -1, 2 => +1 }[trit_sum % 3]
if needed
# Pick any skill with the compensating trit that isn't already assigned
candidate = SKILL_TRITS.keys.find { |s| SKILL_TRITS[s] == needed && !skills.include?(s) }
skills << candidate if candidate
end
skills
end
end
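A worked example of the balancing step: text matching only sheaf-cohomology gives trit sum -1 ≡ 2 (mod 3), so a +1 skill is appended:
mapper = SkillMapper.new
mapper.map_content('local sections glue to a global section', '')
# => ["sheaf-cohomology", "operad-compose"]   ((-1) + (+1) = 0 ✓)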
Complete Pipeline
require 'duckdb'
require 'json'
class Sense
def initialize(video_path, vtt_path, output_db: 'tensor_skill_paper.duckdb')
@video = video_path
@vtt = vtt_path
@db_path = output_db
@content_id = File.basename(video_path, '.*')
@subtitle_parser = SubtitleParser.new(vtt_path)
@diagram_extractor = DiagramExtractor.new(video_path)
@skill_mapper = SkillMapper.new
end
def process!
# 1. Parse subtitles
segments = @subtitle_parser.segments
# 2. Detect slide transitions (subtitle gap heuristic; see detect_slides below)
slide_timestamps = detect_slides(segments)
# 3. Extract and OCR key frames
diagrams = @diagram_extractor.extract_all(slide_timestamps)
# 4. Map skills with GF(3) balance
indexed = diagrams.map do |d|
subtitle_text = @subtitle_parser.by_slide(slide_timestamps)[d[:slide_num]][:text]
skills = @skill_mapper.map_content(subtitle_text, d[:latex] || '')
d.merge(
subtitle_text: subtitle_text,
skills: skills,
trit: skills.sum { |s| SkillMapper::SKILL_TRITS[s] || 0 } % 3
)
end
# 5. Store in DuckDB
store_index(indexed)
# 6. Create views
create_views
indexed
end
private
def detect_slides(segments)
# Simple: gap > 2s indicates slide change
timestamps = [0.0]
segments.each_cons(2) do |a, b|
if b[:start] - a[:end] > 2.0
timestamps << b[:start]
end
end
timestamps
end
def store_index(indexed)
conn = DuckDB::Database.open(@db_path).connect
conn.execute("DROP TABLE IF EXISTS #{@content_id}_sense_index")
conn.execute(<<~SQL)
CREATE TABLE #{@content_id}_sense_index (
slide_num INTEGER,
timestamp FLOAT,
latex VARCHAR,
has_diagram BOOLEAN,
subtitle_text TEXT,
skills TEXT,
trit INTEGER,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
)
SQL
indexed.each do |row|
conn.execute(<<~SQL, [
row[:slide_num], row[:timestamp], row[:latex],
row[:has_diagram], row[:subtitle_text],
row[:skills].to_json, row[:trit]
])
INSERT INTO #{@content_id}_sense_index VALUES (?, ?, ?, ?, ?, ?, ?, CURRENT_TIMESTAMP)
SQL
end
conn.close
end
def create_views
conn = DuckDB::Database.open(@db_path).connect
conn.execute(<<~SQL)
CREATE OR REPLACE VIEW v_#{@content_id}_timeline AS
SELECT
slide_num,
printf('%02d:%05.2f', CAST(FLOOR(timestamp/60) AS INT), timestamp % 60) as timecode,
CASE WHEN has_diagram THEN '📊' ELSE '' END ||
CASE WHEN latex != '' AND latex IS NOT NULL THEN '📐' ELSE '' END as content,
trit,
skills
FROM #{@content_id}_sense_index
ORDER BY timestamp
SQL
conn.close
end
end
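The gap heuristic inside detect_slides, shown standalone (an illustrative sketch, not part of the class):
segments = [
{ start: 0.0, end: 4.0 }, { start: 4.5, end: 9.0 },  # 0.5s gap: same slide
{ start: 12.0, end: 20.0 }                           # 3.0s gap: new slide
]
timestamps = [0.0]
segments.each_cons(2) { |a, b| timestamps << b[:start] if b[:start] - a[:end] > 2.0 }
timestamps # => [0.0, 12.0]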
Usage
Ruby
require_relative 'lib/sense'
# Process a video lecture
sense = Sense.new(
'reference/videos/bumpus_ct2021.mkv',
'reference/videos/bumpus_ct2021.en.vtt'
)
indexed = sense.process!
puts "Indexed #{indexed.size} slides"
Command Line
# Extract subtitles from video (if not available)
uvx yt-dlp --write-auto-sub --sub-lang en --skip-download \
-o 'reference/videos/%(id)s' 'https://youtube.com/watch?v=VIDEO_ID'
# Run sense extraction
just sense-extract reference/videos/bumpus_ct2021.mkv
# Query the index
just sense-timeline bumpus_ct2021
just sense-skills bumpus_ct2021 acsets
Python Alternative
#!/usr/bin/env python3
"""sense.py - Python implementation of diagrammatic video extraction"""
import duckdb
import webvtt
import subprocess
import json
from pathlib import Path
class Sense:
def __init__(self, video_path: str, vtt_path: str, db_path: str = "tensor_skill_paper.duckdb"):
self.video = Path(video_path)
self.vtt = Path(vtt_path)
self.db_path = db_path
self.content_id = self.video.stem
def parse_subtitles(self):
"""Parse VTT file into segments"""
captions = webvtt.read(str(self.vtt))
return [
{
'start': self._time_to_seconds(c.start),
'end': self._time_to_seconds(c.end),
'text': c.text.strip()
}
for c in captions
]
def extract_frame(self, timestamp: float, output_path: str):
"""Extract single frame at timestamp"""
subprocess.run([
'ffmpeg', '-y', '-ss', str(timestamp),
'-i', str(self.video), '-vframes', '1',
'-q:v', '2', output_path
], capture_output=True)
return output_path
def ocr_frame_mathpix(self, image_path: str):
"""OCR frame using mathpix-gem"""
# Shell out to Ruby mathpix-gem
result = subprocess.run([
'ruby', '-rmathpix', '-e',
f"puts Mathpix.process_image('{image_path}').to_json"
], capture_output=True, text=True)
if result.returncode == 0:
return json.loads(result.stdout)
return {'latex': '', 'text': '', 'has_diagram': False}
def _time_to_seconds(self, time_str: str) -> float:
"""Convert [HH:]MM:SS.mmm to seconds (auto-generated VTTs may omit hours)"""
parts = time_str.split(':')
if len(parts) == 2:
return int(parts[0]) * 60 + float(parts[1])
return int(parts[0]) * 3600 + int(parts[1]) * 60 + float(parts[2])
Justfile Commands
# Extract and index a video lecture
sense-extract video:
@echo "👁️ SENSE: Extracting {{video}}"
ruby -I lib -r sense -e "Sense.new('{{video}}', '{{video}}'.sub('.mkv', '.en.vtt')).process!"
# Download subtitles for a YouTube video
sense-subtitles url output:
uvx yt-dlp --write-auto-sub --sub-lang en --skip-download -o '{{output}}' '{{url}}'
# Show timeline for indexed content
sense-timeline content_id:
@source .venv/bin/activate && duckdb tensor_skill_paper.duckdb \
"SELECT * FROM v_{{content_id}}_timeline"
# Find slides mentioning a skill
sense-skills content_id skill:
@source .venv/bin/activate && duckdb tensor_skill_paper.duckdb \
"SELECT slide_num, timecode, skills FROM v_{{content_id}}_timeline WHERE skills LIKE '%{{skill}}%'"
# Extract frame at timestamp
sense-frame video timestamp:
flox activate -- ffmpeg -y -ss {{timestamp}} -i '{{video}}' -vframes 1 -q:v 2 /tmp/sense_frame.png
@echo "✓ Frame extracted to /tmp/sense_frame.png"
# OCR a frame with Mathpix
sense-ocr image:
ruby -rmathpix -e "puts Mathpix.process_image('{{image}}').to_json" | jq .
# Full pipeline: download, extract, index
sense-full url content_id:
@echo "📥 Downloading video and subtitles..."
uvx yt-dlp -o 'reference/videos/{{content_id}}.mkv' '{{url}}'
uvx yt-dlp --write-auto-sub --sub-lang en --skip-download -o 'reference/videos/{{content_id}}' '{{url}}'
@echo "👁️ Running sense extraction..."
just sense-extract 'reference/videos/{{content_id}}.mkv'
GF(3) Conservation
The skill keeps skill assignments balanced; verify the aggregate trit sum for any indexed video:
-- Verify GF(3) balance (per-content tables are named <content_id>_sense_index)
SELECT
SUM(trit) as total_trit,
SUM(trit) % 3 as gf3,
CASE WHEN SUM(trit) % 3 = 0 THEN '✓' ELSE '✗' END as balanced
FROM bumpus_ct2021_sense_index;
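The same check can be run from Ruby, mirroring the connection pattern used in store_index (a sketch; assumes the table created by the Usage example exists):
require 'duckdb'
conn = DuckDB::Database.open('tensor_skill_paper.duckdb').connect
conn.execute('SELECT SUM(trit) % 3 FROM bumpus_ct2021_sense_index').each do |row|
puts row[0] == 0 ? 'balanced ✓' : 'unbalanced ✗'
end
conn.close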
Integration with Galois Infrastructure
After sense extracts content, register it in the Galois connection:
-- Update content_registry
UPDATE content_registry
SET indexed = TRUE,
index_table = 'bumpus_ct2021_sense_index'
WHERE content_id = 'bumpus_ct2021';
-- Content now flows through Galois lattice
SELECT * FROM v_galois_content_to_skills WHERE content_id = 'bumpus_ct2021';
Dependencies
# Ruby gems
gems:
- webvtt-ruby # VTT parsing
- mathpix # Mathpix OCR API
- duckdb # Database storage
# System tools
tools:
- ffmpeg # Frame extraction
- yt-dlp # Video/subtitle download
# Environment variables
env:
MATHPIX_APP_ID: "your-app-id"
MATHPIX_APP_KEY: "your-app-key"
Triads Using sense
# sense as coordinator in extraction triads:
subtitle-parser (-1) ⊗ sense (0) ⊗ skill-mapper (+1) = 0 ✓
# Combined with other skills:
sheaf-cohomology (-1) ⊗ sense (0) ⊗ gay-mcp (+1) = 0 ✓ [Colored diagrams]
temporal-coalgebra (-1) ⊗ sense (0) ⊗ koopman-generator (+1) = 0 ✓ [Dynamics]
persistent-homology (-1) ⊗ sense (0) ⊗ topos-generate (+1) = 0 ✓ [Topology]
See Also
- mathpix-ocr - LaTeX extraction backend
- galois-infrastructure - Content ⇆ Skills ⇆ Worlds
- parallel-fanout - Triadic parallel dispatch
- duckdb-temporal-versioning - Time-travel queries
- Cat# treatment examples: complete_catsharp_index.py, complete_bumpus_index.py
Tsao Visual Hierarchy Integration
Sense is directly informed by Doris Tsao's visual neuroscience. See DORIS_TSAO_VISUAL_NEUROSCIENCE_BRIDGE.md.
Tsao Hierarchy → Sense Components
| Tsao Level | Visual Region | Sense Component | Function |
|---|---|---|---|
| Level 0 | V1 simple cells | Subtitle Parser (-1) | Edge detection, timestamp boundaries |
| Level 1 | V2/V4 complex | Diagram Extractor (0) | Feature integration, OCR |
| Level 2 | IT face patches | Skill Mapper (+1) | Pattern recognition, skill assignment |
| Level 3 | Prefrontal | GF(3) Balancer | Behavioral goal, conservation |
Self-Avoiding Walks via Self-Coloring
From chromatic-walk insight: SAWs don't intersect by definition, but in an effective topos we verify through self-coloring:
def saw_verified_by_self_coloring(walk: list) -> bool:
"""
In effective topos, self-intersection is decidable.
The reafference equation:
Generate(seed, i) = Observe(seed, i) ⟺ self ≡ self
If walk revisits (seed, index), it generates the SAME color
at two walk positions — contradiction detected.
"""
colors = [Gay.color_at(step.seed, step.index) for step in walk]
return len(colors) == len(set(colors)) # No repeated colors ⟺ SAW
Connection to Frontier Lab Circuits
Sense extraction parallels mechanistic interpretability:
| Sense | Circuits Research | Tsao |
|---|---|---|
| Subtitle segments | Attention heads | V1 edges |
| Diagram features | Activation patterns | V2 shapes |
| Skill mapping | Circuit identification | IT patches |
| GF(3) balance | Superposition control | Prefrontal |
See: FRONTIER_LAB_CIRCUITS_INTERACTOME.md
Chang-Tsao 50D Face Space → Skill Space
Face Space (Tsao):
25 shape axes + 25 appearance axes = 50D
Each neuron encodes ONE axis
Population decodes via linear combination
Skill Space (Sense):
N skills with trit assignments (-1, 0, +1)
Each slide maps to skill subset
GF(3) conservation ensures balance
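Making the analogy concrete: a slide's coordinates in skill space are its trit assignments, and the "population decode" is their GF(3) sum (illustrative sketch):
slide = { 'sheaf-cohomology' => -1, 'acsets' => 0, 'operad-compose' => +1 }
population_code = slide.values.sum % 3 # => 0: a balanced, "smooth" slide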
Phenomenal Topology
Sense extraction states map to QRI's Symmetry Theory of Valence:
| State | Visual Cortex | Sense Extraction | GF(3) |
|---|---|---|---|
| Smooth | All levels coherent | Clean skill mapping | = 0 |
| Defect | Prediction error | Ambiguous slide | ≠ 0 |
| Vortex | High entropy | Multiple skill conflicts | ≫ 0 |
Rebalancing
def rebalance_defect(slide_skills: list, target_gf3: int = 0) -> list:
"""Restore GF(3) = 0 by adding a compensating skill."""
current_sum = sum(SKILL_TRITS.get(s, 0) for s in slide_skills)
deficit = (target_gf3 - current_sum) % 3  # the trit value that must be added
if deficit == 1:
slide_skills.append('operad-compose')    # +1
elif deficit == 2:
slide_skills.append('sheaf-cohomology')  # -1 ≡ 2 (mod 3)
return slide_skills
Skill Name: sense
Trit: 0 (ERGODIC - Coordinator)
Tsao Integration: V1→V2→IT→Prefrontal hierarchy
SAW Verification: Effective topos self-coloring
Scientific Skill Interleaving
This skill connects to the K-Dense-AI/claude-scientific-skills ecosystem:
Graph Theory
- networkx [○] via bicomodule
- Universal graph hub
Bibliography References
general: 734 citations in bib.duckdb
Cat# Integration
This skill maps to Cat# = Comod(P) as a bicomodule in the equipment structure:
Trit: 0 (ERGODIC)
Home: Prof
Poly Op: ⊗
Kan Role: Adj
Color: #26D826
GF(3) Naturality
The skill participates in triads satisfying:
(-1) + (0) + (+1) ≡ 0 (mod 3)
This ensures compositional coherence in the Cat# equipment structure.