batch-ingest

From houfu

Process large volumes of bank statements (50+ PDFs) in batches with checkpointing and progress tracking. Orchestrates the standard ingestion skill across multiple batches for resumable processing.

Updated Jan 3, 2026

When & Why to Use This Skill

This Claude skill automates the high-volume ingestion of financial documents, specifically bank statements, by processing them in manageable batches. It features robust progress tracking, checkpointing, and resumable workflows to ensure data integrity and efficiency when handling 50+ PDFs, making it ideal for large-scale financial data migration and automated bookkeeping.

Use Cases

  • Bulk Financial Data Entry: Automatically process and categorize hundreds of monthly bank statements into a central database for accounting and tax preparation.
  • Audit and Compliance: Efficiently ingest years of historical financial records with granular progress tracking to ensure 100% data coverage and auditability.
  • Resumable Document Workflows: Manage large-scale PDF processing tasks that can be paused and resumed across different sessions without duplicating work or losing progress.
  • Error-Tolerant Data Extraction: Handle high-volume document ingestion where individual file errors are isolated, allowing the rest of the batch to complete while providing a clear report for manual review.
name: batch-ingest
description: Process large volumes of bank statements (50+ PDFs) in batches with checkpointing and progress tracking. Orchestrates the standard ingestion skill across multiple batches for resumable processing.

Batch Ingestion Skill

Purpose

Process large volumes of bank statements (50+ PDFs) in manageable batches with:

  • Progress tracking via TodoWrite
  • User verification between batches
  • Resume capability using database ingestion_log
  • Error recovery without losing progress

When to Use

  • Processing 50+ PDFs at once
  • Reviewing progress between batches
  • Pausing and resuming across sessions
  • Checkpointing in case of errors

Prerequisites

  • Database initialized (uv run python scripts/init_db.py)
  • PDFs in staging folder (data/statements/staging/)
  • Dashboard stopped (docker compose down)
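The prerequisites above can be verified with a short pre-flight sketch. The staging path follows the layout named in this skill; the database location `data/finance.db` is an assumption for illustration, not confirmed by the skill itself:

```python
from pathlib import Path

def count_staged_pdfs(staging_dir: str) -> int:
    """Count PDFs waiting in the staging folder (returns 0 if the folder is missing)."""
    return len(list(Path(staging_dir).glob("*.pdf")))

if __name__ == "__main__":
    n = count_staged_pdfs("data/statements/staging")
    print(f"Staged PDFs: {n}")
    # Assumed database location -- adjust to your config["database_path"].
    if not Path("data/finance.db").exists():
        print("Database not initialized -- run: uv run python scripts/init_db.py")
```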

Workflow

Phase 1: Discover and Plan

  1. Count PDFs in staging:
ls data/statements/staging/*.pdf | wc -l
  2. Ask user for batch size:
  • Recommend 10-15 PDFs per batch
  • Smaller batches = more checkpoints
  • Larger batches = faster but less granular
  3. Create TodoWrite plan:
total_pdfs = 73  # from ls count
batch_size = 15
num_batches = (total_pdfs + batch_size - 1) // batch_size

todos = [
    {"content": f"Process batch {i+1} of {num_batches} (up to {batch_size} PDFs)",
     "status": "pending",
     "activeForm": f"Processing batch {i+1}"}
    for i in range(num_batches)  # last batch may hold fewer than batch_size PDFs
]
# Add final todo
todos.append({
    "content": "Restart dashboard after completion",
    "status": "pending",
    "activeForm": "Restarting dashboard"
})

# Use TodoWrite tool to create the plan

Phase 2: Process Batches

For each batch:

  1. Mark batch as in_progress using TodoWrite

  2. Get next batch of PDFs:

ls data/statements/staging/*.pdf | head -n 15
  3. Run ingestion skill on this batch:
  • Use the standard skills/ingestion/SKILL.md
  • Process all PDFs in the batch
  • The ingestion skill will handle:
    • PDF reading and parsing
    • Database insertion
    • Categorization
    • Archiving
    • Error handling
  4. Mark batch as completed using TodoWrite

  5. Show progress summary:

from src.database.models import Database
from src.config import get_config

config = get_config()
db = Database(config["database_path"])

# Get most recent ingestion log
log = db.get_last_ingestion_log()
if log:
    print(f"✓ Batch completed")
    print(f"  PDFs processed: {log.pdfs_processed}")
    print(f"  Transactions added: {log.transactions_added}")
    print(f"  Status: {log.status}")
  6. Ask user to continue:
Batch X of Y completed. Continue to next batch? (yes/no)

If user says no: Stop and remind them they can resume later.
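The batch-slicing step above can be sketched as a small helper; the actual per-file work is delegated to the standard ingestion skill:

```python
def make_batches(paths: list, batch_size: int) -> list:
    """Split sorted PDF paths into consecutive fixed-size batches.

    The last batch may be smaller than batch_size.
    """
    return [paths[i:i + batch_size] for i in range(0, len(paths), batch_size)]
```

With 73 staged PDFs and a batch size of 15 this yields 5 batches, the last holding the remaining 13 files.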

Phase 3: Resume Capability

To resume an interrupted batch ingestion:

  1. Check remaining PDFs:
ls data/statements/staging/*.pdf | wc -l
  2. Check TodoWrite list to see which batches are pending

  3. Continue from where you left off: process the remaining batches
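Because the ingestion skill archives processed PDFs out of staging, resuming is just a recount: the number of batches still to run falls out of the remaining file count. A minimal sketch, assuming that archiving behavior:

```python
import math

def remaining_batches(staged_count: int, batch_size: int = 15) -> int:
    """Batches still to run, given how many PDFs remain in staging.

    Processed PDFs have already been archived out of the staging folder,
    so the staged count alone determines the remaining work.
    """
    return math.ceil(staged_count / batch_size)
```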

Error Handling

If a PDF fails during batch processing:

  • The ingestion skill will log the error
  • Continue with remaining PDFs in batch
  • Summarize failed PDFs at end of batch
  • User can review and retry failed PDFs separately
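The error-isolation behavior above follows a simple pattern: catch per-file failures, record them, and keep going. In this sketch `ingest_one` is a hypothetical stand-in for the ingestion skill's per-file step, not a real function from this codebase:

```python
def process_batch(pdf_paths: list, ingest_one) -> list:
    """Run ingest_one on each PDF, collecting failures instead of aborting.

    Returns a list of (path, error_message) pairs for manual review,
    so one bad PDF never stops the rest of the batch.
    """
    failed = []
    for path in pdf_paths:
        try:
            ingest_one(path)
        except Exception as exc:
            failed.append((path, str(exc)))  # isolate the error, keep going
    return failed
```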

Progress Tracking

The ingestion skill automatically updates the ingestion_log table:

  • Tracks PDFs processed
  • Tracks transactions added/updated
  • Records errors
  • Stores summary

Query progress:

from src.database.models import Database

db = Database("/path/to/finance.db")

# Get all ingestion runs
conn = db.get_connection()
logs = conn.execute("""
    SELECT started_at, completed_at, status, pdfs_processed, transactions_added
    FROM ingestion_log
    ORDER BY started_at DESC
    LIMIT 10
""").fetchall()

for log in logs:
    print(f"{log[0]}: {log[3]} PDFs, {log[4]} transactions ({log[2]})")

Advantages Over Single Run

  • Checkpoint progress - Resume anytime
  • User control - Review between batches
  • Memory management - Process in chunks
  • Error isolation - One bad PDF doesn't stop everything

Example Session

User: "I have 73 PDFs to process. Run batch ingestion."

Claude: I'll process them in batches of 15 PDFs each (5 batches total).

[Creates TodoWrite plan with 5 batch todos]

Processing batch 1 of 5...
[Runs ingestion skill on first 15 PDFs]
✓ Batch 1 completed: 15 PDFs, 287 transactions

Continue to batch 2?

User: yes

Processing batch 2 of 5...
[Continues...]

Notes

  • Uses the standard skills/ingestion/SKILL.md for actual processing
  • No complex Python scripts required
  • All state tracked in database + TodoWrite
  • Can stop and resume anytime
  • Simpler and more maintainable than old batch system