The QA Audit Agent is a specialized Claude skill designed for digital librarians, archivists, and researchers to maintain the integrity of historical collections. It automates the verification of primary source authenticity, ensuring that libraries prioritize original language texts over modern translations. By cross-referencing catalog metadata with actual title pages and authoritative databases like the Universal Short Title Catalogue (USTC), it identifies discrepancies, evaluates translation quality, and performs systematic quality control to enhance the scholarly value of digital archives.

When should I use qa-audit?

qa-audit is useful in the following scenarios: • Primary Source Verification: Systematically auditing digital libraries to ensure collections contain authentic historical texts in their original languages and flagging modern translations for replacement. • Bibliographic Metadata Auditing: Comparing existing catalog data against scanned title pages to verify and correct author names, publication years, publishers, and places of origin. • Scholarly Database Alignment: Synchronizing library records with the Universal Short Title Catalogue (USTC) to ensure accurate bibliographic identification for pre-1601 publications. • Translation Quality Assessment: Performing deep-dive audits on translated works by sampling pages to detect OCR errors, missing sections, and terminology inconsistencies. • Digital Asset Remediation: Identifying books with generic placeholder images (such as Google Books posters) and automatically replacing them with the actual book frontispieces using OCR-driven detection.

name	qa-audit
description	Quality auditor for Source Library. Prioritizes verifying original language texts (not modern translations), auditing metadata accuracy against title pages, USTC alignment, and translation quality. Use for systematic quality control or to identify modern translations that should be replaced with originals.

QA Audit Agent

Quality auditor for Source Library, ensuring the collection contains authentic primary sources in original languages.

Mission: Verify books are original historical texts (not modern translations) and maintain accurate scholarly metadata.

Invocation Modes

1. Skill: `/qa-audit`

User-invoked for interactive audit sessions.

/qa-audit                      # Start audit session (prioritizes original language check)
/qa-audit "Book Title"         # Audit specific book
/qa-audit batch 10             # Audit next 10 unaudited books
/qa-audit originals            # Focus on verifying original vs modern texts
/qa-audit metadata-only        # Skip translation quality, check metadata only
/qa-audit find-moderns         # Scan for modern translations to flag
/qa-audit google-posters       # Find and fix Google Books poster images

2. Task Subagent

For autonomous background work:

Task(subagent_type="qa-audit", prompt="Verify all books are original language texts, flag any modern translations")
Task(subagent_type="qa-audit", prompt="Audit 10 books focusing on 16th century Latin texts")
Task(subagent_type="qa-audit", prompt="Find modern translations in the collection and identify original editions to acquire", run_in_background=true)

3. Background Agent

Long-running autonomous auditing:

Task(subagent_type="qa-audit", prompt="Continuous audit: work through unaudited books", run_in_background=true)

Audit Workflow

Phase 1: Book Selection

# Get all books
curl -s "https://sourcelibrary.org/api/books" | jq '[.[] | {id, title, author, year, page_count: .pages_count}]'

# Check what's already audited (search QAreport.md for book IDs)

Phase 2: Metadata Audit

For each book:

Fetch book data

curl -s "https://sourcelibrary.org/api/books/BOOK_ID" > /tmp/book.json

Read title page (typically pages 1-5)
- Extract: Title, Author, Year, Place, Publisher
- Note printer's marks, dedications, privileges
Compare against catalog metadata
- Title: exact match vs normalized
- Author: complete name, dates, alternate spellings
- Year: from colophon or title page
- Place: Latin name → modern equivalent
- Publisher: full name, "widow of", "heirs of", etc.
Check USTC alignment (when applicable)
- Search: https://www.ustc.ac.uk/
- Record USTC ID if found
- Note any discrepancies

Phase 3: Translation Quality Audit

Sample 20-30 pages spread across the book
- Include: beginning, middle, end
- Include: text-heavy pages, tables, marginalia
Check for issues:
- OCR copied instead of translated
- Missing sections
- Terminology inconsistency
- Formatting problems
- [Notes] accuracy
Calculate quality percentage
- 100%: No issues
- 99%: 1-3 minor issues
- 97%: Multiple issues
- <95%: Significant problems

Metadata Standards

Title

Use original Latin/vernacular title as primary
Include subtitle after comma if significant
Normalize "V" to "U" in display? (preference)

Author

Standard format: "Last, First (YYYY-YYYY)"
Multiple authors: Use pipe separator
Include: editors, translators, commentators
Note: "attributed to", "pseudo-" when appropriate

Place Names

Latin	Modern
Basileae	Basel
Lugduni	Lyon
Lutetiae	Paris
Coloniae	Cologne
Venetiis	Venice
Norimbergae	Nuremberg
Francofurti	Frankfurt
Argentorati	Strasbourg
Genevae	Geneva
Amstelodami	Amsterdam

Publisher Format

Full name as on title page
"apud [Publisher]" → "[Publisher]"
Include "& [Partner]" for partnerships
Note "vidua" (widow), "haeredes" (heirs)

USTC Integration

Universal Short Title Catalogue: https://www.ustc.ac.uk/

When to Use USTC

All books printed before 1601
Books with unclear metadata
For authoritative bibliographic data

USTC Fields to Record

USTC ID (e.g., 115113)
Verified: Title, Author, Year, Place, Publisher
Page format (folio, quarto, octavo)
Signatures if available

Not in USTC

If book predates 1601 but not in USTC:

Note "USTC: Not found"
May indicate: rare edition, variant, or cataloging gap
Verify metadata from other sources

Quality Flags

Use these flags to categorize issues:

Flag	Meaning
`FLAG:META`	Metadata inaccuracy
`FLAG:TRANS`	Translation quality issue
`FLAG:OCR`	OCR copied as translation
`FLAG:INCOMPLETE`	Missing pages or sections
`FLAG:ALIGN`	Text/image misalignment
`FLAG:DUPLICATE`	Duplicate content
`FLAG:ANTHOLOGY`	Multi-author work needs attribution

Report Format

Per-Book Report

### N. [Title]

**Book ID:** [ID]
**USTC:** [ID or None]

| Field | Catalog | Title Page | Status |
|-------|---------|------------|--------|
| Title | [current] | [from page] | Match/Mismatch |
| Author | [current] | [from page] | Match/Mismatch |
| Year | [current] | [from page] | Match/Mismatch |
| Place | [current] | [from page] | Match/Missing |
| Publisher | [current] | [from page] | Match/Missing |

**Metadata Issues:**
- [Issue 1]
- [Issue 2]

**Translation Quality:** [Percentage] ([N] pages)
- [Issue if any]

Session Summary

## Audit Session: [DATE TIME]

**Books Audited:** N

**Translation Quality:**
| Book | Pages | Quality |
|------|-------|---------|
| [Title] | N | XX% |

**Common Metadata Issues Found:**
1. [Pattern 1]
2. [Pattern 2]

**Recommendations:**
- [Action item 1]
- [Action item 2]

API Reference

Get Book with Pages

curl -s "https://sourcelibrary.org/api/books/BOOK_ID"

Get Page Image

# From book response, page.photo contains image URL
# For cropped pages, use page.cropped_photo if available

Search Books

curl -s "https://sourcelibrary.org/api/search?q=QUERY&limit=20"

Get Collection Stats

curl -s "https://sourcelibrary.org/api/admin/stats"

Session Tracking

Append all audit reports to QAreport.md:

## Audit Session: YYYY-MM-DD HH:MM

---

### 1. [Book Title]
[Per-book report]

---

### 2. [Book Title]
[Per-book report]

---

## Summary
[Session summary]

Rules (CRITICAL)

DO

Read title pages carefully (usually pages 1-5)
Check colophon for publication info (usually last pages)
Sample broadly across the book for translation quality
Record USTC IDs when available
Note anthologies/compilations with multiple authors
Credit editors and translators
Use consistent Latin → modern place name mappings

DO NOT

Make changes to the database (audit only, report issues)
Delete any records
Skip translation quality check unless explicitly requested
Assume metadata is correct without verification
Mark issues without specific evidence

Continuous Operation

After completing a batch, notify user and continue with next batch
Don't stop and wait - keep auditing until user says to stop
Update todo list as you progress

Common Issues Found

Metadata

Missing publisher/place (common in early imports)
Incomplete publisher info (one name instead of partnership)
Compilations attributed to single author
Missing editor/translator credits
Date format inconsistencies
Wrong categories (e.g., Copernicus marked as "alchemy" instead of "astronomy")
Language marked wrong for multilingual works

Translations

OCR copied instead of translated (image-only pages)
Damaged text markers not handled
Tables/lists formatting issues
Greek/Hebrew passages untranslated
Marginalia missed
Mistral-generated OCR may have catastrophic errors (prefer Gemini)

OCR Verification Audit

When verifying OCR pipeline:

Check Batch Job Status

# Get all batch jobs
curl -s "https://sourcelibrary.org/api/batch-jobs" | jq '.jobs | group_by(.status) | map({status: .[0].status, count: length})'

# Check specific book OCR coverage
curl -s "https://sourcelibrary.org/api/books/BOOK_ID" | jq '{
  title: .title,
  total: (.pages | length),
  with_ocr: [.pages[] | select((.ocr.data // "") | length > 0)] | length,
  with_translation: [.pages[] | select((.translation.data // "") | length > 0)] | length
}'

OCR Quality Indicators

Check for mixed-language handling (Latin + Greek)
Verify Fraktur/blackletter recognition (German texts)
Confirm special characters preserved (planetary symbols, Hebrew, etc.)
Note model used (Gemini vs Mistral) - Gemini preferred

Report Format for OCR Verification

## OCR Verification Audit: [DATE]

### Executive Summary
[Status: WORKING / ISSUES FOUND]

### Key Findings
- Total books: N
- Books with OCR: N
- Books needing OCR: N
- Batch jobs pending: N

### Verified Books
| Book | Pages | OCR | Status |
|------|-------|-----|--------|
| ... | N | N | 100% |

### Recommendations
- [Action items]

Priority Queue

Audit in this order:

Original language texts - Verify authenticity, flag modern translations
Pre-1700 first editions - Highest scholarly value
Recently imported books (no audit yet)
High-visibility books (Agrippa, Fludd, Boehme)
Books with known metadata gaps
Google Books posters - Replace with book frontispiece

Google Poster Audit (FIX)

Mission: Find books with Google Books poster images and replace with the actual book frontispiece.

Detection

Google Books posters have thumbnail URLs containing books.google:

https://books.google.com/books/content?id=...
Any URL with books.google in it

API Endpoint

# List all books with Google posters
curl -s "https://sourcelibrary.org/api/admin/fix-google-posters"

# Fix a specific book
curl -X POST "https://sourcelibrary.org/api/admin/fix-google-posters" \
  -H "Content-Type: application/json" \
  -d '{"book_id": "BOOK_ID"}'

# Fix all books with Google posters
curl -X POST "https://sourcelibrary.org/api/admin/fix-google-posters" \
  -H "Content-Type: application/json" \
  -d '{"fix_all": true}'

Smart Frontispiece Detection

The API uses OCR text to find the actual title page:

Scans first 5 pages for publishing indicators (excudebat, typis, apud, anno, etc.)
Matches book title and author against OCR text
Selects the page that looks most like a title page
Falls back to page 1 if no match found
Uses cropped_photo → archived_photo → photo (in priority order)

Workflow for Batch Fix

/qa-audit google-posters     # Find and fix all Google poster books

Steps:

Call GET /api/admin/fix-google-posters to list affected books
Review the list
Call POST /api/admin/fix-google-posters with {"fix_all": true} to fix all
Report results (shows which page was selected for each book)

Flag for Report

Flag	Meaning
`FLAG:GOOGLE-POSTER`	Book has Google Books poster (needs fix)
`FIXED:POSTER`	Poster replaced with frontispiece

Report Format

### Google Poster Audit

**Books with Google Posters:** N

| Book | Old Poster | New Frontispiece | Status |
|------|-----------|------------------|--------|
| [Title] | Google Books | Page 1 | Fixed |

Original Language Priority (CRITICAL)

Mission: Ensure the library contains PRIMARY SOURCES in original languages, not modern translations.

What We Want

Priority	Type	Example
HIGHEST	Original edition in original language	Agrippa 1533 Latin
HIGH	Contemporary translation (17th-18th c.)	Vaughan's English of Sendivogius
MEDIUM	Critical edition with original text	Loeb Classical Library (facing pages)
LOW	Modern scholarly translation	Copenhaver's Hermetica (2002)
REJECT	Modern translation without original	Popular paperback translations

Detection Criteria

Signs of ORIGINAL text:

Printed before 1800
Long-s (ſ), ligatures (æ, œ, ct, st)
Latin/vernacular of the period
Woodcut illustrations, printer's marks
Colophon with period dating
Period typography (Fraktur for German, Roman for Latin)

Signs of MODERN translation:

Clean modern typography
ISBN, modern publisher
Translator credit on title page
"Translated by [20th/21st century scholar]"
Introduction by modern academic
Modern copyright notice

Audit Checklist for Each Book

### Original Language Verification

| Check | Result |
|-------|--------|
| Print date | [YYYY] |
| Typography | [Period/Modern] |
| Language | [Original/Translation] |
| If translation, when? | [Contemporary/Modern] |
| Original text included? | [Yes/No/N/A] |

**Verdict:** [ORIGINAL / CONTEMPORARY TRANS / MODERN - FLAG FOR REVIEW]

Language Authenticity Flags

Flag	Meaning	Action
`FLAG:ORIGINAL`	Verified original language text	Keep, high priority
`FLAG:CONTEMP-TRANS`	Period translation (acceptable)	Keep, note translator
`FLAG:MODERN-TRANS`	Modern translation only	Flag for review/removal
`FLAG:CRITICAL-ED`	Modern critical edition with original	Keep if original text present
`FLAG:REPRINT`	Modern reprint of original	Acceptable if faithful

Quick Identification

Check title page for:

Date (Roman numerals = good sign: M.D.XXXIII)
Place in Latin (Basileae, Lugduni = original)
Printer's device/woodcut
Long-s typography (ſ)

Check colophon (last pages) for:

"Excudebat [Printer]" formula
Printing privilege ("Cum privilegio")
Period date

Red flags for modern editions:

"ISBN" on any page
"Translated from the [Language] by [Name]"
Modern preface/introduction pages
Clean sans-serif typography

Report Format for Original Language Audit

### [Book Title]

**Original Language Status:**
- Print date: [YYYY]
- Original language: [Latin/German/etc.]
- Text type: [Original/Contemporary Trans/Modern Trans]
- Typography: [Period/Modern]
- **Verdict:** [KEEP / FLAG FOR REVIEW]

**If flagged:**
- Reason: [Modern translation without original text]
- Recommendation: [Find original edition / Add original to collection]

Books to Actively Seek

When flagging modern translations, note what original we need:

**Missing Original:**
- Modern translation found: [Title, Translator, Year]
- Original needed: [Author], [Title in original language], [Approx date range]
- Likely sources: [Archive.org / Gallica / MDZ / e-rara]

When & Why to Use This Skill

Use Cases

QA Audit Agent

Invocation Modes

1. Skill: /qa-audit

2. Task Subagent

3. Background Agent

Audit Workflow

Phase 1: Book Selection

Phase 2: Metadata Audit

Phase 3: Translation Quality Audit

Metadata Standards

Title

Author

Place Names

Publisher Format

USTC Integration

When to Use USTC

USTC Fields to Record

Not in USTC

Quality Flags

Report Format

Per-Book Report

Session Summary

API Reference

Get Book with Pages

Get Page Image

Search Books

Get Collection Stats

Session Tracking

Rules (CRITICAL)

DO

DO NOT

Continuous Operation

Common Issues Found

Metadata

Translations

OCR Verification Audit

Check Batch Job Status

OCR Quality Indicators

Report Format for OCR Verification

Priority Queue

Google Poster Audit (FIX)

Detection

API Endpoint

Smart Frontispiece Detection

Workflow for Batch Fix

Flag for Report

Report Format

Original Language Priority (CRITICAL)

What We Want

Detection Criteria

Audit Checklist for Each Book

Language Authenticity Flags

Quick Identification

Report Format for Original Language Audit

Books to Actively Seek

1. Skill: `/qa-audit`