iiif
Work with IIIF manifests and the Wellcome Collection API. Use this skill to download manifests, extract metadata, or process canvases. Invoke with /iiif.
When & Why to Use This Skill
This Claude skill covers working with the Wellcome Collection's digitized archives through its IIIF (International Image Interoperability Framework) APIs. It handles downloading manifests, extracting structured metadata, and retrieving OCR text across roughly 42 million digitized pages, which suits digital humanities work and large-scale archival data analysis.
Use Cases
- Digital Humanities Research: Automate the harvesting of metadata and OCR text from hundreds of thousands of historical documents for text mining and linguistic analysis.
- Archival Data Engineering: Convert nested IIIF manifest JSON into structured Hive tables to enable SQL querying and large-scale collection management.
- Image Processing Pipelines: Programmatically extract high-resolution image URLs and dimensions from canvases for batch downloading or computer vision research.
- Scholarly Content Discovery: Use the Wellcome Collection Works API to programmatically identify and process specific historical works based on subjects, contributors, or titles.
IIIF Manifest Processing
This skill helps work with IIIF (International Image Interoperability Framework) manifests from the Wellcome Collection.
Overview
The Wellcome Collection publishes its digitized works through the IIIF Presentation API. Each work has a manifest containing:
- Metadata (title, contributors, subjects)
- Sequences of canvases (pages)
- Image URLs and dimensions
- Text renderings (OCR text URLs)
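To make the manifest contents listed above concrete, here is a minimal sketch of walking a manifest that is already loaded as a Python dict. It assumes the Presentation API 3.0 layout (canvases under `items`, a language-map `label`, text links under `rendering`). The helper names `first_text` and `summarize_manifest` and the sample data are hypothetical and are not part of `wc_simd`.

```python
from typing import Any, Optional


def first_text(lang_map: Optional[dict]) -> str:
    """Return the first string from an IIIF language map, or "" if empty."""
    for values in (lang_map or {}).values():
        if values:
            return values[0]
    return ""


def summarize_manifest(manifest: dict) -> dict:
    """Collect the title, per-canvas image info, and text rendering URLs."""
    canvases = []
    for canvas in manifest.get("items", []):
        image_url = None
        # Canvas -> AnnotationPage -> Annotation -> body (the image resource)
        for page in canvas.get("items", []):
            for annotation in page.get("items", []):
                body = annotation.get("body", {})
                if isinstance(body, dict) and body.get("type") == "Image":
                    image_url = body.get("id")
        canvases.append({
            "id": canvas.get("id"),
            "width": canvas.get("width"),
            "height": canvas.get("height"),
            "image_url": image_url,
        })
    text_urls = [r.get("id") for r in manifest.get("rendering", [])
                 if r.get("type") == "Text"]
    return {"title": first_text(manifest.get("label")),
            "canvases": canvases,
            "text_urls": text_urls}


# Hypothetical, hand-written sample; real manifests carry many more fields.
sample: dict[str, Any] = {
    "type": "Manifest",
    "label": {"en": ["Example work"]},
    "items": [{
        "id": "https://example.org/canvas/1",
        "type": "Canvas",
        "width": 2000,
        "height": 3000,
        "items": [{
            "type": "AnnotationPage",
            "items": [{
                "type": "Annotation",
                "motivation": "painting",
                "body": {"id": "https://example.org/image/1/full/max/0/default.jpg",
                         "type": "Image"},
            }],
        }],
    }],
    "rendering": [{"id": "https://example.org/text/1", "type": "Text"}],
}

summary = summarize_manifest(sample)
print(summary["title"], len(summary["canvases"]), summary["text_urls"])
```

Where the text rendering actually sits (manifest level or canvas level) may differ in real Wellcome manifests, so it is worth inspecting one before relying on this shape.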
CLI Commands
The iiif_manifests.py module provides Click-based CLI commands:
Download Manifests
# Download first 1000 manifests
python -m wc_simd.iiif_manifests download-manifests --limit 1000
# Download all manifests
python -m wc_simd.iiif_manifests download-manifests
Create Hive Tables
python -m wc_simd.iiif_manifests create-tables
Process Specific Work
python -m wc_simd.iiif_manifests process-work --work-id abc123
API Endpoints
Works API
https://api.wellcomecollection.org/catalogue/v2/works
https://api.wellcomecollection.org/catalogue/v2/works/{id}
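As an illustration of how these two endpoints might be addressed from code, the sketch below only builds the URLs and makes no network calls. The query parameter names `query`, `pageSize`, and `page` are assumptions to check against the Wellcome Collection API documentation, and the function names are hypothetical.

```python
from urllib.parse import urlencode

WORKS_API = "https://api.wellcomecollection.org/catalogue/v2/works"


def works_search_url(query: str, page_size: int = 10, page: int = 1) -> str:
    """Build a search URL for the works listing endpoint."""
    # Parameter names are assumed, not confirmed by this document.
    params = {"query": query, "pageSize": page_size, "page": page}
    return f"{WORKS_API}?{urlencode(params)}"


def work_url(work_id: str) -> str:
    """Build the URL for a single work."""
    return f"{WORKS_API}/{work_id}"


print(works_search_url("cholera london", page_size=5))
print(work_url("abc123"))
```

The resulting strings can be passed to any HTTP client, for example `requests.get(url).json()`.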
IIIF Presentation API
https://iiif.wellcomecollection.org/presentation/v3/{id}
Image API
https://iiif.wellcomecollection.org/image/{image_id}/full/max/0/default.jpg
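The path segments after the image ID follow the IIIF Image API order: region, size, rotation, then quality and format. A small sketch of a URL builder, under the assumption that the server accepts the standard IIIF size syntax (for example `400,` for a 400-pixel-wide image); the function name and the sample ID are hypothetical.

```python
IMAGE_BASE = "https://iiif.wellcomecollection.org/image"


def image_url(image_id: str, region: str = "full", size: str = "max",
              rotation: int = 0, quality: str = "default",
              fmt: str = "jpg") -> str:
    """Build an IIIF Image API URL: {id}/{region}/{size}/{rotation}/{quality}.{format}."""
    return f"{IMAGE_BASE}/{image_id}/{region}/{size}/{rotation}/{quality}.{fmt}"


# Full-size image, matching the pattern shown above
print(image_url("example_image_id"))
# Thumbnail scaled to 400 pixels wide (height chosen by the server)
print(image_url("example_image_id", size="400,"))
```

Requesting a smaller `size` instead of `max` keeps batch downloads lighter when full resolution is not needed.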
Python Usage
from wc_simd.iiif_manifests import (
    fetch_manifest,
    extract_canvases,
    get_text_rendering_url,
)

# Fetch a manifest
manifest = fetch_manifest("abc123")

# Extract canvas metadata
canvases = extract_canvases(manifest)

# Get OCR text URL
text_url = get_text_rendering_url(manifest)
Content Advisory Authentication
Some images sit behind a content advisory that must be accepted before they can be fetched. Handle this with session cookies:
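import requests

work_id = "abc123"  # placeholder work ID
session = requests.Session()
# Accept the content advisory; the session keeps the resulting cookies
session.get(f"https://wellcomecollection.org/works/{work_id}?acceptContentAdvisory=true")
# Now fetch protected content with the same session
# (protected_url is the URL of the restricted image or manifest)
response = session.get(protected_url)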
Key Statistics
- Total manifests: ~340,000
- Works with OCR text: 226,145
- Total pages: ~42 million
- Download failure rate: 0.47%
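For a rough sense of scale, the figures above can be combined with some simple arithmetic. This sketch assumes the failure rate and the OCR count are both measured against the total manifest count, which the list does not state, so the results are estimates only.

```python
total_manifests = 340_000      # approximate, from the list above
works_with_ocr = 226_145
total_pages = 42_000_000       # approximate
failure_rate = 0.0047          # 0.47%

# Assumption: every figure shares the manifest total as its denominator.
estimated_failures = round(total_manifests * failure_rate)
avg_pages_per_manifest = total_pages / total_manifests
ocr_share = works_with_ocr / total_manifests

print(f"Estimated failed downloads: {estimated_failures}")
print(f"Average pages per manifest: {avg_pages_per_manifest:.1f}")
print(f"Share of manifests with OCR text: {ocr_share:.1%}")
```

Under that assumption, this works out to about 1,600 failed downloads, a little over 120 pages per manifest on average, and about two thirds of manifests with OCR text.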
Hive Tables Created
| Table | Description |
|---|---|
| iiif_manifests | Manifest metadata (id, label, attribution) |
| iiif_canvases | Canvas data (image URLs, dimensions, labels) |
| alto_text | OCR text extracted from text renderings |