iiif

from wellcomecollection

Work with IIIF manifests and the Wellcome Collection API. Use this skill to download manifests, extract metadata, or process canvases. Invoke with /iiif.

Updated Jan 9, 2026

When & Why to Use This Skill

This Claude skill provides an interface to the Wellcome Collection's digital archives through its IIIF (International Image Interoperability Framework) APIs. It covers downloading manifests, extracting structured metadata, and retrieving OCR text from millions of digitized pages, which makes it useful for digital humanities work and large-scale archival data analysis.

Use Cases

  • Digital Humanities Research: Automate the harvesting of metadata and OCR text from hundreds of thousands of historical documents for text mining and linguistic analysis.
  • Archival Data Engineering: Load IIIF manifest JSON into structured Hive tables to enable SQL querying and large-scale collection management.
  • Image Processing Pipelines: Programmatically extract high-resolution image URLs and dimensions from canvases for batch downloading or computer vision research.
  • Scholarly Content Discovery: Use the Wellcome Collection Works API to programmatically identify and process specific historical works based on subjects, contributors, or titles.
name: iiif
description: Work with IIIF manifests and the Wellcome Collection API. Use this skill to download manifests, extract metadata, or process canvases. Invoke with /iiif.

IIIF Manifest Processing

This skill helps work with IIIF (International Image Interoperability Framework) manifests from the Wellcome Collection.

Overview

The Wellcome Collection provides digitized works via the IIIF Presentation API. Each work has a manifest containing:

  • Metadata (title, contributors, subjects)
  • Sequences of canvases (pages)
  • Image URLs and dimensions
  • Text renderings (OCR text URLs)
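
As a rough illustration of how those four parts can be read out of a manifest, the sketch below walks a small hand-written dictionary shaped like an IIIF Presentation 3 manifest. The sample data, the `summarise_manifest` and `first_value` names, and the assumption that OCR text appears as a `rendering` entry with format `text/plain` are illustrative and are not taken from the `wc_simd` code; real Wellcome manifests may differ in detail.

```python
from typing import Any

# Hand-written sample shaped like an IIIF Presentation 3 manifest (hypothetical data).
SAMPLE_MANIFEST: dict[str, Any] = {
    "id": "https://iiif.wellcomecollection.org/presentation/v3/abc123",
    "type": "Manifest",
    "label": {"en": ["An example work"]},
    "metadata": [
        {"label": {"en": ["Contributors"]}, "value": {"none": ["A. Author"]}},
    ],
    "rendering": [
        {"id": "https://example.org/text/abc123", "type": "Text", "format": "text/plain"},
    ],
    "items": [
        {
            "id": "https://example.org/canvas/1",
            "type": "Canvas",
            "width": 2000,
            "height": 3000,
            "items": [
                {"type": "AnnotationPage", "items": [
                    {"type": "Annotation", "motivation": "painting",
                     "body": {"id": "https://example.org/image/1/full/max/0/default.jpg",
                              "type": "Image"}},
                ]},
            ],
        },
    ],
}


def first_value(language_map: dict[str, list[str]]) -> str:
    """Return the first string in an IIIF language map, or '' if it is empty."""
    for values in language_map.values():
        if values:
            return values[0]
    return ""


def summarise_manifest(manifest: dict[str, Any]) -> dict[str, Any]:
    """Collect the title, metadata pairs, canvases and plain-text rendering URLs."""
    canvases = []
    for canvas in manifest.get("items", []):
        # Images hang off canvas -> annotation page -> annotation -> body.
        image_urls = [
            annotation["body"]["id"]
            for page in canvas.get("items", [])
            for annotation in page.get("items", [])
            if isinstance(annotation.get("body"), dict) and "id" in annotation["body"]
        ]
        canvases.append({
            "id": canvas.get("id"),
            "width": canvas.get("width"),
            "height": canvas.get("height"),
            "image_urls": image_urls,
        })
    metadata = {
        first_value(entry.get("label", {})): first_value(entry.get("value", {}))
        for entry in manifest.get("metadata", [])
    }
    # Assumption: OCR text is linked as a rendering with a text/plain format.
    text_urls = [
        rendering["id"]
        for rendering in manifest.get("rendering", [])
        if rendering.get("format") == "text/plain" and "id" in rendering
    ]
    return {
        "title": first_value(manifest.get("label", {})),
        "metadata": metadata,
        "canvases": canvases,
        "text_urls": text_urls,
    }


summary = summarise_manifest(SAMPLE_MANIFEST)
print(summary["title"], len(summary["canvases"]), summary["text_urls"])
```

Using `.get()` with defaults keeps the walk tolerant of manifests that omit a section, which matters when processing a large batch.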

CLI Commands

The iiif_manifests.py module provides Click-based CLI commands:

Download Manifests

# Download first 1000 manifests
python -m wc_simd.iiif_manifests download-manifests --limit 1000

# Download all manifests
python -m wc_simd.iiif_manifests download-manifests

Create Hive Tables

python -m wc_simd.iiif_manifests create-tables

Process Specific Work

python -m wc_simd.iiif_manifests process-work --work-id abc123

API Endpoints

Works API

https://api.wellcomecollection.org/catalogue/v2/works
https://api.wellcomecollection.org/catalogue/v2/works/{id}
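
A minimal sketch of building request URLs for these two endpoints with the standard library. The `query` and `pageSize` parameter names are assumptions about the Works API and should be checked against its documentation; the helper names `works_search_url` and `work_url` are hypothetical.

```python
from urllib.parse import quote, urlencode

WORKS_URL = "https://api.wellcomecollection.org/catalogue/v2/works"


def works_search_url(query: str, page_size: int = 10) -> str:
    """Build a search URL; `query` and `pageSize` are assumed parameter names."""
    params = {"query": query, "pageSize": page_size}
    return f"{WORKS_URL}?{urlencode(params)}"


def work_url(work_id: str) -> str:
    """Build the URL for a single work, percent-encoding the ID."""
    return f"{WORKS_URL}/{quote(work_id, safe='')}"


print(works_search_url("cholera map", page_size=5))
print(work_url("abc123"))
```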

IIIF Presentation API

https://iiif.wellcomecollection.org/presentation/v3/{id}
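
For reference, a manifest could be retrieved from this endpoint with only the standard library, roughly as sketched below. `manifest_url` and `fetch_manifest_json` are hypothetical helpers and are not the `fetch_manifest` function from `wc_simd`; the timeout and `Accept` header are illustrative choices.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

PRESENTATION_URL = "https://iiif.wellcomecollection.org/presentation/v3"


def manifest_url(identifier: str) -> str:
    """Fill the {id} placeholder of the Presentation API URL."""
    return f"{PRESENTATION_URL}/{quote(identifier, safe='')}"


def fetch_manifest_json(identifier: str, timeout: float = 30.0) -> dict:
    """Download and parse one manifest; raises urllib.error.HTTPError on failure."""
    request = Request(manifest_url(identifier), headers={"Accept": "application/json"})
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)


print(manifest_url("abc123"))
```

Calling `fetch_manifest_json` performs a network request, so batch jobs would normally wrap it in retry and rate-limiting logic.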

Image API

https://iiif.wellcomecollection.org/image/{image_id}/full/max/0/default.jpg
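
The path segments after the image ID follow the IIIF Image API order: region, size, rotation, then quality and format. The sketch below assembles such URLs; the `image_url` helper is hypothetical, and which size or region forms the Wellcome image server accepts is an assumption to verify against the server's `info.json`.

```python
from urllib.parse import quote

IMAGE_URL = "https://iiif.wellcomecollection.org/image"


def image_url(
    image_id: str,
    region: str = "full",
    size: str = "max",
    rotation: str = "0",
    quality: str = "default",
    fmt: str = "jpg",
) -> str:
    """Build an IIIF Image API URL: {id}/{region}/{size}/{rotation}/{quality}.{format}."""
    return f"{IMAGE_URL}/{quote(image_id, safe='')}/{region}/{size}/{rotation}/{quality}.{fmt}"


# Full-size image, matching the pattern shown above.
print(image_url("image_id"))
# A 600-pixel-wide derivative, using the IIIF "w," size form.
print(image_url("image_id", size="600,"))
```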

Python Usage

from wc_simd.iiif_manifests import (
    fetch_manifest,
    extract_canvases,
    get_text_rendering_url
)

# Fetch a manifest
manifest = fetch_manifest("abc123")

# Extract canvas metadata
canvases = extract_canvases(manifest)

# Get OCR text URL
text_url = get_text_rendering_url(manifest)

Content Advisory Authentication

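Some images carry a content advisory and are served only after it has been accepted. Accept the advisory once in a requests session so that the resulting cookies are reused for later requests: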

import requests

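work_id = "abc123"  # placeholder work ID
session = requests.Session()
# Accept the content advisory; the session stores the resulting cookies
session.get(f"https://wellcomecollection.org/works/{work_id}?acceptContentAdvisory=true")
# Now fetch protected content (protected_url is the restricted image or manifest URL)
response = session.get(protected_url)
response.raise_for_status()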

Key Statistics

  • Total manifests: ~340,000
  • Works with OCR text: 226,145
  • Total pages: ~42 million
  • Download failure rate: 0.47%
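
To put these figures in proportion, the short calculation below derives a few rough ratios. It assumes the failure rate is measured against the total manifest count and that each work corresponds to one manifest; neither assumption is stated above, and two of the inputs are approximate, so the results are estimates only.

```python
TOTAL_MANIFESTS = 340_000   # approximate
WORKS_WITH_OCR = 226_145
TOTAL_PAGES = 42_000_000    # approximate
FAILURE_RATE = 0.0047       # 0.47%

# Assumption: the failure rate applies to all manifests.
estimated_failures = round(TOTAL_MANIFESTS * FAILURE_RATE)
# Assumption: one manifest per work, so the two counts are comparable.
ocr_share = WORKS_WITH_OCR / TOTAL_MANIFESTS
pages_per_manifest = TOTAL_PAGES / TOTAL_MANIFESTS

print(f"Estimated failed downloads: {estimated_failures}")      # 1598
print(f"Share of works with OCR text: {ocr_share:.1%}")         # 66.5%
print(f"Average pages per manifest: {pages_per_manifest:.0f}")  # 124
```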

Hive Tables Created

Table            Description
iiif_manifests   Manifest metadata (id, label, attribution)
iiif_canvases    Canvas data (image URLs, dimensions, labels)
alto_text        OCR text extracted from text renderings