comic-ocr

OCR comic strips on macOS using Apple's Vision framework, then clean up the transcripts with Claude.

Two scripts, meant to be run in sequence:

ocr.swift — Extracts text from a directory of comic strip images using Apple Vision's highest-accuracy recognition mode. Outputs a JSON file with one entry per image.
clean.py — Sends the raw OCR through Claude (via claude -p) to fix the usual OCR mess: garbled characters, ALL-CAPS normalization, metadata stripping, dialogue structure cleanup. Processes in parallel batches and writes results incrementally.

Requirements

macOS (uses Apple's Vision framework — no cloud OCR, everything runs locally)
Swift (included with Xcode or the Xcode Command Line Tools)
Python 3
Claude Code CLI (claude on your PATH)

Usage

Step 1: OCR

```bash
# Compile (one-time)
swiftc ocr.swift -o ocr -framework Vision -framework AppKit

# Run
./ocr /path/to/strips transcripts.json
```

Progress is logged to ocr_progress.log in the output directory. You can tail -f it.
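ocr.swift handles png, gif, jpg, jpeg, webp, tiff, and bmp images. To preview which files in a directory fall under that list before running the OCR, here is a minimal Python sketch. `list_strip_images` is a hypothetical helper. Case-insensitive extension matching, skipping subdirectories, and name-sorted order are assumptions and not confirmed behavior of ocr.swift.

```python
from pathlib import Path

# Extensions this README lists as supported by ocr.swift.
# Case-insensitive matching and name-sorted order are assumptions.
IMAGE_EXTENSIONS = {".png", ".gif", ".jpg", ".jpeg", ".webp", ".tiff", ".bmp"}


def list_strip_images(directory):
    """Return image files directly inside `directory`, sorted by path."""
    return sorted(
        p for p in Path(directory).iterdir()
        # is_file() skips subdirectories, even ones named like "x.png"
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )
```

For example, `[p.name for p in list_strip_images("/path/to/strips")]` gives the candidate filenames. Plain string sorting puts `10.gif` before `2.gif`, so don't rely on it to match the order of entries in the JSON output.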

The output JSON looks like:

```json
[
  {
    "filename": "0.gif",
    "text": "SOMETHING POSITIVE R*K NICHOLLAND\nKNIFE GOES\nIN...\nOKAY, DAVAN, WE\nOUT FOR METE..."
  },
  {
    "filename": "1.gif",
    "text": "..."
  }
]
```
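If you feed this file into your own tooling, a short Python sketch can load it and check its shape first. `load_transcripts` is a hypothetical helper and is not part of either script. It assumes filenames are unique, so a later duplicate silently replaces an earlier one.

```python
import json


def load_transcripts(path):
    """Read ocr output into {filename: text}, checking its shape.

    Expects a JSON array of objects with string "filename" and
    "text" keys, as in the example output above.
    """
    with open(path, encoding="utf-8") as f:
        entries = json.load(f)
    if not isinstance(entries, list):
        raise ValueError("expected a JSON array")
    transcripts = {}
    for i, entry in enumerate(entries):
        if not isinstance(entry, dict):
            raise ValueError(f"entry {i} is not an object")
        name, text = entry.get("filename"), entry.get("text")
        if not isinstance(name, str) or not isinstance(text, str):
            raise ValueError(f"entry {i} needs string 'filename' and 'text'")
        transcripts[name] = text  # duplicate filenames: last one wins
    return transcripts
```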

Raw OCR is noisy — comic strips typically come through as ALL-CAPS with the title, copyright, and URL baked into every image. The text may have broken words and garbled characters.

Step 2: Clean up with Claude

```bash
python3 clean.py transcripts.json transcripts_clean.json
```

Or just run python3 clean.py if your input file is named transcripts.json (the output defaults to transcripts_clean.json).
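The defaults can be expressed as a small path-resolution step. This Python sketch is an illustration, not clean.py's argument handling. `resolve_paths` is a hypothetical name. When only the input is given, it falls back to transcripts_clean.json for the output. It also derives the log name from the output name. Both choices are assumptions, though the second happens to match transcripts_clean.log.

```python
from pathlib import Path


def resolve_paths(argv):
    """Return (input, output, log) paths from an argv-style list.

    Defaults follow this README. Falling back to transcripts_clean.json
    when only the input is given, and naming the log after the output,
    are assumptions of this sketch.
    """
    args = argv[1:]
    input_path = Path(args[0]) if len(args) > 0 else Path("transcripts.json")
    output_path = Path(args[1]) if len(args) > 1 else Path("transcripts_clean.json")
    log_path = output_path.with_suffix(".log")
    return input_path, output_path, log_path
```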

This sends strips to Claude Haiku in batches of 50, with 4 batches running in parallel. Progress is logged to transcripts_clean.log. Results are written to the output file incrementally as each batch completes.
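The batching scheme can be sketched in Python as follows. This is not clean.py's code. `clean_all`, `batches`, and the injected `clean_batch` callable are hypothetical names. Treating a failed batch as "keep raw OCR" and writing via a temp file plus rename are design choices made for this sketch.

```python
import json
import os
from concurrent.futures import ThreadPoolExecutor, as_completed

BATCH_SIZE = 50  # strips per request, as described in this README
PARALLEL = 4     # batches in flight at once, as described in this README


def batches(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def clean_all(transcripts, clean_batch, output_path,
              batch_size=BATCH_SIZE, parallel=PARALLEL):
    """Clean all transcripts, rewriting output_path after every batch.

    transcripts: {filename: raw_text}
    clean_batch: hypothetical callable taking a list of (filename, text)
        pairs and returning {filename: cleaned_text}
    """
    results = dict(transcripts)  # raw OCR is the fallback for every strip
    work = batches(list(transcripts.items()), batch_size)

    def write_snapshot():
        # Write to a temp file, then rename, so a crash mid-write
        # never leaves a truncated output file behind.
        entries = [{"filename": k, "text": v} for k, v in results.items()]
        tmp_path = f"{output_path}.tmp"
        with open(tmp_path, "w", encoding="utf-8") as f:
            json.dump(entries, f, ensure_ascii=False, indent=2)
        os.replace(tmp_path, output_path)

    with ThreadPoolExecutor(max_workers=parallel) as pool:
        futures = [pool.submit(clean_batch, batch) for batch in work]
        # as_completed yields batches in finishing order; results is
        # only modified here, on the main thread.
        for future in as_completed(futures):
            try:
                cleaned = future.result()
            except Exception:
                cleaned = {}  # failed batch: its strips keep raw OCR
            # Accept only filenames that exist in the input.
            results.update({k: v for k, v in cleaned.items() if k in results})
            write_snapshot()
    return results
```

Because `results` starts as a copy of the raw transcripts and keeps their insertion order, each snapshot lists every strip in input order, with cleaned text for the batches that have finished so far.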

The cleanup fixes:

Strips comic title headers, author bylines, copyright notices, URLs
Corrects OCR character-recognition errors using context
Converts ALL-CAPS to sentence case (preserving emphasis)
Marks non-dialogue text: [Sound effect: CRASH], [Sign: "Joe's Bar"], [Caption: Later that evening...]
Preserves dialogue structure (one speech bubble per line)
Handles guest strips and announcements gracefully
Leaves truncated text as-is with an ellipsis
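The bracketed markers make non-dialogue text easy to separate in downstream processing. The following Python sketch pulls them out of one cleaned transcript. `split_markers` and the `MARKER` pattern are hypothetical. The sketch assumes markers look like the examples above: a label, a colon, a value, no nested brackets, and no marker spanning lines.

```python
import re

# Matches markers such as [Sound effect: CRASH] or [Sign: "Joe's Bar"].
# Assumes a capitalized label, one colon, and no nested brackets.
MARKER = re.compile(r"\[([A-Z][A-Za-z ]*):\s*([^\]]*)\]")


def split_markers(cleaned_text):
    """Return ([(label, value), ...], [dialogue_line, ...])."""
    markers, dialogue = [], []
    for line in cleaned_text.splitlines():
        for label, value in MARKER.findall(line):
            markers.append((label, value.strip()))
        rest = MARKER.sub("", line).strip()  # text left after removing markers
        if rest:
            dialogue.append(rest)
    return markers, dialogue
```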

Response parsing is resilient: it matches cleaned lines back to filenames by lookup rather than requiring an exact line-count match, so partial responses are still usable. Any strip that doesn't get a match falls back to its raw OCR.
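The lookup-with-fallback idea can be sketched in Python. This README doesn't show clean.py's response format, so the sketch assumes one JSON object per line, such as `{"filename": "0.gif", "text": "Knife goes in..."}`. `merge_cleaned` is a hypothetical name. The point is that each line is keyed by filename rather than by position, so missing, extra, reordered, or malformed lines never shift another strip's result.

```python
import json


def merge_cleaned(response, raw_by_name):
    """Map each filename to cleaned text, falling back to raw OCR.

    Assumes (hypothetically) one JSON object per response line with
    "filename" and "text" keys; anything else on a line is ignored.
    """
    cleaned = {}
    for line in response.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            continue  # chatter or a truncated final line
        if not isinstance(obj, dict):
            continue
        name, text = obj.get("filename"), obj.get("text")
        # Match by filename lookup; ignore names that weren't sent.
        if isinstance(name, str) and name in raw_by_name and isinstance(text, str):
            cleaned[name] = text
    # Any strip without a usable match keeps its raw OCR.
    return {name: cleaned.get(name, raw) for name, raw in raw_by_name.items()}
```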

Example

Raw OCR for a Something Positive strip:

```
SOMETHING POSITIVE R*K NICHOLLAND
KNIFE GOES
IN...
OKAY, DAVAN, WE
OUT FOR METE
IN FIVE MUTES
WANNA TALK ABOUT
LESTERDA•
...
82001-2002RKMLHOLLAND
BUCKS YOU
BORROWED
```

After cleanup:

```
Knife goes in...
Okay, Davan, we're out for meat in five minutes.
Wanna talk about yesterday?
...
Bucks you borrowed.
```

Customization

Both scripts are simple and meant to be modified:

Batch size / parallelism: BATCH_SIZE and PARALLEL in clean.py
Model: Change --model haiku in call_claude() to use a different Claude model
System prompt: The SYSTEM_PROMPT in clean.py is generic to comic strips — edit it if your source material has specific conventions
Image formats: ocr.swift handles png, gif, jpg, jpeg, webp, tiff, and bmp
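To show where the `--model` setting fits, here is a minimal sketch of a `call_claude()`-style wrapper around `claude -p`. Only `-p` and `--model` come from this README. The sketch makes three assumptions of its own: it passes the prompt as a positional argument, it folds the system prompt into that prompt, and it uses an arbitrary 600-second timeout. clean.py's real `call_claude()` may differ. The `run` parameter is injectable so the function can be exercised without the CLI installed.

```python
import subprocess

MODEL = "haiku"  # this README's default; change to use another Claude model


def build_command(prompt, model=MODEL):
    """Argument list for a one-shot, non-interactive Claude Code call."""
    return ["claude", "-p", "--model", model, prompt]


def call_claude(system_prompt, batch_text, model=MODEL, run=subprocess.run):
    """Send one batch to Claude and return its stdout.

    Prepending the system prompt to the user prompt is an assumption
    of this sketch, not necessarily how clean.py passes it.
    """
    prompt = f"{system_prompt}\n\n{batch_text}"
    result = run(build_command(prompt, model), capture_output=True,
                 text=True, check=True, timeout=600)  # raises on non-zero exit
    return result.stdout
```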

License

MIT

obra/comic-ocr
