OCR comic strips on macOS using Apple's Vision framework, then clean up the transcripts with Claude.
Two scripts, meant to be run in sequence:
- ocr.swift: Extracts text from a directory of comic strip images using Apple Vision's highest-accuracy recognition mode. Outputs a JSON file with one entry per image.
- clean.py: Sends the raw OCR through Claude (via claude -p) to fix the usual OCR mess: garbled characters, ALL-CAPS normalization, metadata stripping, dialogue structure cleanup. Processes in parallel batches and writes results incrementally.
- macOS (uses Apple's Vision framework — no cloud OCR, everything runs locally)
- Swift (comes with Xcode or Xcode Command Line Tools)
- Python 3
- Claude Code CLI (claude on your PATH)
# Compile (one-time)
swiftc ocr.swift -o ocr -framework Vision -framework AppKit
# Run
./ocr /path/to/strips transcripts.json

Progress is logged to ocr_progress.log in the output directory. You can tail -f it.
The output JSON looks like:
[
{
"filename": "0.gif",
"text": "SOMETHING POSITIVE R*K NICHOLLAND\nKNIFE GOES\nIN...\nOKAY, DAVAN, WE\nOUT FOR METE..."
},
{
"filename": "1.gif",
"text": "..."
}
]

Raw OCR is noisy: comic strips typically come through as ALL-CAPS with the title, copyright, and URL baked into every image. The text may have broken words and garbled characters.
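To get a feel for how noisy a given run is before cleanup, a short helper along these lines can summarize the raw output. This is a sketch, not part of either script: summarize_transcripts is a hypothetical name, and the only thing it assumes is the filename/text structure shown above.

```python
import json

def summarize_transcripts(entries):
    """Return (filename, line_count, uppercase_ratio) for each OCR entry."""
    summary = []
    for entry in entries:
        text = entry.get("text", "")
        letters = [c for c in text if c.isalpha()]
        caps = sum(1 for c in letters if c.isupper())
        # Ratio of uppercase letters; 0.0 when the strip produced no letters
        ratio = caps / len(letters) if letters else 0.0
        summary.append((entry["filename"], len(text.splitlines()), round(ratio, 2)))
    return summary

def summarize_file(path="transcripts.json"):
    """Load the OCR output from disk and summarize it."""
    with open(path, encoding="utf-8") as f:
        return summarize_transcripts(json.load(f))
```

Entries with zero lines are usually images where Vision found no text at all, which is worth knowing before spending a cleanup batch on them.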
python3 clean.py transcripts.json transcripts_clean.json

Or just python3 clean.py if your input file is named transcripts.json (output defaults to transcripts_clean.json).
This sends batches of 50 strips at a time to Claude Haiku, 4 batches in parallel. Progress is logged to transcripts_clean.log. Results are written to the output file incrementally as each batch completes.
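The batching pattern described above can be sketched roughly as follows. BATCH_SIZE and PARALLEL are the names the script uses; everything else here is an assumption for illustration: make_batches and run_batches are hypothetical helpers, clean_batch stands in for whatever actually calls Claude, and the choice to write raw OCR for strips that are still pending is a guess, not confirmed behavior of clean.py.

```python
import json
from concurrent.futures import ThreadPoolExecutor, as_completed

BATCH_SIZE = 50  # strips per request
PARALLEL = 4     # batches in flight at once

def make_batches(entries, size=BATCH_SIZE):
    """Split the entry list into consecutive chunks of at most `size`."""
    return [entries[i:i + size] for i in range(0, len(entries), size)]

def run_batches(entries, clean_batch, out_path, size=BATCH_SIZE, workers=PARALLEL):
    """Clean entries in parallel batches, rewriting out_path after each one.

    clean_batch (hypothetical) takes a list of entries and returns a dict
    mapping filename -> cleaned text.
    """
    cleaned = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(clean_batch, b) for b in make_batches(entries, size)]
        for future in as_completed(futures):
            cleaned.update(future.result())
            # Assumption: strips not cleaned yet keep their raw OCR text
            snapshot = [
                {"filename": e["filename"], "text": cleaned.get(e["filename"], e["text"])}
                for e in entries
            ]
            with open(out_path, "w", encoding="utf-8") as f:
                json.dump(snapshot, f, ensure_ascii=False, indent=2)
    return cleaned
```

Rewriting the whole file after each batch is wasteful for very large archives, but it means an interrupted run still leaves a valid JSON file behind.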
The cleanup fixes:
- Strips comic title headers, author bylines, copyright notices, URLs
- Corrects OCR character-recognition errors using context
- Converts ALL-CAPS to sentence case (preserving emphasis)
- Marks non-dialogue text: [Sound effect: CRASH], [Sign: "Joe's Bar"], [Caption: Later that evening...]
- Preserves dialogue structure (one speech bubble per line)
- Handles guest strips and announcements gracefully
- Leaves truncated text as-is with an ellipsis
Response parsing is resilient — it matches cleaned lines back to filenames by lookup rather than requiring exact line-count matches, so partial responses are always usable. Any strips that don't get a match fall back to the raw OCR.
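One way to get that kind of lookup-based matching is sketched below. The response format is an assumption made up for this example (each strip introduced by a header line such as === 0.gif ===); the actual format clean.py asks Claude for may differ, and parse_response is a hypothetical name.

```python
import re

# Hypothetical delimiter for this sketch; not taken from clean.py
HEADER = re.compile(r"^=== (.+?) ===\s*$")

def parse_response(response, batch):
    """Map a model response back onto a batch by filename.

    Returns a dict of filename -> text. Strips missing from the response,
    or present but empty, fall back to their raw OCR text.
    """
    expected = {e["filename"] for e in batch}
    found = {}
    current = None
    for line in response.splitlines():
        match = HEADER.match(line)
        if match:
            name = match.group(1)
            # Ignore sections for filenames that were never in this batch
            current = name if name in expected else None
            if current is not None:
                found[current] = []
        elif current is not None:
            found[current].append(line)

    result = {}
    for entry in batch:
        lines = found.get(entry["filename"])
        text = "\n".join(lines).strip() if lines else ""
        result[entry["filename"]] = text or entry["text"]
    return result
```

Because the lookup is keyed on filenames rather than position, a response that skips a strip, reorders strips, or stops early still yields correct text for every strip it did cover.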
Raw OCR for a Something Positive strip:
SOMETHING POSITIVE R*K NICHOLLAND
KNIFE GOES
IN...
OKAY, DAVAN, WE
OUT FOR METE
IN FIVE MUTES
WANNA TALK ABOUT
LESTERDA•
...
82001-2002RKMLHOLLAND
BUCKS YOU
BORROWED
After cleanup:
Knife goes in...
Okay, Davan, we're out for meat in five minutes.
Wanna talk about yesterday?
...
Bucks you borrowed.
Both scripts are simple and meant to be modified:
- Batch size / parallelism: BATCH_SIZE and PARALLEL in clean.py
- Model: Change --model haiku in call_claude() to use a different Claude model
- System prompt: The SYSTEM_PROMPT in clean.py is generic to comic strips; edit it if your source material has specific conventions
- Image formats: ocr.swift handles png, gif, jpg, jpeg, webp, tiff, and bmp
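For orientation when changing the model, a call to the CLI might look roughly like the sketch below. It is not the script's actual call_claude(): build_claude_command and run_claude are hypothetical names, passing the prompt as a positional argument is an assumption, and the 300-second timeout is arbitrary. Only claude -p and --model haiku come from the description above.

```python
import subprocess

def build_claude_command(prompt, model="haiku"):
    """Assemble the argv for a one-shot, non-interactive Claude Code call."""
    return ["claude", "-p", prompt, "--model", model]

def run_claude(prompt, model="haiku", timeout=300):
    """Return the model's reply, or None if the CLI is missing, fails, or times out."""
    try:
        result = subprocess.run(
            build_claude_command(prompt, model),
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    return result.stdout if result.returncode == 0 else None
```

Returning None on failure, rather than raising, fits the fallback behavior described earlier: a failed batch can simply keep its raw OCR.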
MIT