
Rescuing the Ted Owens archive: 10,000 handwritten pages

How the New Thinking Allowed Foundation transcribed a decaying archive of nearly 10,000 pages of handwritten notes, letters, and annotated clippings, after eight other OCR tools had failed.


When Aaron Kovalcsik took on the Ted Owens archive for the New Thinking Allowed Foundation, he was facing one of the harder digitization jobs imaginable: just under 10,000 pages of handwritten notes, letters, and newspaper clippings with handwritten marginalia, accumulated between 1966 and 1987.

The physical condition was the real problem. Many pages were faded and warped, with handwritten ink that had bled through the paper. Some were photographs of documents that had survived water damage. Typewritten pages often had news clippings pasted on top, so a single scan could be a messy collage of typography, handwriting, and damage all at once. At that scale, transcribing by hand was never realistic.

The first attempts went the way these projects usually go: downsampling the images, bundling them into PDFs, and running standard OCR produced inaccurate, messy results. So Aaron ran a proper bake-off, testing the archive against eight competing OCR tools. Most were barely better than Adobe’s OCR. The closest he got was OpenAI’s API at around 60% accuracy, but it frequently refused to process pages, skipped entire sections, and would only transcribe content that was already cleanly typed.

Early on, Aaron had also run a few test pages through Handwriting OCR. The difference was stark: it returned readable text on faded, barely legible photos where the other tools returned nothing at all, including passages of cursive that Aaron himself couldn’t decipher.

When Aaron showed those early results to Jeffrey Mishlove, who owns the archive, Mishlove approved funding to transcribe the entire collection on the spot. The full collection, just under 10,000 pages, was transcribed in less than an hour. Spot-checking the output across even the most challenging pages put accuracy well above 99%, and exporting the text was painless. As Aaron put it, the hardest part of the whole project turned out to be the months spent failing with everything else first.

With the archive now fully transcribed, the collection can be opened to researchers studying the life and writings of Ted Owens, whom Probe Magazine named in 1977 as one of the “world’s top 40 psychics.” Material that had been locked inside fragile, fading paper is now searchable text.

Frequently asked questions

Can Handwriting OCR read water-damaged or faded handwriting?

It is built for exactly these documents. In this project the archive included faded, warped pages with ink bleed-through and photographs of water-damaged papers. Handwriting OCR returned usable text on pages where the other tools Aaron tested returned nothing. Results always depend on document quality, but degraded and historical material is the core use case rather than an edge case.

How does Handwriting OCR compare to using an LLM's vision API for OCR?

In this customer's own testing, OpenAI's API was the best of the alternatives at roughly 60% accuracy, but it frequently refused pages, skipped sections, and performed well only on clean, clearly typed input. For a head-to-head benchmark of nine tools, including LLM vision APIs, see our best AI handwriting OCR comparison.

Can it handle a very large archive?

Yes. Large collections are processed in batches (we recommend splitting sources into PDFs of around 100 pages each for the most reliable results). This archive ran to just under 10,000 pages and was transcribed in less than an hour.
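If your scans sit in one very large file, it can help to plan the split before uploading. The sketch below is a hypothetical helper, not part of Handwriting OCR's product or API. It assumes you already know the total page count and only computes 1-indexed page ranges of up to 100 pages each, which you could then hand to whatever PDF tool you use to cut the file.

```python
def plan_batches(total_pages: int, batch_size: int = 100) -> list[tuple[int, int]]:
    """Hypothetical helper: return 1-indexed, inclusive (first_page, last_page)
    ranges so that no batch exceeds batch_size pages."""
    if total_pages < 0:
        raise ValueError("total_pages must be non-negative")
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    # Step through the document in batch_size strides; the last batch
    # is clipped so it never runs past the final page.
    return [
        (start, min(start + batch_size - 1, total_pages))
        for start in range(1, total_pages + 1, batch_size)
    ]


if __name__ == "__main__":
    # Illustrative example: a 250-page scan split into ~100-page PDFs.
    for first, last in plan_batches(250):
        print(f"pages {first}-{last}")
```

For a 250-page scan this prints three ranges: 1-100, 101-200, and 201-250. Keeping the range calculation separate from the actual PDF splitting means the same plan works with any PDF library.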