OCR still sucks

I bought a Canon 8400F to scan some photos for the Chatfield project and to try to speed up the transcription of his Civil War letters. The photos are fine. No big deal there. A scanner is a scanner: I use the "Import" function in Photoshop to grab the pictures, crop them, and save them as .psd and .jpg files.

Getting the thing to recognize simple typescript is another matter altogether. I think it is still faster than retyping the letters at 100 wpm. But the typescript is so messy, thanks to the strikeouts and the preservation of Chatfield's bad spelling, that I spend a good five to ten minutes per page cleaning things up in an interface that isn't very intuitive. I'm using OmniPage SE version 2.

Author: David Churbuck

