OCR still sucks

I bought a Canon 8400F to scan some photos for the Chatfield project and to attempt to accelerate the transcription of his Civil War letters. The photos are fine. No big deal there. A scanner is a scanner, and I use the “import” function in Photoshop to grab the pictures, crop them, and save them as .psd and .jpg.

Getting the thing to recognize simple typescript is another matter altogether. While I think it is faster than a 100 wpm transcription exercise, the typescript is so messy, between the strikeouts and the preservation of Chatfield’s bad spelling, that I spend a good five or ten minutes per page cleaning things up in an interface that isn’t very intuitive. I’m using OmniPage SE Ver. 2.

Author: David Churbuck

Cape Codder with an itch to write
