OCR still sucks

I bought a Canon 8400F to scan some photos for the Chatfield project and to try to speed up the transcription of his Civil War letters. The photos are fine; no big deal there. A scanner is a scanner, and I use Photoshop’s “import” function to grab the pictures, crop them, and save them as .psd and .jpg files.

Getting the thing to recognize simple typescript is another matter altogether. I think it is still faster than retyping each page at 100 wpm, but the typescript is so messy, with strikeouts and Chatfield’s bad spelling faithfully preserved, that I spend a good five or ten minutes per page cleaning up the output in an interface that isn’t very intuitive. I’m using OmniPage SE, version 2.

Author: David Churbuck

Cape Codder with an itch to write
