On Sat, Aug 13, 2011 at 2:41 AM, Bob Goodwin <bobgoodwin@xxxxxxxxxxxx> wrote:
> On 12/08/11 12:22, mike cloaked wrote:
[...]
>> However if the pdf is a scanned image then it would need ocr before
>> the text could be extracted -

As someone else noted, some recent scan-to-pdf tools try to pre-ocr
the text. Sometimes it's sort of helpful. Sometimes not so much.

Some pdf output tools actually bury the real text into the pdf as well
as an image of the text. But that's not scanning. This doesn't seem to
be the case, either.

> I believe it is a scanned image now that I realize it has a
> handwritten signature.
>
> Xsane does ocr. I tried scanning a printed copy and letting
> xsane save it as a text message as well as trying gocr to read
> an xsane .pnm file. Both produced the same output which looks
> like it would require a lot of work to be usable if it is
> possible at all?
>
> I will do without the Google translation.
>
> Thanks for all the suggestions. This has been interesting, I
> always wondered about ocr, what it could do. I need to
> experiment with a document in English so that I have something I
> understand however it looks like the output quality is poor?

ocr is still hit-and-miss. Some combinations of
languages/fonts/scanners/image format/paper quality/ocr software and
the price of 10base5 cable on Saipan work well. Others don't.

Well, probably not 10base5. :/

But the tuning is sometimes so time-intensive that you'd prefer to
just type it in by hand.

On the other hand, if you have a lot of the scanned text that comes
from the same source, the tuning can be worth it.

Don't ask me how to tune the ocr. Some years ago I read up on it and
decided, for that doc, I'd pass. Open source ocr seems to have
progressed since then, which is nice.

Joel Rees