Re: OCR on linux

Daniel Dalton wrote:
If I was to buy a new scanner what model is the easiest to set up and the best supported?
What one would you recommend?



Hi,

Pretty much any scanner should work nowadays. You want one that's TWAIN compatible. That includes most Epson, Canon, HP, etc. You probably want a USB scanner. The only thing to watch is that some require their own Windows drivers, which of course won't work in Linux. This seems true of HP, but I ran into it with an Epson also. I don't yet do scanning in Linux, so I can't really give specific help beyond that, but if in doubt, search for something like "best scanner" or "supported scanner models" at http://www.google.com/linux

If you get one to work, I would be interested in your results. I am interested in trying to scan documents in Linux and have found the OCR thread interesting. I would also be interested in which engine produces the best text quality. I know from trying different ones under Windows that results can vary drastically depending on many factors.

You asked about page images with text. First, be aware that there are at least 4 different types of .tif images. One is compressed, one is for faxes, one is for multiple pages, and one is the standard, old-fashioned, single page. You want the last one. You'll know that it's right because it will only support one page per document and the files will be very big, about 1 MB per file. I've had bad luck with the other .tif variations. Also, there are many sources of page images not mentioned. Just a few are these:

http://www.gutenberg.org/ now offers .jpg page images but you have to do some hunting

http://www.archive.org/ look for text, American Libraries, all should be high quality images

http://onlinebooks.library.upenn.edu/ look at serials, links to many magazines with page images but sites aren't very accessible

http://www.loc.gov/ and http://memories.loc.gov/ very comprehensive and will require searching, has many "Base Ball" guides, images are in .jpg
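If you want to check which kind of .tif you have before feeding it to an OCR engine, something like the following rough sketch might help. It is my own illustration, not something from the OCR thread: it reads the TIFF header directly using only the Python standard library, counts the pages, and reports the compression code of each page. The function name describe_tiff and the returned keys are made up for this example, and it only handles classic TIFF, not BigTIFF.

```python
import struct

# Compression codes from the TIFF specification.
UNCOMPRESSED = 1
FAX_CODES = {2, 3, 4}  # CCITT RLE, Group 3 and Group 4 fax encodings


def describe_tiff(data: bytes) -> dict:
    """Report page count and per-page compression for classic TIFF bytes."""
    if data[:2] == b"II":
        endian = "<"  # little-endian ("Intel") byte order
    elif data[:2] == b"MM":
        endian = ">"  # big-endian ("Motorola") byte order
    else:
        raise ValueError("not a TIFF file")

    magic, offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF file")

    compressions = []
    seen = set()  # guards against a corrupt file whose page chain loops
    while offset != 0 and offset not in seen:
        seen.add(offset)
        (entry_count,) = struct.unpack_from(endian + "H", data, offset)
        compression = UNCOMPRESSED  # the default when the tag is absent
        for i in range(entry_count):
            entry = offset + 2 + 12 * i
            (tag,) = struct.unpack_from(endian + "H", data, entry)
            if tag == 259:  # the Compression tag
                (compression,) = struct.unpack_from(endian + "H", data, entry + 8)
        compressions.append(compression)
        # The offset of the next page follows the entries; 0 means no more pages.
        (offset,) = struct.unpack_from(endian + "I", data, offset + 2 + 12 * entry_count)

    return {
        "pages": len(compressions),
        "compressions": compressions,
        "is_fax": any(c in FAX_CODES for c in compressions),
        "is_plain_single_page": compressions == [UNCOMPRESSED],
    }
```

You could call it as describe_tiff(open("page.tif", "rb").read()), where page.tif is just a placeholder file name. If is_plain_single_page comes back True, the file is the single-page, uncompressed kind described above.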

I hope this is helpful to you. Have a good weekend.

_______________________________________________
Blinux-list mailing list
Blinux-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/blinux-list
