Re: pdf documents

Yes, but unless I'm badly mistaken, Tesseract is very old and doesn't support extracting images directly from PDF files. You would still need to install the xpdf package to get the pdfimages utility, which saves each image in a PDF as a separate file that you can then run through OCR. I read about the OCR package you describe, but I'm fairly sure it's old and unmaintained. Someone may have been planning to take over development, but I'm not sure. I've also noticed that most PDF files contain actual text rather than page images, and when they do contain images, those are usually pictures, so OCR would be useless on them anyway. Also, what is the accuracy rate for this OCR package? What about accessibility?
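
For anyone who wants to try the two-step route, here is a rough Python sketch of it: pdfimages first, then Tesseract on each extracted image. I haven't run this against a real install, so treat it as an outline. The function names, the "page" image root, and the working directory are all made up for illustration. It assumes both tools are on the PATH and that your Tesseract build can read whatever image format pdfimages writes; older releases may only accept TIFF, in which case you would need a conversion step in between.

```python
import subprocess
from pathlib import Path


def pdfimages_command(pdf_path, image_root):
    """Build the argv for: pdfimages PDF-file image-root."""
    return ["pdfimages", str(pdf_path), str(image_root)]


def tesseract_command(image_path):
    """Build the argv for: tesseract imagename outputbase.

    Tesseract appends .txt to the output base itself.
    """
    image_path = Path(image_path)
    output_base = image_path.with_suffix("")  # page-000.ppm -> page-000
    return ["tesseract", str(image_path), str(output_base)]


def ocr_pdf(pdf_path, work_dir):
    """Extract every image from the PDF, then OCR each one.

    Returns the list of text files Tesseract is expected to write.
    Hypothetical helper: assumes pdfimages names its output
    <image_root>-<number>.<extension>.
    """
    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)
    image_root = work_dir / "page"

    # Step 1: dump the embedded images as individual files.
    subprocess.run(pdfimages_command(pdf_path, image_root), check=True)

    # Step 2: run OCR on each extracted image.
    text_files = []
    for image in sorted(work_dir.glob("page-*")):
        if image.suffix == ".txt":
            continue  # skip output left over from an earlier run
        command = tesseract_command(image)
        subprocess.run(command, check=True)
        text_files.append(Path(command[2] + ".txt"))
    return text_files
```

Calling ocr_pdf("document.pdf", "ocr-out") would leave one text file per extracted image in ocr-out. If the PDF has no embedded images, as I suspect is the common case, the list simply comes back empty.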

Matt Barnes wrote:
Tesseract is an OCR engine and can convert PDFs and images to text. I haven't gotten around to installing it and trying it out, but it seems like the OCR of choice, located here:
http://sourceforge.net/project/showfiles.php?group_id=158586

_______________________________________________
Blinux-list mailing list
Blinux-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/blinux-list
