Yes, but unless I'm badly mistaken, Tesseract is very old and doesn't
support extracting images directly from PDF files. You would still need
to install the xpdf package to get the pdfimages utility, which saves
each embedded image as a separate file that can then be run through OCR.
I have read about the OCR package you describe, but I'm fairly sure it
is unmaintained. Someone may have been planning to take over
development, but I'm not sure. I've also noticed that most PDF files
contain real text rather than page images, and when they do contain
images, those are usually pictures, so OCR would be useless on them
anyway. Also, what is the accuracy rate for this OCR package? And what
about accessibility?
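
For what it's worth, the two-step workflow above (pdfimages first, then
OCR on each extracted file) could be scripted roughly like this in
Python. This is only a sketch under some assumptions: both pdfimages and
tesseract are on the PATH, pdfimages writes files named after the image
root you give it, and the installed tesseract can read the formats
pdfimages produces (older releases reportedly expect TIFF, so a
conversion step may be needed in between). The function names and the
"img" root are made up for illustration.

```python
import shutil
import subprocess
from pathlib import Path

# Formats pdfimages commonly writes; adjust for your version (assumption).
IMAGE_SUFFIXES = {".ppm", ".pbm", ".pgm", ".jpg"}


def pdfimages_command(pdf_path, image_root):
    # Usage form: pdfimages PDF-file image-root
    return ["pdfimages", str(pdf_path), str(image_root)]


def tesseract_command(image_path, output_base):
    # Usage form: tesseract imagename outputbase (text goes to outputbase.txt)
    return ["tesseract", str(image_path), str(output_base)]


def ocr_pdf_images(pdf_path, workdir):
    """Extract the embedded images from a PDF, then OCR each one."""
    for tool in ("pdfimages", "tesseract"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} not found on PATH")

    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)

    # Step 1: save every embedded image as its own file under workdir.
    subprocess.run(pdfimages_command(pdf_path, workdir / "img"), check=True)

    # Step 2: run OCR on each extracted image, one file at a time.
    text_files = []
    for image in sorted(workdir.glob("img-*")):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip .txt output left over from an earlier run
        output_base = image.with_suffix("")
        subprocess.run(tesseract_command(image, output_base), check=True)
        text_files.append(output_base.with_suffix(".txt"))
    return text_files
```

Calling ocr_pdf_images("scan.pdf", "out") would return the list of text
files it expects tesseract to have written. If the PDF holds real text
rather than page images, pdfimages will extract little or nothing useful
and this approach won't help.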
Matt Barnes wrote:
Tesseract is an OCR engine that can convert PDFs and images to text. I
haven't gotten around to installing it and trying it out, but it seems
to be the OCR package of choice. It is located here:
http://sourceforge.net/project/showfiles.php?group_id=158586
_______________________________________________
Blinux-list mailing list
Blinux-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/blinux-list