If your PDF files contain actual characters (rather than scanned page images), the pdftotext tool works well for extracting the text, though not necessarily the layout. As for doing OCR on actual image files, I have always found tesseract to work better than most, but it was still pretty feeble.
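For anyone who wants to script this, here is a rough sketch of calling both tools from Python. It assumes pdftotext (from poppler-utils) and tesseract are installed and on your PATH. The helper names (pdftotext_cmd, tesseract_cmd, run) and the file names in the usage note are made up for illustration, so adjust to taste.

```python
import shutil
import subprocess


def pdftotext_cmd(pdf_path, keep_layout=False):
    """Build a pdftotext command that writes the extracted text to stdout."""
    cmd = ["pdftotext"]
    if keep_layout:
        cmd.append("-layout")  # ask pdftotext to keep the physical page layout
    cmd += [pdf_path, "-"]     # "-" as the output file means stdout
    return cmd


def tesseract_cmd(image_path, lang="eng"):
    """Build a tesseract command that writes the recognized text to stdout."""
    return ["tesseract", image_path, "stdout", "-l", lang]


def run(cmd):
    """Run a command and return its stdout, or None if the tool is not installed."""
    if shutil.which(cmd[0]) is None:
        return None
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

Something like run(pdftotext_cmd("report.pdf", keep_layout=True)) should then return the text of a text-based PDF, and run(tesseract_cmd("page.png")) the OCR output for a scanned page. The -layout option may help with multi-column pages, but do not expect miracles from either tool.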