Hi folks.

When I use pdftotext from poppler-utils, I pass the -layout option so that the resulting text file matches the page layout of the PDF as closely as possible. This means that lines such as

    line1col1    line1col2    line1col3
    line2col1    line2col2    line3col3

are output as such. However, when I use tesseract to extract text from PDF files that don't have embedded text, I can't seem to get the same effect.

Am I missing something with tesseract, or is there an alternative OCR tool that can give me what I want?
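For concreteness, the two workflows being compared look roughly like the sketch below. The file names and helper function names are made up for illustration, and it assumes the scanned PDF is first rasterised with pdftoppm (also from poppler-utils), since tesseract takes image input. The tesseract call is the plain default invocation, which is the one that does not keep the columns; this is not meant as a fix.

```python
def pdftotext_layout_cmd(pdf_path, txt_path):
    """Command for a PDF with embedded text.

    -layout asks pdftotext to keep the physical page layout,
    so columns stay side by side in the text output.
    """
    return ["pdftotext", "-layout", pdf_path, txt_path]


def ocr_cmds(pdf_path, image_prefix, out_base, dpi=300):
    """Commands for a scanned PDF without embedded text (first page only).

    Assumption: the page is rendered to PNG with pdftoppm before OCR.
    -singlefile writes just the first page as <image_prefix>.png.
    """
    rasterise = [
        "pdftoppm", "-r", str(dpi), "-png", "-singlefile",
        pdf_path, image_prefix,
    ]
    # Plain tesseract call: writes <out_base>.txt with default settings.
    ocr = ["tesseract", f"{image_prefix}.png", out_base]
    return [rasterise, ocr]
```

Each returned list can be passed to subprocess.run(cmd, check=True) on a machine where poppler-utils and tesseract are installed.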