Re: command line scanned pdf to text

Would you mind elaborating on this if you can and have time? What kind of file did you use, and what did you put on your command line? I am asking because I have tried to use tesseract a couple of times with TIFF files and have gotten mostly gibberish, so obviously I am doing something wrong. I am running Debian testing, if that makes a difference.

Thanks.

-- 
Cheryl

May the words of my mouth
and the meditation of my heart
be acceptable to You, Lord,
my rock and my Redeemer.
(Psalm 19:14 HCSB)





> On Nov 2, 2015, at 2:13 PM, John G Heim <jheim@xxxxxxxxxxxxx> wrote:
> 
> 
> I've been scanning in the D&D 5th Edition Player's Handbook. I tried every open source OCR program I could find, and tesseract was easily the best. On pages that are just prose, it probably gets about 99% accuracy. Even on pages that are 2 columns of prose, it does really well if you tell it to look for that. Somebody sent me a PDF of the same book done with a professional OCR program for Windows. The results are approximately equal. Tesseract may lack the bells & whistles of commercial products, but for accuracy, it's pretty good.
> 
> 
> 
> On 11/01/2015 11:24 PM, Tom Fowle wrote:
>> Am I the last to find this?
>>  command line ocr tesseract
>> won't directly support .pdf but
>> pdftocairo
>> produces .jpg among others which tesseract will read.
>> 
>> May not do well with columns but not too bad.
>> 
>> Is there anything better?
>> 
>> Thanks
>> tom Fowle
>> _______________________________________________
>> Speakup mailing list
>> Speakup@xxxxxxxxxxxxxxxxx
>> http://linux-speakup.org/cgi-bin/mailman/listinfo/speakup
>> 
> 
> -- 
> John Heim, jheim@xxxxxxxxxxxxx, 608-263-4189, skype:john.g.heim, sip:jheim@xxxxxxxxxxxxxxxx
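
For anyone trying the pdftocairo-then-tesseract route described in this thread, here is a rough sketch in Python of how the two steps might be chained. It is not anyone's actual command line from the thread. The helper names, the 300 DPI default, the "page" prefix and the glob pattern for pdftocairo's output files are all assumptions made for illustration. Low-resolution scans are a common cause of gibberish, so rendering at a higher resolution is the first thing worth trying. The psm argument maps to tesseract's page segmentation mode, which may be what "tell it to look for that" refers to for two-column pages; recent tesseract releases spell the option --psm, while the older 3.x releases use -psm.

```python
import glob
import subprocess
from pathlib import Path
from typing import List, Optional


def pdftocairo_cmd(pdf: str, prefix: str, dpi: int = 300) -> List[str]:
    """Build a pdftocairo command that renders each PDF page to a TIFF image."""
    # -tiff selects TIFF output; -r sets the resolution in DPI (300 is an assumed default).
    return ["pdftocairo", "-tiff", "-r", str(dpi), pdf, prefix]


def tesseract_cmd(image: str, lang: str = "eng", psm: Optional[int] = None) -> List[str]:
    """Build a tesseract command; tesseract appends .txt to the output base name."""
    out_base = str(Path(image).with_suffix(""))
    cmd = ["tesseract", image, out_base, "-l", lang]
    if psm is not None:
        # Page segmentation mode; spelled -psm on tesseract 3.x.
        cmd += ["--psm", str(psm)]
    return cmd


def ocr_pdf(pdf: str, prefix: str = "page", psm: Optional[int] = None) -> None:
    """Render the PDF to images, then OCR each image in page order."""
    subprocess.run(pdftocairo_cmd(pdf, prefix), check=True)
    # Assumption: pdftocairo names its output <prefix>-<page number>.tif
    for image in sorted(glob.glob(f"{prefix}-*.tif*")):
        subprocess.run(tesseract_cmd(image, psm=psm), check=True)
```

Calling ocr_pdf("book.pdf") would leave one .txt file per page next to the rendered images, assuming both programs are installed and on the PATH.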

