On 12/08/11 18:25, Cameron Simpson wrote:
> On 12Aug2011 12:09, Bob Goodwin <bobgoodwin@xxxxxxxxxxxx> wrote:
> | On 12/08/11 12:04, Genes MailLists wrote:
> |> On 08/12/2011 11:58 AM, Bob Goodwin wrote:
> |>> On 12/08/11 11:22, Genes MailLists wrote:
> |>>> On 08/12/2011 11:16 AM, Madhav Ancha wrote:
> |>>> You could try this fedora app: pdftotext
> |>>>
> |>> As can be seen I tried several combinations, thought perhaps it
> |>> couldn't handle the file name in quotes "Courier etc" but nothing
> |>> seems to do it?
> |>>
> |> Is it possible the PDF contains an image of the text rather than text
> |> itself ?
> |
> | I'm not sure, how would I tell? It's an attachment to an html
> | cover letter. The Fedora default app displays it with no
> | complaints.
>
> Is it ridiculously large for the amount of text? Does it seem to have
> scanner artifacts in the text - "graininess" if you peer closely, fuzzy
> text instead of perfectly formed letters (i.e. a picture of text instead
> of text rendered by your computer from a font)?
>
> Personally I use pdftohtml to convert PDFs (then an HTML-to-text
> pipeline on the end of that). Possibly pdftotext does exactly that
> anyway. Of course it achieves nothing for me if the PDF is a scan.
>
> Cheers,

It's a scan. pdftohtml seems to have produced JPEG as well as HTML files:

-rw-rw-r--. 1 bobg bobg 321444 Aug 12 18:37 Courier-1_1.jpg
-rw-rw-r--. 1 bobg bobg 309493 Aug 12 18:37 Courier-2_1.jpg
-rw-rw-r--. 1 bobg bobg    461 Aug 12 18:37 Courier.html
-rw-rw-r--. 1 bobg bobg    244 Aug 12 18:37 Courier_ind.html

The HTML files display as a couple of boxes. The JPEGs are sharp reproductions of the text, and gocr can convert them to text, but the quality of that text leaves much to be desired. I might be able to work it over with a dictionary to fill in the missing words, "missing" meaning the words that come out as gibberish.

Thanks, I'll have a go at that later.
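A rough sketch of that dictionary pass, in Python, for anyone trying the same thing. It assumes a plain word list at /usr/share/dict/words (Fedora's "words" package provides one, but it may not be installed) and uses the standard library's difflib to swap each unknown OCR token for the closest dictionary word. Tokens with nothing similar enough are left alone so real gibberish stays visible for hand-editing. The function names and the 0.6 cutoff are my own choices, not anything from gocr or pdftohtml.

```python
import difflib
import re
from pathlib import Path

# Assumption: a plain word list, one word per line. Fedora's "words"
# package installs one here, but it may not be present on every system.
DICT_PATH = Path("/usr/share/dict/words")


def load_words(path=DICT_PATH):
    """Return a lowercase set of dictionary words, or an empty set if missing."""
    if not path.exists():
        return set()
    with path.open(encoding="utf-8", errors="ignore") as f:
        return {line.strip().lower() for line in f if line.strip()}


def correct_word(word, vocab, cutoff=0.6):
    """Return word unchanged if known (or very short), else the closest match.

    Falls back to the original token when no dictionary word scores at
    least `cutoff` on difflib's similarity ratio.
    """
    lower = word.lower()
    if lower in vocab or len(lower) < 3:
        return word
    matches = difflib.get_close_matches(lower, vocab, n=1, cutoff=cutoff)
    if not matches:
        return word
    best = matches[0]
    # Keep an initial capital from the OCR token (e.g. start of a sentence).
    return best.capitalize() if word[:1].isupper() else best


def correct_text(text, vocab, cutoff=0.6):
    """Replace each alphabetic run in text with its best dictionary match.

    Spaces and punctuation are untouched because only [A-Za-z]+ runs are
    rewritten.
    """
    return re.sub(r"[A-Za-z]+",
                  lambda m: correct_word(m.group(0), vocab, cutoff),
                  text)


if __name__ == "__main__":
    import sys

    vocab = load_words()
    if not vocab:
        sys.exit(f"no word list found at {DICT_PATH}")
    # Read OCR output on stdin, write the corrected text on stdout.
    sys.stdout.write(correct_text(sys.stdin.read(), vocab))
```

Saved as, say, fixocr.py (a made-up name), it would be run on saved gocr output with something like `python3 fixocr.py < courier.txt`. With a full dictionary it is slow, since every unknown word is compared against the whole list, and it will sometimes pick the wrong word, so the result still needs proofreading.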
Bob

-- 
users mailing list
users@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe or change subscription options:
https://admin.fedoraproject.org/mailman/listinfo/users
Guidelines: http://fedoraproject.org/wiki/Mailing_list_guidelines