Re: Extracting ASCII text from a PDF Document

What happens when you run pdftotext on the file?
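
For anyone wanting to script that check, here is a minimal sketch, assuming pdftotext is installed (on Debian it normally comes from the poppler-utils package). The helper names and the file name document.pdf are made up for illustration; only the pdftotext command line itself, including the optional -layout flag and "-" for standard output, is real.

```python
import shutil
import subprocess


def build_pdftotext_cmd(pdf_path, txt_path="-", layout=False):
    """Build the pdftotext argument list; "-" sends the text to stdout."""
    cmd = ["pdftotext"]
    if layout:
        cmd.append("-layout")  # ask pdftotext to keep the physical page layout
    cmd += [pdf_path, txt_path]
    return cmd


def extract_text(pdf_path, layout=False):
    """Return the extracted text, or None if pdftotext is not installed."""
    if shutil.which("pdftotext") is None:
        return None
    result = subprocess.run(
        build_pdftotext_cmd(pdf_path, "-", layout),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if pdftotext fails
    )
    return result.stdout


if __name__ == "__main__":
    # "document.pdf" is a hypothetical file name.
    print(build_pdftotext_cmd("document.pdf"))
```

If extract_text returns real words, the text is embedded in the PDF; if it returns little or nothing, the pages are probably stored as images.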

On Thu, 12 Aug 2010, Martin McCormick wrote:

I have a PDF document that does have embedded ASCII text in it.
It plays fine on a Macintosh that has no OCR software on it but
uses VoiceOver. VoiceOver just runs on ASCII, so the ASCII is
there.

	I need to use the file on a Debian system so I hope I am
just using a2ps and pstotext wrong.

	If one uses pstotext on this document, it immediately
errors out. If I use a2ps and give it -o outfilename.ps, a2ps
runs, but I may be producing an image file, as there is no text
from the document. Talk about sound and fury signifying nothing.

	If one runs pstotext on that output file, one gets a
single form feed for each page and nothing else.

	The PDF document is not protected.

	Any suggestions as to how to extract the text are
welcome. Thanks.

Martin McCormick

_______________________________________________
Blinux-list mailing list
Blinux-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/blinux-list


--
Kirk Reiser				The Computer Braille Facility
e-mail: kirk@xxxxxxxxxxxxxx		University of Western Ontario
phone: (519) 661-3061
