Re: pdf reading?

On 5/25/09, Marius Gedminas <marius@xxxxxx> wrote:
> On Mon, May 25, 2009 at 12:33:42PM -0600, Martin Collins wrote:
>
> I think you mean pdftohtml.  At least, on my Ubuntu system poppler-utils
> has pdftotext and pdftohtml, without any actual digits in the name.

Yes, you're right.

>> mail the pdf to your gmail account, view in HTML, then save.
>
> How well does this work in practice?

Not very well. You get an .mht file that looks somewhat like the pdf,
minus any images. All the elements use absolute positioning, so the
text may not wrap well. You need to use the basic HTML version of gmail
to get the 'view in HTML' option; the standard 'view' just gives you an
image of the pdf.

> I'm unhappy with the results I get from pdftotext: it even loses
> paragraph breaks.  pdftohtml, which I never tried before, is a bit
> better, but it considers every line to be a separate paragraph.

What you get from any of these methods will depend to a large extent
on the pdf and how it was created. Some manual intervention will
usually be necessary to get optimal results. With something like a
novel, take the converted HTML and run it through tidy. Then, in a good
text editor, search and replace the bold and italic tags (and any other
formatting you want to save) with some non-HTML but equivalent
construct. Strip the remaining HTML with htmltotext or similar. Finally,
back in the editor, replace your formatting constructs with HTML again,
add valid headers etc. and you're done.
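
For anyone who would rather script the protect/strip/restore steps
than do them by hand in an editor, here is a rough sketch in Python.
It is not the exact procedure described above: it uses plain regular
expressions in place of tidy and an HTML-to-text tool, and the
placeholder strings and the function name are made up for the
example. It only handles simple <b> and <i> tags without attributes.

```python
import re

# Hypothetical placeholder markers; any strings that cannot occur in
# the book text would do equally well.
PLACEHOLDERS = {
    "b": "@@B@@",
    "/b": "@@/B@@",
    "i": "@@I@@",
    "/i": "@@/I@@",
}


def keep_bold_italic(converted_html: str) -> str:
    """Strip all markup except simple bold and italic tags."""
    text = converted_html

    # 1. Swap the tags we want to keep for non-HTML placeholders.
    for tag, marker in PLACEHOLDERS.items():
        text = re.sub(rf"<{tag}\s*>", marker, text, flags=re.IGNORECASE)

    # 2. Crudely drop every remaining tag (a real HTML-to-text tool
    #    would be more robust than this regular expression).
    text = re.sub(r"<[^>]+>", "", text)

    # 3. Turn the placeholders back into HTML tags.
    for tag, marker in PLACEHOLDERS.items():
        text = text.replace(marker, f"<{tag}>")
    return text


if __name__ == "__main__":
    sample = '<p style="x">A <b>bold</b> and <i>it</i> word<br></p>'
    print(keep_bold_italic(sample))
```

Entities such as &amp; are left untouched, since the result is meant
to be wrapped in HTML again afterwards.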

You can fix the split paragraphs in vim by recording keystroke macros
to join any line beginning with [a-z] to the one above, and any line
ending in [,a-z] to the one below.
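
The same two joining rules can also be sketched outside vim. The
Python below is only an illustration of the heuristic, not a
recording of the macros; the function name is made up, and like the
macros it will occasionally join lines that should have stayed apart
(a heading that ends in a lowercase letter, for example).

```python
import re


def join_split_paragraphs(text: str) -> str:
    """Rejoin lines that a pdf converter split mid-paragraph."""
    merged: list[str] = []
    for line in text.splitlines():
        stripped = line.strip()
        # Join when this line starts with a lowercase letter, or when
        # the previous line ends in a comma or a lowercase letter.
        # Blank lines are never joined, so they still separate blocks.
        if merged and stripped and merged[-1] and (
            re.match(r"[a-z]", stripped)
            or re.search(r"[,a-z]$", merged[-1])
        ):
            merged[-1] = merged[-1] + " " + stripped
        else:
            merged.append(stripped)
    return "\n".join(merged)


if __name__ == "__main__":
    sample = "It was a dark\nand stormy night.\nThe rain fell,\nHeavily."
    print(join_split_paragraphs(sample))
```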

It sounds involved, but once you have a process, each book only takes a
few minutes unless the pdf is really borked. With some knowledge of
sed, awk, perl and/or vim, the process can be largely automated.

BTW, I just discovered evince will rotate too. The option is under the
edit menu for some reason...

Martin
_______________________________________________
maemo-users mailing list
maemo-users@xxxxxxxxx
https://lists.maemo.org/mailman/listinfo/maemo-users
