On 5/25/09, Marius Gedminas <marius@xxxxxx> wrote:
> On Mon, May 25, 2009 at 12:33:42PM -0600, Martin Collins wrote:
>
> I think you mean pdftohtml. At least, on my Ubuntu system poppler-utils
> has pdftotext and pdftohtml, without any actual digits in the name.

Yes, you're right.

>> mail the pdf to your gmail account, view in HTML, then save.
>
> How well does this work in practice?

Not very. You get an mht file that looks somewhat like the pdf minus
any images. All the elements use absolute positioning, so it may not
wrap too well. You need to use the basic HTML version of gmail to get
the 'view in HTML' option. The standard 'view' just gives you an image
of the pdf.

> I'm unhappy with the results I get from pdftotext: it even loses
> paragraph breaks. pdftohtml, which I never tried before, is a bit
> better, but it considers every line to be a separate paragraph.

What you get from any of these methods will depend to a large extent
on the pdf and how it was created. Some manual intervention will
usually be necessary to get optimal results.

With something like a novel, take the converted HTML and run it through
tidy. Then, in a good text editor, search and replace the bold and
italic tags (and any other formatting you want to save) with some
non-HTML but equivalent construct. Strip the remaining HTML with
htmltotext or similar. Then, in the editor, replace your formatting
constructs with HTML again, add in valid headers etc. and you're done.

You can fix the split paragraphs in vim by recording keystroke macros
to join any line beginning with [a-z] to the one above, and any line
ending in [,a-z] to the one below.

It sounds involved, but once you have a process each book only takes a
few minutes unless the pdf is really borked. With some knowledge of
sed, awk, perl and/or vim the process can be largely automated.

BTW, I just discovered evince will rotate too. The option is under the
edit menu for some reason...
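
For anyone who would rather script the paragraph fix than record vim
macros, here is a rough sketch of the same two join rules in Python.
The function name and the sample text are made up for illustration,
and it assumes the text has already been stripped down to plain lines.
It does not try to handle words hyphenated across lines or headings
that happen to start with a lowercase letter, so treat it as a starting
point rather than a finished tool.

```python
import re


def join_split_paragraphs(text: str) -> str:
    """Rejoin lines that a PDF converter split in mid-sentence.

    Two heuristics, mirroring the vim macros described above:
      * a line beginning with [a-z] is joined to the line above it;
      * a line ending in [,a-z] is joined to the line below it.
    Blank lines are kept and act as hard paragraph breaks.
    """
    merged = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            # Preserve blank lines; nothing is ever joined across them.
            merged.append("")
            continue

        prev = merged[-1] if merged else ""
        starts_lower = re.match(r"[a-z]", line) is not None
        prev_unfinished = re.search(r"[,a-z]$", prev) is not None

        if prev and (starts_lower or prev_unfinished):
            # Continuation of the previous line: glue with one space.
            merged[-1] = prev + " " + line
        else:
            merged.append(line)
    return "\n".join(merged)


if __name__ == "__main__":
    sample = (
        "It was a dark and stormy\n"
        "night, and the rain fell,\n"
        "The wind howled.\n"
        "A new paragraph starts here."
    )
    print(join_split_paragraphs(sample))
```

Run on the sample, the first three lines collapse into one paragraph
and the last line stays separate, because the line before it ends in a
full stop and it starts with a capital.
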

Martin
_______________________________________________
maemo-users mailing list
maemo-users@xxxxxxxxx
https://lists.maemo.org/mailman/listinfo/maemo-users