Re: quick unrtf question?

Hi,

I'm not a Bookshare member as I'm not in the USA. But if what I've seen is a typical representation of Bookshare books, it's trivial to convert these to HTML.

If your book includes the daisyTransform.xsl file (and it's probably easy enough to get hold of if it doesn't), you can use xsltproc to convert the book like so:

xsltproc -o <outputfile.html> daisyTransform.xsl <inputfile.xml>
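
If you end up running that command often, it could be wrapped in a small shell function, roughly as sketched below. The names daisy_html_name and daisy_to_html are made up for illustration, and the sketch assumes the book's file name ends in .xml and that daisyTransform.xsl sits in the current directory unless a different path is passed as the second argument.

```shell
#!/bin/sh
# Sketch only: function names are illustrative, not part of any package.

# Derive the output name: book.xml -> book.html
daisy_html_name() {
    printf '%s.html\n' "${1%.xml}"
}

# Convert one DAISY XML file to HTML next to the input file.
# $1 = input XML, $2 = stylesheet (defaults to ./daisyTransform.xsl)
daisy_to_html() {
    xsl=${2:-daisyTransform.xsl}
    if [ ! -f "$xsl" ]; then
        echo "stylesheet not found: $xsl" >&2
        return 1
    fi
    xsltproc -o "$(daisy_html_name "$1")" "$xsl" "$1"
}
```

After sourcing this, something like `daisy_to_html mybook.xml` should write mybook.html alongside the input.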

When I first saw this thread, I wondered whether you wanted to convert Word 2007/2010 docx files. These files are really zip archives containing an XML document and a bunch of related files.

There is a transform called docx2html.xsl (I don't remember whether that's its original name or just what I called it) which you can use with xsltproc and unzip to convert docx files to HTML.

A search for docx2html xsl will turn up a bunch of results, and I'm of course happy to send the XSL to anyone who wants it.

I have a one-line shell script that takes the docx file as an argument and produces an HTML file with the same basename. The line of code is:

unzip -p "$1" word/document.xml | xsltproc -o "$(basename "$1" .docx).html" docx2html.xsl -

Note that this assumes a document created by Microsoft Word. Word always names the main XML file word/document.xml, but nothing requires that name, and apparently some other software packages use different ones.
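
A slightly more defensive version of the one-liner might look like the sketch below. Instead of hard-coding word/document.xml, it looks the name up in the [Content_Types].xml file inside the zip, on the assumption that the main part is listed there with a content type ending in wordprocessingml.document.main+xml. The function names (main_part, html_name, docx_to_html) are invented for illustration, the XML is picked apart with tr, grep and sed rather than a real parser, and docx2html.xsl is assumed to be in the current directory, so treat it as a starting point rather than a tested tool.

```shell
#!/bin/sh
# Sketch only: function names are illustrative, not part of any package.

# Read [Content_Types].xml on stdin and print the path of the main
# document part without its leading slash, e.g. word/document.xml.
main_part() {
    tr '<' '\n' |
        grep 'wordprocessingml.document.main+xml' |
        sed -n 's/.*PartName="\/\([^"]*\)".*/\1/p' |
        head -n 1
}

# Derive the output name: /some/dir/report.docx -> report.html
html_name() {
    printf '%s.html\n' "$(basename "$1" .docx)"
}

docx_to_html() {
    if [ "$#" -ne 1 ] || [ ! -f "$1" ]; then
        echo "usage: docx_to_html file.docx" >&2
        return 2
    fi
    # '?' stands in for the literal brackets, which unzip would
    # otherwise treat as a character class.
    part=$(unzip -p "$1" '?Content_Types?.xml' | main_part)
    # Fall back to Word's usual name if nothing was found.
    part=${part:-word/document.xml}
    unzip -p "$1" "$part" | xsltproc -o "$(html_name "$1")" docx2html.xsl -
}
```

Saved as a script with a final line of `docx_to_html "$@"`, it can be called the same way as the one-liner, with the docx file as its only argument.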

Finally, while installing xsltproc on this box just now to verify all this, I also noticed a Debian package called xmlto, which is apparently a front-end to xsltproc and similar tools that's meant to take some of the work out of all this. I've not tried it though.

HTH,
Geoff.

_______________________________________________
Blinux-list mailing list
Blinux-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/blinux-list



