On June 1, 2016, Karen Lewellen wrote:
> My Linux experience is rooted at shellworld which is now using
> Ubuntu. I just got a PowerPoint file for a meeting, and because of
> its size, I cannot use the back door method I normally tap into
> for converting it into something else.
> Is there a program like antiword or unrtf to convert PowerPoint at
> the command line?

Is it an old .ppt or a new .pptx file?

There was a "ppthtml" tool around that could convert the older .ppt
files to HTML in a fashion. The site hosting the source code no
longer seems to be available, though.

If it's a newer .pptx file, it's really just a .zip file with a
different extension. So you can

    mkdir prez
    mv presentation.pptx prez/presentation.pptx.zip
    cd prez
    unzip presentation.pptx.zip
    cd ppt/slides/

There are a bunch of slide*.xml files in here, which you can either
edit:

    $EDITOR slide*.xml

or strip of their XML tags:

    for i in {1..20} ; do sed 's/<[^>]*>//g' slide${i}.xml ; done | cat -s > output.txt

where "20" is the number of slides in the presentation (which you
should be able to get from the output of "ls slide*.xml | wc -l").

The reason for using the "for" loop with the numbers is that the
slide names aren't zero-padded, so sorting them gives you
slide1.xml, slide10.xml, slide11.xml, slide2.xml, slide3.xml, etc.
That lexicographical order would be hard to read, so iterating over
the files in numerical order should make more sense.

Alternatively, if you have LibreOffice installed, it should
theoretically be able to do conversions. Based on my
experimentation, you have to convert the .ppt[x] to PDF first:

    libreoffice --headless --convert-to pdf presentation.pptx

and then convert that to something else. The "poppler-utils" package
(at least that's what it's called in Debian) has both a pdftotext
and a pdftohtml utility.
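If LibreOffice isn't an option, the .pptx unzip-and-sed hack from earlier can be tidied up a little. The following is only an untested sketch, not something from the original post, and the function name slides_to_text is made up for illustration. It counts the slides itself instead of hard-coding "20", walks them in numeric order, starts a new line at each </a:p> (the end tag of a text paragraph in the slide XML), and decodes the &lt;, &gt; and &amp; entities. It assumes POSIX sh plus GNU sed (the default on Ubuntu) for the "\n" in the replacement, and, like the loop in the post, that the files are numbered slide1.xml through slideN.xml with no gaps.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): print the text of
# slide1.xml .. slideN.xml in DIR, in numeric order.
# Assumes GNU sed, which understands "\n" in a replacement.
slides_to_text() {
    dir=${1:-.}
    # Count the slides the way the post suggests: ls | wc -l
    n=$(ls "$dir"/slide*.xml 2>/dev/null | wc -l)
    i=1
    while [ "$i" -le "$n" ]; do
        printf '=== Slide %d ===\n' "$i"
        # Break lines at each </a:p> (end of a text paragraph), strip
        # the remaining tags, then decode common entities (&amp; last).
        sed -e 's|</a:p>|\n|g' \
            -e 's/<[^>]*>//g' \
            -e 's/&lt;/</g' -e 's/&gt;/>/g' -e 's/&amp;/\&/g' \
            "$dir/slide$i.xml"
        i=$((i + 1))
    done | cat -s    # squeeze runs of blank lines, as in the post
}
```

For example, from the prez directory created above, after sourcing the function: slides_to_text ppt/slides > output.txt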
I recommend either plain text:

    pdftotext presentation.pdf presentation.txt
    ${EDITOR:-vi} presentation.txt

or HTML:

    pdftohtml presentation.pdf presentation.html
    lynx presentation.html

I snagged a couple of random PPT files off the web and tried the
LibreOffice method, and they all came out much better than I expected
(and much, much, MUCH better than the hackish attempts to extract the
text given at the top of this message). So if you have libreoffice +
poppler-utils installed and can use those, that's your best bet. If
you don't have them and can't get them installed, then some of the
extraction hacks above might at least get some form of the content
out.

Hopefully these give you some options to get at the content in the
presentations.

-tim
(an avowed despiser of PPT files)

_______________________________________________
Blinux-list mailing list
Blinux-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/blinux-list