Re: What is the tool for this?

Karen Lewellen <klewellen@xxxxxxxxxxxxxx> · Thu, 2 Jun 2016 23:03:54 -0400 (EDT)

Oh my goodness!
Well, fortunately for me it was a simple matter of Ken, the administrator
at shellworld, installing unoconv.
I ran the program, first creating a listener channel as instructed, then
ran unoconv on the file, which created its PDF.
I have no idea whether it was new or old ppt; I did not create the thing.
Anyway, once in PDF format, a simple pdftotext produced the text file.
Once I found a rather terrific page on running the unoconv program, the
process likely took me all of 2 minutes.
I love that I could have chosen a different format for the output, but
between pdftotext when the file is a baby hippopotamus, as it was in this
case, or robobraille for when the file is more reasonable in size, I got
the job done.
I truly honor the dedication of some, but speaking only for myself, having
to do all those steps would keep me in another operating system for
sure... my professional deadlines alone require swift solutions.
My thanks also go to the person who gave me the name of the front end
tool.  It seems very shell service friendly, much like antiword and unrtf.
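
For anyone wanting to repeat the steps described above, they might look
roughly like the sketch below. The function name ppt_to_text_unoconv and
the two-second pause are illustrative assumptions, not something taken
from the page mentioned, and the sketch assumes unoconv and pdftotext
(from poppler-utils) are already installed.

```shell
# Hypothetical helper (not part of unoconv): PowerPoint -> PDF -> text.
# Assumes unoconv and pdftotext (poppler-utils) are on the PATH.
ppt_to_text_unoconv() {
  src=$1
  [ -f "$src" ] || { echo "no such file: $src" >&2; return 1; }
  command -v unoconv >/dev/null 2>&1 || { echo "unoconv not found" >&2; return 2; }
  command -v pdftotext >/dev/null 2>&1 || { echo "pdftotext not found" >&2; return 2; }

  unoconv --listener &            # start the listener in the background
  listener=$!
  sleep 2                         # assumed pause so the listener can come up

  status=0
  unoconv -f pdf "$src" || status=$?   # writes the PDF next to the source file
  kill "$listener" 2>/dev/null    # stop the listener process started above
  [ "$status" -eq 0 ] || return "$status"

  # Turn the PDF into plain text with the same base name
  pdftotext "${src%.*}.pdf" "${src%.*}.txt"
}
```

Called as "ppt_to_text_unoconv meeting.ppt", this should leave meeting.pdf
and meeting.txt beside the original file.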
cheers,
Kare

On Thu, 2 Jun 2016, Tim Chase wrote:

On June  1, 2016, Karen Lewellen wrote:
My Linux experience is rooted at shellworld which is now using
Ubuntu. I just got a PowerPoint file for a meeting, and because of
its size,  I cannot use the back door method I normally tap into
for converting it into something else.
Is there a program like antiword or unrtf to convert PowerPoint at
the command line?

Is it an old .ppt or a new .pptx file?  There was a "ppthtml" tool
around that could convert the older .ppt files to HTML in a fashion.
The site hosting the source code no longer seems to be available
though.  If it's a newer .pptx file, it's really just a .zip file
with a different extension.  So you can

  mkdir prez
  mv presentation.pptx prez/presentation.pptx.zip
  cd prez
  unzip presentation.pptx.zip
  cd ppt/slides/

There are a bunch of slide*.xml files in here which you can either edit:

  $EDITOR slide*.xml

or strip out the XML tags:

  for i in {1..20} ; do sed 's/<[^>]*>//g' slide${i}.xml ; done |
  cat -s > output.txt

where "20" is the number of slides in the presentation (which you
should be able to get from the output of "ls slide*.xml | wc -l").

The reason for using the "for" loop with the numbers is that the
slide numbers aren't zero-padded, so a plain glob sorts the names
lexicographically: slide1.xml, slide10.xml, slide11.xml, slide2.xml,
slide3.xml, etc.  Text extracted in that order would be hard to
follow.  Iterating over the slides in numerical order keeps them in
sequence, so they should make more sense.
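
The slide count does not have to be typed in by hand.  As a rough
sketch, the loop could count the files itself.  The name extract_slides
is hypothetical, and the code assumes the slides are numbered
consecutively from 1, which may not hold for every .pptx file:

```shell
# Hypothetical helper: print the text of slide1.xml .. slideN.xml in
# numeric order.  Assumes consecutive numbering starting at 1.
extract_slides() {
  dir=${1:-.}

  # Count the slide files with a glob instead of parsing "ls" output
  n=0
  for f in "$dir"/slide*.xml; do
    [ -e "$f" ] && n=$((n + 1))
  done

  i=1
  while [ "$i" -le "$n" ]; do
    sed 's/<[^>]*>//g' "$dir/slide$i.xml"   # strip the XML tags
    echo                                    # blank line between slides
    i=$((i + 1))
  done | cat -s                             # squeeze runs of blank lines
}
```

From inside ppt/slides/, "extract_slides . > output.txt" should give
the same kind of file as the loop above.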

Alternatively, if you have LibreOffice installed, it should
theoretically be able to do conversions.  Based on my
experimentation, you have to convert the .ppt[x] to PDF first:

 libreoffice --headless --convert-to pdf presentation.pptx

and then convert that to something else.  The "poppler-utils" package
(at least that's what it's called in Debian) has both a pdftotext and
pdftohtml utility.  I recommend either plain-text:

 pdftotext presentation.pdf presentation.txt
 ${EDITOR:-vi} presentation.txt

or HTML:

 pdftohtml presentation.pdf presentation.html
 lynx presentation.html

I snagged a couple random PPT files off the web and tried the
libreoffice method and they all came out much better than I expected
(and much, much, MUCH better than the hackish attempts to extract the
text as given at the top of this message).

So if you have libreoffice + poppler-utils installed and can use
those, that's your best bet.  If you don't have them and can't get
them installed, then using some of the extraction hacks above might
at least get some form of the content out.
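
The two-step conversion can be wrapped up in one function.  This is
only a sketch under stated assumptions: the name ppt_to_text is made
up for illustration, and it assumes the "libreoffice" and "pdftotext"
commands are on the PATH (on some systems the LibreOffice binary is
called "soffice" instead).

```shell
# Hypothetical wrapper: .ppt/.pptx -> PDF (LibreOffice) -> text (pdftotext).
ppt_to_text() {
  src=$1
  outdir=${2:-.}                  # where the .pdf and .txt files end up
  [ -f "$src" ] || { echo "no such file: $src" >&2; return 1; }
  for tool in libreoffice pdftotext; do
    command -v "$tool" >/dev/null 2>&1 ||
      { echo "$tool not found" >&2; return 2; }
  done

  # Base name without directory or extension, e.g. "presentation"
  base=$(basename "$src")
  base=${base%.*}

  # Step 1: convert to PDF without starting the GUI
  libreoffice --headless --convert-to pdf --outdir "$outdir" "$src" || return 3

  # Step 2: pull the text out of the PDF
  pdftotext "$outdir/$base.pdf" "$outdir/$base.txt"
}
```

For example, "ppt_to_text presentation.pptx" should leave
presentation.txt in the current directory, ready for $EDITOR.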

Hopefully these give you some options to get at the content in the
presentations.

-tim
(an avowed despiser of PPT files)

_______________________________________________
Blinux-list mailing list
Blinux-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/blinux-list
