> -----Original Message-----
> From: Andrew Ballard [mailto:aballard@xxxxxxxxx]
> Sent: Friday, December 05, 2008 9:11 AM
> To: Jim Lucas
> Cc: Shawn McKenzie; php-general@xxxxxxxxxxxxx
> Subject: Re: How to fetch .DOC or .DOCX file in php
>
> On Thu, Dec 4, 2008 at 10:35 PM, Jim Lucas <lists@xxxxxxxxx> wrote:
> > I was going to say that I haven't yet decided on what the final
> > output format is going to be. Probably either rtf or OpenXML.
> >
> > How about I ask for suggestions on what would be the best format to
> > store the final copy.
> >
> > I figured that this tool would mainly be used for .doc to web
> > conversion, but I guess it could be used to also convert to other
> > document formats too.
> >
> > But, I would like to have the ability to at least store the formatting
> > inline with the text. So, some form of XML: be it (x)HTML, plain XML,
> > or even OpenXML.
> >
> > A question to all then. How would you like to see the text, with
> > formatting, stored?
>
> It's an excellent start. It pulled in some additional control
> characters in some of the documents I tried, and some documents had
> extra stuff at the end of the document. It was still text, but it
> looked like the text from the page header/footer definitions. It would
> be cool to see this polished and released. I just wish there was
> something this basic that worked this well on PDF files! :-)

Andrew,

There's something to be said for interoperability between languages. I've
become enamored with the iText package for creating and manipulating PDF
documents and for extracting their text and associated
info/bookmarks/tags/etc. There was, for a time, an open source PDF editor
built with JPedal/iText that looked like it would soon compete with Acrobat
for fillable PDF forms, but the author has little time to play with it.

Anyway, you can set up a Java program (yes, iText is Java) to extract the
text from the form fields, or from the entire document, and spit it out in
whatever format you like (text, XML, whatever). The PHP/Java bridge linked
below is one way to call that Java code from PHP.
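
A rough sketch of the "spit it out however you format it" half, in case it helps. It is only an illustration, not tested against any real PDF: the class name FieldDump and its methods are made up for this example, and the map of field names to values is stand-in data. In a real program I would expect that map to be filled from iText (as far as I recall, PdfReader.getAcroFields() and then AcroFields.getField(name) for each field), but that part is left out here so the sketch runs with nothing but the JDK.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of iText): formats PDF form-field values
// that have already been extracted into a name -> value map.
class FieldDump {

    // Escape the five characters that are special in XML.
    static String escapeXml(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    // Plain-text output: one "name: value" line per field.
    static String toText(Map<String, String> fields) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            out.append(e.getKey()).append(": ").append(e.getValue()).append('\n');
        }
        return out.toString();
    }

    // XML output: one <field name="..."> element per field.
    static String toXml(Map<String, String> fields) {
        StringBuilder out = new StringBuilder("<fields>\n");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            out.append("  <field name=\"").append(escapeXml(e.getKey())).append("\">")
               .append(escapeXml(e.getValue()))
               .append("</field>\n");
        }
        out.append("</fields>\n");
        return out.toString();
    }

    public static void main(String[] args) {
        // Stand-in data. Assumption: in real use these values would be read
        // from the PDF with iText instead of being typed in here.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("name", "Todd");
        fields.put("note", "R&D <draft>");

        System.out.print(toText(fields));
        System.out.print(toXml(fields));
    }
}
```

The LinkedHashMap keeps the fields in insertion order, so the text and XML outputs list them the same way. From PHP, the idea would be to call something like this through the bridge and take the returned string.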

iText - http://www.lowagie.com/iText/
PHP/Java bridge - http://php-java-bridge.sourceforge.net/pjb/

HTH,

// Todd