Hi,

As far as I know, the new Microsoft .docx format is a zip archive of XML
files, so you could probably unzip it and use the XML parser on the
document part inside.

Alternatively, is there any chance you can get them to use an RTF file
instead of a Word file to convert to PDF? RTF is mostly readable text
with some control words thrown in for formatting.
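
A rough sketch of the .docx idea, in case it helps. It assumes the zip and
DOM extensions are enabled, that the body text lives in word/document.xml
inside the archive, and that a plain dump of paragraph text is good
enough. docxToText() is a name I made up for illustration, not part of any
library.

```php
<?php
// Hypothetical helper: returns the plain text of a .docx file,
// one line per Word paragraph. Tabs, line breaks inside a paragraph,
// headers and footnotes are ignored in this sketch.
function docxToText(string $path): string
{
    $zip = new ZipArchive();
    if ($zip->open($path) !== true) {
        throw new RuntimeException("Cannot open $path as a zip archive");
    }

    // The main document part of a Word file.
    $xml = $zip->getFromName('word/document.xml');
    $zip->close();
    if ($xml === false) {
        throw new RuntimeException("No word/document.xml found in $path");
    }

    $doc = new DOMDocument();
    if (!$doc->loadXML($xml)) {
        throw new RuntimeException("word/document.xml is not valid XML");
    }

    // WordprocessingML elements live in this namespace.
    $xpath = new DOMXPath($doc);
    $xpath->registerNamespace(
        'w',
        'http://schemas.openxmlformats.org/wordprocessingml/2006/main'
    );

    $lines = [];
    foreach ($xpath->query('//w:p') as $paragraph) {
        // A paragraph is split into runs; the text sits in w:t elements.
        $text = '';
        foreach ($xpath->query('.//w:t', $paragraph) as $textNode) {
            $text .= $textNode->textContent;
        }
        $lines[] = $text;
    }

    return implode("\n", $lines);
}
```

You would then call something like echo docxToText('report.docx'); where
report.docx is just a placeholder file name. Paragraphs nested inside text
boxes may show up twice with this query, so treat the output as a starting
point rather than a faithful conversion.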
clive
Ashley Sheridan wrote:
Hi All,
I recently asked a question regarding reading a PDF with PHP. I've tried
Zend_Pdf, but all it is able to give me is the number of pages in a
PDF; it cannot extract the text from the PDF files I have. I thought
I'd try a different method, and try to extract the text straight from
the Word document which is used to generate the PDF. Does anyone have
any experience with this sort of thing, or enough to suggest a library
which is capable of this?
Ash
www.ashleysheridan.co.uk
--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php