On Tue, 2008-09-23 at 10:55 +0200, Grzegorz Kurtyka wrote:
> Ashley Sheridan wrote:
> > Hi All,
> >
> > I recently asked a question regarding reading a PDF with PHP. I've tried
> > Zend_pdf, but all it gives me is the number of pages in a PDF; it
> > cannot extract the text from the PDF files I have. I thought I'd try a
> > different method and extract the text straight from the Word document
> > which is used to generate the PDF. Does anyone have any experience with
> > this sort of thing, or enough to suggest a library which is capable of
> > this?
> >
> > Ash
> > www.ashleysheridan.co.uk
>
> It might be worth looking into apps like catdoc
> http://freshmeat.net/projects/catdoc/ . It's able to extract text from
> doc/ppt/xls files (similar to the "strings" command, but it takes care of
> the file's internal encoding for you). Reading the README in this package
> might give you some tips.

I'd actually considered doing something along these lines myself, as the Word docs were only simple affairs, and I only wanted the text from them anyway.

Ash
www.ashleysheridan.co.uk

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
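To illustrate why the encoding point matters, here is a rough sketch of the "strings"-style approach: scanning a binary file for runs of printable characters. It is not what catdoc does (catdoc parses the document format itself); it is only an approximation, under the assumption that Word's binary .doc format may store text either as 8-bit bytes or as 16-bit little-endian characters, so a scanner that only looks for 8-bit runs can miss text. The names `extract_text_runs`, `strings_from_file` and `MIN_LEN` are hypothetical.

```python
import re
from typing import List

# Hypothetical minimum run length; GNU strings also defaults to 4 characters.
MIN_LEN = 4

# Runs of printable ASCII bytes (roughly what `strings` finds by default).
_ASCII_RUN = re.compile(rb"[\x20-\x7e]{%d,}" % MIN_LEN)

# Runs of printable ASCII stored as 16-bit little-endian units, i.e. each
# character followed by a zero byte (roughly what `strings -e l` finds).
_UTF16LE_RUN = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % MIN_LEN)


def extract_text_runs(data: bytes) -> List[str]:
    """Return printable 8-bit runs first, then 16-bit little-endian runs."""
    runs = [m.group().decode("ascii") for m in _ASCII_RUN.finditer(data)]
    runs += [m.group().decode("utf-16-le") for m in _UTF16LE_RUN.finditer(data)]
    return runs


def strings_from_file(path: str) -> List[str]:
    """Read a file as raw bytes and scan it for text runs."""
    with open(path, "rb") as f:
        return extract_text_runs(f.read())
```

This only recovers plain ASCII text and will also pick up formatting data, style names and other junk, which is exactly the gap a format-aware tool like catdoc fills.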