On Thu, 2012-12-13 at 18:41 +0100, Matijn Woudt wrote:

> On Thu, Dec 13, 2012 at 5:13 PM, Jim Giner <jim.giner@xxxxxxxxxxxxxxxxxx> wrote:
>
> > On 12/13/2012 10:56 AM, Bastien wrote:
> >
> >> Bastien Koert
> >>
> >> On 2012-12-13, at 9:10 AM, Jim Giner <jim.giner@xxxxxxxxxxxxxxxxxx>
> >> wrote:
> >>
> >>> Thanks for the input gentlemen. Two opposing viewpoints!
> >>>
> >>> I understand the concept of using files for the docs and a table to
> >>> locate them and id them. But I am of the opinion that modern dbs are
> >>> capable of handling very large objects (of which these docs are NOT!) much
> >>> easier than years ago, so I am leaning that way still. It will certainly
> >>> make my search process easier!
> >>>
> >>> More comments anyone?
> >>>
> >>> --
> >>> PHP General Mailing List (http://www.php.net/)
> >>> To unsubscribe, visit: http://www.php.net/unsub.php
> >>>
> >> I got away from storing blobs in the db. I noticed significant slowness
> >> after the db grew to about 12gb in MySQL. Back ups also get affected as
> >> they take longer. This was older MySQL. But it also affected my mssql
> >> server the same way.
> >>
> >> Nowadays it's files into the file system and data into the db. One thing
> >> you could consider is reading the contents of the file into a db field and
> >> just store the text to allow the full text search
> >>
> >> Bastien
> >>
> > A very clever idea! I like it - the best of both worlds. Can you sum
> > up a method for getting the text out of the .doc (or .rtf) files so that I
> > can automate the process for my past and future documents?
> > Is there a single php function that would accomplish this?
> >
> There's no builtin function for such stuff. doc files are quite tricky to
> parse, but rtf files can be parsed pretty easily. One project is PHPRtfLite
> [1], which provides you an API for doing this.
>
> - Matijn
>
> [1] http://sourceforge.net/projects/phprtf/

As well as RTF, the OpenDocument format (ODF, e.g. .odt files) is easy to read from PHP.
Essentially it's just a bunch of XML files zipped up. Images are kept in the archive too, which is also a handy way to retrieve thumbnails of docs!

Thanks,
Ash
http://www.ashleysheridan.co.uk
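
P.S. A rough sketch of the zipped-XML idea, in case it helps. It assumes the zip extension (ZipArchive) is available and that the file follows the usual ODF package layout, with the body in content.xml and an optional preview in Thumbnails/thumbnail.png. The function names odf_extract_text() and odf_extract_thumbnail() are made up for this example, and stripping tags is a crude way to get searchable text rather than a proper ODF parser.

```php
<?php
// Sketch only: hypothetical helpers, not part of PHP or any library.

// Return the plain text of an OpenDocument file, or null on failure.
function odf_extract_text(string $path): ?string
{
    $zip = new ZipArchive();
    if ($zip->open($path) !== true) {
        return null; // missing file or not a readable zip archive
    }
    // content.xml holds the document body in an ODF package.
    $xml = $zip->getFromName('content.xml');
    $zip->close();
    if ($xml === false) {
        return null;
    }
    // Add a space after each closing paragraph/heading tag so that
    // words from adjacent paragraphs do not run together.
    $xml = preg_replace('#</text:(p|h)>#', '$0 ', $xml);
    // Drop the markup, then decode XML entities such as &amp;.
    $text = html_entity_decode(strip_tags($xml), ENT_QUOTES | ENT_XML1, 'UTF-8');
    // Collapse runs of whitespace into single spaces.
    return trim(preg_replace('/\s+/u', ' ', $text));
}

// Return the raw PNG bytes of the embedded thumbnail, or null if absent.
function odf_extract_thumbnail(string $path): ?string
{
    $zip = new ZipArchive();
    if ($zip->open($path) !== true) {
        return null;
    }
    $png = $zip->getFromName('Thumbnails/thumbnail.png');
    $zip->close();
    return $png === false ? null : $png;
}
```

The string returned by odf_extract_text() could go straight into the text column Bastien suggested for full text search, with the original file left on the file system. The thumbnail is only there if the application that saved the document wrote one, so treat a null result as normal.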