Re: storing & searching docs

On Thu, 2012-12-13 at 18:41 +0100, Matijn Woudt wrote:

> On Thu, Dec 13, 2012 at 5:13 PM, Jim Giner <jim.giner@xxxxxxxxxxxxxxxxxx> wrote:
> 
> > On 12/13/2012 10:56 AM, Bastien wrote:
> >
> >>
> >>
> >> Bastien Koert
> >>
> >> On 2012-12-13, at 9:10 AM, Jim Giner <jim.giner@xxxxxxxxxxxxxxxxxx>
> >> wrote:
> >>
> >>> Thanks for the input, gentlemen.  Two opposing viewpoints!
> >>>
> >>> I understand the concept of using files for the docs and a table to
> >>> locate them and id them.  But I am of the opinion that modern dbs are
> >>> capable of handling very large objects (of which these docs are NOT!) much
> >>> easier than years ago, so I am leaning that way still.  It will certainly
> >>> make my search process easier!
> >>>
> >>> More comments anyone?
> >>>
> >>> --
> >>> PHP General Mailing List (http://www.php.net/)
> >>> To unsubscribe, visit: http://www.php.net/unsub.php
> >>>
> >>>
> >> I got away from storing blobs in the db. I noticed significant slowness
> >> after the db grew to about 12 GB in MySQL. Backups are also affected, as
> >> they take longer. This was an older MySQL, but it affected my MSSQL
> >> server the same way.
> >>
> >> Nowadays it's files into the file system and data into the db. One thing
> >> you could consider is reading the contents of the file into a db field and
> >> just storing the text to allow full-text search.
> >>
> >> Bastien
> >>
> > A very clever idea!  I like it - the best of both worlds.  Can you sum
> > up a method for getting the text out of the .doc (or .rtf) files so that I
> > can automate the process for my past and future documents?
> > Is there a single PHP function that would accomplish this?
> 
> 
> There's no built-in function for this. Files in .doc format are quite
> tricky to parse, but .rtf files can be handled fairly easily. One project
> is PHPRtfLite [1], which provides an API for working with RTF files.
> 
> - Matijn
> 
> [1] http://sourceforge.net/projects/phprtf/


As well as RTF, the OpenDocument format (.odt) is easy to read from PHP.
Essentially it's just a bunch of XML files zipped up. Images are kept in
the archive too, which is also a handy way to retrieve thumbnails of docs!
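
To make the zipped-XML layout described above concrete, here is a rough
sketch of pulling the text and the preview image out of an .odt file. It
assumes the zip and dom extensions are loaded. The function names
odt_to_text() and odt_thumbnail() are made up for illustration, and the
extraction is deliberately naive: it only collects headings and paragraphs,
so tabs, repeated spaces and text boxes are not handled properly.

```php
<?php
// Hypothetical helpers, not part of PHP or any library.

// Return the headings and paragraphs of an .odt file as plain text.
function odt_to_text(string $path): string
{
    $zip = new ZipArchive();
    if ($zip->open($path) !== true) {
        throw new RuntimeException("Cannot open $path as a zip archive");
    }
    // In an OpenDocument package the document body lives in content.xml.
    $xml = $zip->getFromName('content.xml');
    $zip->close();
    if ($xml === false) {
        throw new RuntimeException("No content.xml found in $path");
    }

    $doc = new DOMDocument();
    if (!$doc->loadXML($xml)) {
        throw new RuntimeException("content.xml in $path is not valid XML");
    }

    $xpath = new DOMXPath($doc);
    $xpath->registerNamespace(
        'text',
        'urn:oasis:names:tc:opendocument:xmlns:text:1.0'
    );

    $lines = [];
    // Headings (text:h) and paragraphs (text:p), in document order.
    foreach ($xpath->query('//text:h | //text:p') as $node) {
        $line = trim($node->textContent);
        if ($line !== '') {
            $lines[] = $line;
        }
    }
    return implode("\n", $lines);
}

// Return the raw PNG bytes of the preview image, or null if there is none.
function odt_thumbnail(string $path): ?string
{
    $zip = new ZipArchive();
    if ($zip->open($path) !== true) {
        return null;
    }
    // Office suites usually store a preview at this path inside the package.
    $png = $zip->getFromName('Thumbnails/thumbnail.png');
    $zip->close();
    return $png === false ? null : $png;
}
```

The string returned by odt_to_text() is what you would put in the text
column for the full-text search suggested earlier in the thread, while the
original file stays on disk.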

Thanks,
Ash
http://www.ashleysheridan.co.uk


