Re: Re: New search related question

Robert Cummings <robert@xxxxxxxxxxxxx> · Mon, 04 Feb 2008 15:42:46 -0500

I thought he was extracting the words from the content... maybe just
using strip_tags(). Doing that and pushing to a fulltext field would
cover most of his bases.
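
To make the idea concrete, here is a rough sketch of "strip the tags once, index the words, search the index". It is in Python only for illustration, not the poster's actual code: the strip_tags() helper below merely approximates PHP's strip_tags(), the in-memory dictionary stands in for a FULLTEXT column, and the page contents are made up (only the file names and search terms come from this thread).

```python
from html.parser import HTMLParser
import re
from collections import defaultdict


class _TextExtractor(HTMLParser):
    """Collects text nodes, roughly what PHP's strip_tags() would leave."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(html):
    """Hypothetical helper: return the text of an HTML string without tags."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def build_index(pages):
    """Map each lowercase word to the set of page names containing it."""
    index = defaultdict(set)
    for name, html in pages.items():
        for word in re.findall(r"[a-z0-9]+", strip_tags(html).lower()):
            index[word].add(name)
    return index


def search(index, query):
    """Return pages containing every word of the query (no phrase matching)."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Made-up page contents; the index is built once, not on every request.
pages = {
    "Flowers.html": "<h1>Flowers</h1><p>Fresh roses and tulips.</p>",
    "fakeFlowers.html": "<h1>Fake flowers</h1><p>Long lasting silk roses.</p>",
}
index = build_index(pages)
print(sorted(search(index, "long lasting")))
print(sorted(search(index, "roses")))
```

Because every word on the page goes into the index, nobody has to decide whether "long lasting" is worth adding; the index only needs rebuilding when a page changes.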

Cheers,
Rob.

On Mon, 2008-02-04 at 14:37 -0600, Shawn McKenzie wrote:
> Inefficient, maybe.  Lazy, most likely yes.
> 
> I agree that htdig may be a better solution. However, his current
> solution requires upkeep whenever the static HTML changes: the person
> populating the database has to pick all relevant words from the page
> and, if new ones are added, update the db.
> 
> For example, if you add the entry for the fakeFlowers.html and don't
> think it's important to add "long lasting" to the db, even though it
> appears on the page, then that search comes up empty.  Also, if the site
> owner adds a new page or just updates the Flowers.html to include
> "roses", then the db needs to be updated for that page or a new record
> added for the new page, etc.
> 
> Unless, by FULLTEXT, you're implying that the full text of each page
> should be in the db, then I would argue that there is negligible diff
> between that and the grep.  Then the only major diff is the
> maintainability, which the grep wins.
> 
> -Shawn
> 
> Robert Cummings wrote:
> > On Mon, 2008-02-04 at 14:13 -0600, Shawn McKenzie wrote:
> >> If there aren't many files and you don't intend to grow this site much
> >> larger and intend to always have static HTML, an easy implementation
> >> would be to read each file and search for the terms either in the
> >> keywords tag or in the entire file.
> >>
> >> Optionally, if you're on a *nix host you could exec() a grep for the
> >> terms, which returns the matching lines in an array to display as needed.
> > 
> > Wow, that has got to be the most inefficient lazy method I've ever
> > heard. I would never suggest such a route on a production server. His
> > original plan is much more efficient and is generally along the lines
> > of how search indexing works. As such, for a simple site I'd do what he
> > suggests using a FULLTEXT field in the database, or as Greg Donal
> > suggested, use something like htdig. A more involved solution would be
> > something like Lucene. Either way, you don't want to be scanning the
> > files on every search request.
> > 
> > Cheers,
> > Rob.
> 
-- 
.------------------------------------------------------------.
| InterJinn Application Framework - http://www.interjinn.com |
:------------------------------------------------------------:
| An application and templating framework for PHP. Boasting  |
| a powerful, scalable system for accessing system services  |
| such as forms, properties, sessions, and caches. InterJinn |
| also provides an extremely flexible architecture for       |
| creating re-usable components quickly and easily.          |
`------------------------------------------------------------'
