Re: New search related question

On Feb 4, 2008 7:42 PM, Jim Lucas <lists@xxxxxxxxx> wrote:
> Daniel Brown wrote:
> > On Feb 4, 2008 2:48 PM, Jason Pruim <japruim@xxxxxxxxxx> wrote:
> >> Hi Everyone! :)
> >>
> >> Just a quick question, I've done some googling but haven't been able
> >> to find what I need... I am looking at doing a search function for
> >> someone's website, the website is just static HTML files, and she
> >> doesn't want to redo the entire website to make it dynamic.
> >
> >     I got bored, so I wrote out a system to handle it.  Let me know if
> > you want the source when it's done.
> >
>
> So did I... :)
>
> It has options for recursive searching, case sensitivity, and searching either
> the displayable HTML only or the entire file.  It can also restrict which file
> types appear in the results.
>
> It only works on *nix, since I used grep.  There's no need to worry about cron,
> scheduled tasks, etc. to refresh a DB.  I have it searching about 7000 files
> that range from 2 KB plain text all the way up to 60 MB binary zip files, and
> it is rather quick at it.
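
For anyone following along, here is a rough sketch of the option set described above (recursive search, case sensitivity, displayable text only vs. the entire file, and a file-type restriction). It is not Jim's code: his version shells out to grep, while this one is plain Python so it runs anywhere. The names search_files and visible_text, and the default extension list, are made up for illustration.

```python
import os
import re
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects text nodes, skipping the bodies of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def visible_text(html):
    """Return the displayable text of an HTML string, whitespace-collapsed."""
    parser = _TextOnly()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def search_files(root, term, recursive=True, case_sensitive=False,
                 visible_only=True, extensions=(".html", ".htm")):
    """Return the sorted paths under root whose content contains term."""
    flags = 0 if case_sensitive else re.IGNORECASE
    pattern = re.compile(re.escape(term), flags)
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        if not recursive:
            dirnames[:] = []  # stop os.walk from descending further
        for name in filenames:
            # File-type restriction (hypothetical default: HTML only).
            if not name.lower().endswith(tuple(extensions)):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as fh:
                content = fh.read()
            if visible_only:
                content = visible_text(content)
            if pattern.search(content):
                hits.append(path)
    return sorted(hits)
```

Something like search_files("/var/www/site", "contact") would list the matching pages. Like the grep approach, this reads every file on each query, so there is no index or database to refresh, at the cost of more work per search.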

    I was going to do the same thing, but I started the framework from
a different perspective: portability across platforms, so it can
spider both local and remote sites.  I can punch in any valid URL and
it will recursively scan all public links, strip the HTML, and store
the plain text in the database.  When it's working a bit better, I'm
going to use a Google-style response on search: when terms are found,
show snippets with four or five words on each side of each term.
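
    The snippet idea can be sketched in a few lines. This is only an illustration in Python, not the system described above; the function name snippets and the context parameter are hypothetical, and it assumes a single-word term and text that has already had its HTML stripped.

```python
def snippets(text, term, context=5):
    """Return each occurrence of term with up to `context` words per side."""
    words = text.split()
    needle = term.lower()
    results = []
    for i, word in enumerate(words):
        # Compare ignoring case and any punctuation stuck to the word.
        if word.strip(".,;:!?\"'()").lower() == needle:
            start = max(0, i - context)
            end = min(len(words), i + context + 1)
            piece = " ".join(words[start:end])
            # Mark the sides where the surrounding text was cut off.
            prefix = "... " if start > 0 else ""
            suffix = " ..." if end < len(words) else ""
            results.append(prefix + piece + suffix)
    return results
```

    Overlapping matches produce overlapping snippets here; a real result page would probably merge them and highlight the term.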

    There are still some things I'd like to add in, should it be of
use to someone, including title searching, META searching, and blah,
blah, blah, but for now I'm just putting together the boring
foundation on which someone else can build the nice-looking house.

-- 
</Dan>

Daniel P. Brown
Senior Unix Geek
<? while(1) { $me = $mind--; sleep(86400); } ?>

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

