Re: Search engine

It's not that complicated. After crawling your website at a specific time, you can check your database for content created after that time. Then you can search specific fields of that content for words and index them. If you need to find <a href=""> tags, your best bet is to use regular expressions to search through the text and then follow those links.
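
To make the regular-expression idea concrete, here is a minimal sketch in Python (the same pattern should port to PHP's preg_match_all without much change). The function name extract_links, the example.com URLs, and the choice to keep only same-host http/https links are my own assumptions, not something from the original site. A regex like this copes with ordinary markup but can be fooled by unquoted attributes or commented-out HTML, so treat it as a starting point.

```python
import re
from urllib.parse import urljoin, urlparse

# Matches href="..." or href='...' inside an <a ...> tag, case-insensitively.
HREF_RE = re.compile(r'<a\s[^>]*?href\s*=\s*(["\'])(.*?)\1',
                     re.IGNORECASE | re.DOTALL)

def extract_links(html, base_url):
    """Return the unique same-host links found in html, in order of appearance.

    Hypothetical helper: relative links are resolved against base_url,
    #fragments are dropped, and non-http(s) links (mailto:, javascript:)
    are skipped.
    """
    host = urlparse(base_url).netloc
    seen = set()
    links = []
    for _quote, href in HREF_RE.findall(html):
        url = urljoin(base_url, href.strip()).split("#", 1)[0]
        parts = urlparse(url)
        if parts.scheme not in ("http", "https"):
            continue
        if parts.netloc == host and url not in seen:
            seen.add(url)
            links.append(url)
    return links
```

Calling extract_links(page_html, "http://www.example.com/index.html") on each fetched page gives you the next batch of URLs to visit; feeding those back into a queue is what turns this into a crawler.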

You can keep an index of the links you have crawled and the date each one was crawled, so you don't re-crawl pages that haven't changed.
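
A rough sketch of such an index, again in Python and using SQLite so it runs standalone. The table name crawl_index, the function names, and the use of Unix timestamps are all assumptions for illustration; in practice the table would live in your existing database. The prune step is one possible way to handle pages that no longer exist.

```python
import sqlite3

def open_index(path=":memory:"):
    """Open (or create) the crawl index; the schema is a hypothetical example."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS crawl_index ("
        "url TEXT PRIMARY KEY, crawled_at INTEGER NOT NULL)"
    )
    return db

def needs_crawl(db, url, modified_at):
    """True if the URL was never crawled or changed after its last crawl."""
    row = db.execute(
        "SELECT crawled_at FROM crawl_index WHERE url = ?", (url,)
    ).fetchone()
    return row is None or modified_at > row[0]

def mark_crawled(db, url, crawled_at):
    """Record (or update) the time a URL was crawled."""
    db.execute(
        "INSERT OR REPLACE INTO crawl_index (url, crawled_at) VALUES (?, ?)",
        (url, crawled_at),
    )
    db.commit()

def prune(db, live_urls):
    """Delete index entries for pages that no longer exist; return them."""
    live = set(live_urls)
    known = [r[0] for r in db.execute("SELECT url FROM crawl_index ORDER BY url")]
    gone = [u for u in known if u not in live]
    db.executemany("DELETE FROM crawl_index WHERE url = ?", [(u,) for u in gone])
    db.commit()
    return gone
```

On a daily run you would call needs_crawl before fetching each page, mark_crawled afterwards, and prune at the end with the list of URLs the crawl actually reached.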

I wrote a crawling engine for a BitTorrent site, so if you need some sample code, let me know.


Ron Piggott (PHP) wrote:
What I haven't figured out is how to crawl my site.  If I knew how to
crawl my site I could write a basic search engine and develop it over
time. This is the crucial issue I need help with.

I have a really basic search engine I wrote last year.  I told my web
site what pages the user could search ... as in I keyed in the URLs.

I am giving my site a face lift and making some serious changes to
it ... I have my database out of date ... I would like to crawl my site
once a day to update what changes I made during the past 24 hours and
delete HTML files which no longer exist.

If someone knew how to retrieve various <a href=""> then I could start
with something and build on it over time ...

Ron

--
PHP Database Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

