Re: Crawlers (was parsing large files - PHP or Perl)

Is there anyone on this list who has written fast and decent crawlers in PHP who would be willing to share their experiences?

My first inclination would be to use an existing crawler to grab the pages and store all the files locally (even if only temporarily). Then you can use PHP to do whatever processing you want on those files, and you can even have PHP crawl deeper based on the links in those files if necessary (rough sketch after the list below). I'd have a hard time believing I could write a better web crawler on my own than what's already available from projects that focus on exactly that. What about existing search systems like:

Nutch - http://www.nutch.org
mnoGoSearch - http://mnogosearch.org/
htdig - http://www.htdig.org/
or maybe even a "wget -r" - http://www.gnu.org/software/wget/wget.html
(I'm sure I missed a bunch of great options)
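
For what it's worth, here's a rough PHP sketch of the "mirror first, process with PHP afterwards" idea: walk a directory produced by something like "wget -r", parse each HTML file with DOMDocument, and collect the links in case you want PHP to go deeper later. The directory name and the processPage() stub are just placeholders, not anything standard:

<?php
// Sketch: process pages mirrored locally (e.g. by "wget -r example.com"),
// extract links with DOMDocument, and queue URLs for a possible deeper pass.
// $mirrorDir and processPage() are assumptions for illustration only.

$mirrorDir = 'example.com';      // directory created by wget -r (assumed)
$queue     = array();            // URLs discovered for a deeper crawl

$files = new RecursiveIteratorIterator(
    new RecursiveDirectoryIterator($mirrorDir)
);

foreach ($files as $file) {
    if (!preg_match('/\.html?$/i', $file->getFilename())) {
        continue;                // only look at HTML files
    }

    $dom = new DOMDocument();
    @$dom->loadHTMLFile($file->getPathname());  // quiet warnings on sloppy markup

    // Whatever processing you actually need goes here, e.g.:
    // processPage($dom);        // hypothetical, application-specific

    // Collect links in case you want PHP to crawl deeper later
    foreach ($dom->getElementsByTagName('a') as $a) {
        $href = $a->getAttribute('href');
        if ($href !== '') {
            $queue[] = $href;
        }
    }
}

print_r(array_unique($queue));
?>

The nice part of splitting it this way is that the fetching (politeness, retries, robots.txt) is handled by a tool built for it, and the PHP side only ever deals with local files.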

Just an idea - I'd also like to hear if someone has written nice crawling code in PHP.

- Jamie

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

