Check out PHPdig: http://www.phpdig.net/

Warren

> -----Original Message-----
> From: Jamie Alessio [mailto:Jamie.Alessio@xxxxxxxx]
> Sent: Thursday, February 17, 2005 9:22 AM
> To: John Cage
> Cc: php-general@xxxxxxxxxxxxx
> Subject: Re: Crawlers (was parsing large files - PHP or Perl)
>
> > Is there anyone on this list who has written fast and decent
> > crawlers in PHP who would be willing to share their experiences?
>
> My first inclination would be to use an existing crawler to grab the
> pages and store all the files locally (even if only temporarily). Then
> you can use PHP to do whatever processing you want on those files, and
> you can even have PHP crawl deeper based on the links in those files if
> necessary. I'd be hard pressed to come up with a reason to believe I
> could implement a better web crawler on my own than what is already
> available from projects that focus on exactly that. What about existing
> search systems like:
>
> Nutch - http://www.nutch.org
> mnoGoSearch - http://mnogosearch.org/
> htdig - http://www.htdig.org/
> or maybe even a "wget -r" - http://www.gnu.org/software/wget/wget.html
> (I'm sure I missed a bunch of great options)
>
> Just an idea - I'd also like to hear if someone has written nice
> crawling code in PHP.
>
> - Jamie
>
> --
> PHP General Mailing List (http://www.php.net/)
> To unsubscribe, visit: http://www.php.net/unsub.php
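[Editor's note: Jamie's suggestion - mirror the site with an existing crawler such as "wget -r", then let PHP process the saved files and follow links deeper - can be sketched as below. This is a minimal illustration, not code from the thread; the function name extract_links() and the sample URLs are made up for the example, and a real crawler should parse HTML with DOMDocument rather than a regex.]

```php
<?php
// Sketch: pull href targets out of a locally saved page (e.g. one
// fetched by "wget -r --level=2 http://example.org/") so a PHP script
// can decide which URLs to crawl next.
function extract_links(string $html, string $base): array
{
    $links = [];
    // A regex is good enough for a sketch; real HTML should be parsed
    // with DOMDocument instead.
    if (preg_match_all('/href\s*=\s*["\']([^"\']+)["\']/i', $html, $m)) {
        foreach ($m[1] as $href) {
            // Crudely resolve relative URLs against the base URL.
            if (parse_url($href, PHP_URL_SCHEME) === null) {
                $href = rtrim($base, '/') . '/' . ltrim($href, '/');
            }
            $links[$href] = true;   // keyed array de-duplicates
        }
    }
    return array_keys($links);
}

$html = '<a href="/about">About</a> <a href="http://other.example/">Out</a>';
print_r(extract_links($html, 'http://example.org'));
?>
```

[A loop over `find ./mirror -name '*.html'` output feeding each file through a function like this gives the "crawl deeper from local files" workflow Jamie describes, without reimplementing the fetching logic that wget or the listed search systems already handle.]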