User-agent/search engine spider class

Hi All,

I'm looking for a class to help me log data about user agents and search
engine spiders. I came across PHPClientSniffer
(http://www.phpclasses.org/browse.html/package/81.html), which looks good
for things like detecting JavaScript support. I'll probably incorporate some
of this, but my primary concern at the moment is detecting whether a user
agent is probably a spider.

There are dozens of known spiders. Ideally the class would work against a
CSV or, even better, an XML file of known spiders that I can update as new
ones become known. In a perfect world, someone (me?) hosts this XML file on
a public web server so users of this class can refresh it once a week or so.

On a related note, I'm looking for a class that parses the referer string
for common search engines. For example, Google referer strings usually look
something like this (in English):

http://www.google.com/search?hl=en&ie=UTF-8&oe=UTF-8&q=some+search+term

It would be awesome if there were a class that allowed me to do something
like this:

$referer = new Referer($_SERVER['HTTP_REFERER']);

if ($referer->is_search_engine()) {
	//return array of search words
	$searchPhrase = $referer->get_searchTerms();

	//return language of search engine
	$lang = $referer->get_language();
}
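
To make that concrete, here is a rough sketch of how such a class might work
internally. Everything in it is an assumption for illustration: the class
and method names are just the ones from my wish list above, the engine table
holds only a Google entry, and treating "hl" as the language is a guess
based on the example URL. It assumes PHP 7.4 or later.

```php
<?php
// Hypothetical sketch, not an existing library.
class Referer
{
    // Host pattern => query parameter assumed to carry the search phrase.
    private const ENGINES = [
        '/(^|\.)google\./' => 'q',
    ];

    private array $params = [];
    private ?string $phrase = null;

    public function __construct(?string $url)
    {
        $parts = parse_url((string) $url);
        if (!is_array($parts) || !isset($parts['host'], $parts['query'])) {
            return; // missing or unparseable referer: not a search engine
        }
        $params = [];
        parse_str($parts['query'], $params); // also decodes "+" to spaces
        $this->params = $params;

        $host = strtolower($parts['host']);
        foreach (self::ENGINES as $pattern => $param) {
            $value = $params[$param] ?? null;
            if (preg_match($pattern, $host) === 1 && is_string($value)) {
                $this->phrase = $value;
                break;
            }
        }
    }

    public function is_search_engine(): bool
    {
        return $this->phrase !== null;
    }

    // Returns the search phrase split into words, or an empty array.
    public function get_searchTerms(): array
    {
        if ($this->phrase === null) {
            return [];
        }
        return preg_split('/\s+/', $this->phrase, -1, PREG_SPLIT_NO_EMPTY);
    }

    // Assumption: the "hl" parameter names the language.
    public function get_language(): ?string
    {
        $lang = $this->params['hl'] ?? null;
        return is_string($lang) ? $lang : null;
    }
}
```

Since HTTP_REFERER is often not sent at all, the constructor accepts null,
so it could be called as new Referer($_SERVER['HTTP_REFERER'] ?? null).
Other engines would need their own host pattern and parameter name added to
the table.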

And on the spider side:

$spider = new Spider($_SERVER['HTTP_USER_AGENT']);

if ($spider->is_known_spider()) {
	//return common name of spider (e.g. "Google" or "Yahoo")
	$name = $spider->get_commonName();

	//return whether this spider is known to be a spambot
	$evil = $spider->is_spambot();
}
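
As a rough sketch of the XML-driven version, something like the following
might do. The XML layout (a "spider" element with "name", "match" and
"spambot" attributes) is purely my own invention, the second entry is
made-up sample data, and matching is a plain case-insensitive substring
test. It assumes PHP 7.4 or later with the SimpleXML extension enabled.

```php
<?php
// Hypothetical sketch: the XML layout and the entries are assumptions.
const SPIDER_XML = <<<'XML'
<spiders>
  <spider name="Google" match="Googlebot" spambot="false"/>
  <spider name="ExampleHarvester" match="ExampleHarvester" spambot="true"/>
</spiders>
XML;

class Spider
{
    private ?string $name = null;
    private bool $spambot = false;

    public function __construct(?string $userAgent, string $xml = SPIDER_XML)
    {
        $list = simplexml_load_string($xml);
        if ($list === false) {
            return; // unreadable list: treat every agent as unknown
        }
        foreach ($list->spider as $entry) {
            $needle = (string) $entry['match'];
            // Skip empty patterns, then do a case-insensitive substring match.
            if ($needle !== '' && stripos((string) $userAgent, $needle) !== false) {
                $this->name = (string) $entry['name'];
                $this->spambot = ((string) $entry['spambot']) === 'true';
                break;
            }
        }
    }

    public function is_known_spider(): bool
    {
        return $this->name !== null;
    }

    // Common name of the matched spider, or null if none matched.
    public function get_commonName(): ?string
    {
        return $this->name;
    }

    public function is_spambot(): bool
    {
        return $this->spambot;
    }
}
```

The list is passed in as a string so that it could just as well come from a
local file that gets refreshed from a shared copy once a week. How that
refresh would work is left open here.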

I'd write this myself, but the need isn't critical at this point. A "nice to
have" instead of a "need to have."

Thanks!

-Derek

