On Tue, 2010-11-30 at 00:19 -0200, Thiago H. Pojda wrote:

> Quit top posting.
>
> On Mon, Nov 29, 2010 at 9:55 PM, Ron Piggott <ron.piggott@xxxxxxxxxxxxxxxxxx> wrote:
>
> > My issue with the user agent is unresolved. I need to do more research to
> > see how AWSTATS distinguishes between a robot crawling the site and a web
> > page user and set the user-agent accordingly.
>
> Ron,
>
> AWSTATS probably uses a knowledge base of known bots, I'm not sure. If
> that's the case, you can just set your User-Agent to a known one and see
> how that goes.
>
> Look for Googlebot, Majestic, Ask.com (now dead - probably a good pick),
> MSNBot here: http://www.user-agents.org/
>
> As for setting the User-Agent in your request, I like to use this cURL
> snippet (based on a note at curl's manual page):
>
> <?php
>
> $sUrl = 'www.example.com/';
> $sUserAgent = 'Googlebot/2.1 (+http://www.googlebot.com/bot.html)';
>
> $hCurl = curl_init();
> // Return the response body from curl_exec() instead of printing it
> curl_setopt ($hCurl, CURLOPT_RETURNTRANSFER, TRUE);
> curl_setopt ($hCurl, CURLOPT_URL, $sUrl);
> // Both timeouts are in seconds
> curl_setopt ($hCurl, CURLOPT_CONNECTTIMEOUT, 120);
> curl_setopt ($hCurl, CURLOPT_TIMEOUT, 120);
> // Sent as the User-Agent header of the request
> curl_setopt ($hCurl, CURLOPT_USERAGENT, $sUserAgent);
>
> // Holds the page content, or FALSE on failure
> $sContent = curl_exec($hCurl);
> ?>
>
>
> Cheers,
> Thiago Henrique Pojda
> +55 41 8856-7925

There's a very easy way to read in a user agent and determine whether it
is a bot. Google for browscap.ini. This is basically a massive ini file
containing details of known user agent header strings and some basic
information about each one, including whether it is a bot.

There are various functions in PHP for parsing this, but I'm still not
exactly sure what you want to do. So far you've asked both how other
scripts check for bots and how to change your own user agent string, and
those are two quite different things.

Thanks,
Ash
http://www.ashleysheridan.co.uk
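
To make the first of those two tasks concrete (deciding whether a given user agent string belongs to a bot), here is a minimal sketch of the "knowledge base" idea. The function name isKnownBot() and the three-entry marker list are made up for illustration and are nowhere near complete; a real check would lean on a maintained database such as browscap.ini. This is not how AWSTATS itself does it, which the thread leaves open.

```php
<?php
// Hypothetical, hand-picked substrings for illustration only. A maintained
// list such as browscap.ini covers far more user agents than this.
const KNOWN_BOT_MARKERS = ['googlebot', 'msnbot', 'mj12bot'];

/**
 * Return true if the user agent string contains one of the markers above.
 */
function isKnownBot(string $userAgent): bool
{
    foreach (KNOWN_BOT_MARKERS as $marker) {
        // stripos() is a case-insensitive substring search;
        // it returns false when the marker is not found.
        if (stripos($userAgent, $marker) !== false) {
            return true;
        }
    }
    return false;
}
```

Calling isKnownBot($_SERVER['HTTP_USER_AGENT']) in a page would classify the current visitor. If PHP's browscap setting points at a browscap.ini file, the built-in get_browser() does a much more thorough lookup, and its result includes a crawler field.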