I think you should use the fourth example on the manual page (the one that
uses a stream context), since what you need to control is not the file you
read but the header-level (layer 7) information of the HTTP request.

http://php.net/manual/en/function.file-get-contents.php

--Shreyas

On Thu, Nov 25, 2010 at 4:11 PM, Ron Piggott <ron.piggott@xxxxxxxxxxxxxxxxxx> wrote:

> Will the header be passed when using file_get_contents, or should I be
> using another command? If so, which one?
>
> Ron
>
> <?php
>
> header('User Agent: RonBot (http://www.example.com)');
> $url = "http://www.example.com";
>
> $input = file_get_contents($url);
>
> The Verse of the Day
> “Encouragement from God’s Word”
> http://www.TheVerseOfTheDay.info
>
> From: Shreyas Agasthya <shreyasbr@xxxxxxxxx>
> Sent: Thursday, November 25, 2010 4:21 AM
> To: Ron Piggott <ron.piggott@xxxxxxxxxxxxxxxxxx>
> Cc: php-general@xxxxxxxxxxxxx ; ash@xxxxxxxxxxxxxxxxxxxx
> Subject: Re: Fw: Spoofing user_agent
>
> The standard HTTP request header is User-Agent (with a hyphen, not an
> underscore).
>
> --Shreyas
>
> On Thu, Nov 25, 2010 at 2:36 PM, Ron Piggott <
> ron.piggott@xxxxxxxxxxxxxxxxxx> wrote:
>
>> Is this what you are telling me to do:
>>
>> header('user_agent: RonBot (http://www.theverseoftheday.info)');
>>
>> Ron
>>
>> From: ash@xxxxxxxxxxxxxxxxxxxx
>> Sent: Thursday, November 25, 2010 3:34 AM
>> To: Ron Piggott ; php-general@xxxxxxxxxxxxx
>> Subject: Re: Fw: Spoofing user_agent
>>
>> You need to set it in the request headers you send. Putting it in the
>> script you're using as a spider with ini_set won't do anything, because
>> the target site doesn't know anything about it.
>>
>> Thanks,
>> Ash
>> http://www.ashleysheridan.co.uk
>>
>> ----- Reply message -----
>> From: "Ron Piggott" <ron.piggott@xxxxxxxxxxxxxxxxxx>
>> Date: Thu, Nov 25, 2010 08:25
>> Subject: Fw: Spoofing user_agent
>> To: <php-general@xxxxxxxxxxxxx>
>>
>> I have written a script to generate a sitemap of my web site. It crawls
>> all of the site's web pages (about 30,000).
>>
>> I need help spoofing the user_agent value so that the stats program
>> running in the background (“AWStats”) will treat the crawl as a bot
>> rather than as regular browsing.
>>
>> The sitemap generator is a cron job. I tried this syntax:
>> ini_set('user_agent', 'RonBot (http://www.theverseoftheday.info)');
>>
>> This didn't work. The browsing was attributed to the dedicated IP
>> address.
>>
>> How do I get AWStats to list the crawl like the other entries under the
>> “Robots/Spiders visitors” heading, such as:
>> Unknown robot (identified by 'bot*')
>>
>> I don't mean any ill will by changing this setting. Thanks for the help.
>>
>> Ron
>
> --
> Regards,
> Shreyas Agasthya
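Along the lines of the stream-context suggestion above, here is a minimal sketch (not code from the thread) of sending a custom User-Agent with file_get_contents(). It assumes PHP 7 or later; the function names makeUserAgentContext and fetchAsBot are hypothetical, and the bot string and URL are placeholders. The key point is that header() sets a *response* header sent back to whoever invoked the script, so it never reaches the site being fetched; the http wrapper's `user_agent` context option is what ends up as the `User-Agent:` *request* header.

```php
<?php
// Sketch: attach a custom User-Agent to file_get_contents() requests
// through an HTTP stream context. Function names are hypothetical.

/**
 * Build a stream context whose HTTP requests carry the given User-Agent.
 */
function makeUserAgentContext(string $userAgent)
{
    return stream_context_create([
        'http' => [
            // PHP's http:// wrapper sends this as the "User-Agent:" request header.
            'user_agent' => $userAgent,
        ],
    ]);
}

/**
 * Fetch a URL with a custom User-Agent.
 * Returns the response body as a string, or false on failure.
 */
function fetchAsBot(string $url, string $userAgent)
{
    $context = makeUserAgentContext($userAgent);
    // Second argument: do not search the include_path.
    return file_get_contents($url, false, $context);
}

// Hypothetical use inside the sitemap crawler (performs a real request):
// $input = fetchAsBot('http://www.example.com', 'RonBot (http://www.example.com)');
```

According to the PHP manual's HTTP context options, when no `user_agent` option (or `User-Agent` line in the `header` option) is given, the http wrapper falls back to the `user_agent` php.ini setting, so a syntactically valid `ini_set('user_agent', ...)` before the request should have a similar effect; the context version just keeps the setting local to one call. Whether AWStats then files the crawl under robots depends on its own robot-matching list.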