Please don't top post.

On 25 November 2010 15:38, Ron Piggott <ron.piggott@xxxxxxxxxxxxxxxxxx> wrote:
>
> Is "User Agent" supposed to have a hyphen "-"?  Ron
>
> The Verse of the Day
> “Encouragement from God’s Word”
> http://www.TheVerseOfTheDay.info
>
> -----Original Message-----
> From: Richard Quadling
> Sent: Thursday, November 25, 2010 9:16 AM
> To: Deva
> Cc: Shreyas Agasthya ; Ron Piggott ; php-general@xxxxxxxxxxxxx ; ash@xxxxxxxxxxxxxxxxxxxx
> Subject: Re: Fw: Spoofing user_agent
>
> On 25 November 2010 11:32, Deva <devendra.in@xxxxxxxxx> wrote:
>>
>> Use curl
>> http://php.net/manual/en/book.curl.php
>>
>> On Thu, Nov 25, 2010 at 4:41 PM, Shreyas Agasthya <shreyasbr@xxxxxxxxx> wrote:
>>
>>> I feel you should use more of the 4th method here as you are not trying to
>>> read the file but the header-level (7th layer) information of the HTTP
>>> protocol.
>>>
>>> http://php.net/manual/en/function.file-get-contents.php
>>>
>>> --Shreyas
>>>
>>> On Thu, Nov 25, 2010 at 4:11 PM, Ron Piggott <ron.piggott@xxxxxxxxxxxxxxxxxx> wrote:
>>>
>>> > Will the header pass with using file_get_contents, or should I be using
>>> > another command, and if so, which one?  Ron
>>> >
>>> > <?php
>>> >
>>> >     header('User Agent: RonBot (http://www.example.com)');
>>> >     $url = "http://www.example.com";
>>> >
>>> >     $input = file_get_contents($url);
>>> >
>>> > *From:* Shreyas Agasthya <shreyasbr@xxxxxxxxx>
>>> > *Sent:* Thursday, November 25, 2010 4:21 AM
>>> > *To:* Ron Piggott <ron.piggott@xxxxxxxxxxxxxxxxxx>
>>> > *Cc:* php-general@xxxxxxxxxxxxx ; ash@xxxxxxxxxxxxxxxxxxxx
>>> > *Subject:* Re: Fw: Spoofing user_agent
>>> >
>>> > A standard HTTP request header is: User Agent (without the underscore).
>>> >
>>> > --Shreyas
>>> >
>>> > On Thu, Nov 25, 2010 at 2:36 PM, Ron Piggott <ron.piggott@xxxxxxxxxxxxxxxxxx> wrote:
>>> >
>>> >> Is this what you are telling me to do:
>>> >>
>>> >> header('user_agent: RonBot (http://www.theverseoftheday.info)');
>>> >>
>>> >> Ron
>>> >>
>>> >> From: ash@xxxxxxxxxxxxxxxxxxxx
>>> >> Sent: Thursday, November 25, 2010 3:34 AM
>>> >> To: Ron Piggott ; php-general@xxxxxxxxxxxxx
>>> >> Subject: Re: Fw: Spoofing user_agent
>>> >>
>>> >> You need to set it in the headers of the request you make. Putting it in
>>> >> the script you're using as a spider with ini_set won't do anything because
>>> >> the target site doesn't know anything about it.
>>> >>
>>> >> Thanks,
>>> >> Ash
>>> >> http://www.ashleysheridan.co.uk
>>> >>
>>> >> ----- Reply message -----
>>> >> From: "Ron Piggott" <ron.piggott@xxxxxxxxxxxxxxxxxx>
>>> >> Date: Thu, Nov 25, 2010 08:25
>>> >> Subject: Fw: Spoofing user_agent
>>> >> To: <php-general@xxxxxxxxxxxxx>
>>> >>
>>> >> I have written a script to generate a sitemap of my web site.  It crawls
>>> >> all of the site's web pages.  (About 30,000)
>>> >>
>>> >> I need help to spoof the user_agent variable so the stats program running
>>> >> in the background (“AWSTATS”) will treat the crawl as a bot, not browsing
>>> >> usage.
>>> >>
>>> >> The sitemap generator is a cron job.  I tried the syntax:
>>> >> ini_set('user_agent', 'RonBot (http://www.theverseoftheday.info)');
>>> >>
>>> >> This didn’t work.  The browsing was attributed to the dedicated IP
>>> >> address.
>>> >>
>>> >> How do I get AWSTATS to access this, such as other entries under the
>>> >> “Robots/Spiders visitors” heading:
>>> >> Unknown robot (identified by 'bot*')
>>> >>
>>> >> I don’t mean any ill will by changing this setting.  Thanks for the help.
>>> >>
>>> >> Ron
>>> >>
>>> >
>>> > --
>>> > Regards,
>>> > Shreyas Agasthya
>>>
>>
>> --
>> :DJ
>
> It is no use using header(). That sets a response header for the client
> of your script, not a request header for the server that
> file_get_contents() contacts.
>
> I use stream_contexts.
>
> $s_Contents = file_get_contents(
>   $s_URL,
>   False,
>   stream_context_create(
>     array(
>       'http' => array(
>         'method' => 'GET',
>         'header' => "User-Agent: RonBot (http://www.example.com)\r\n"
>       ),
>     )
>   )
> );
>
> You can supply cookies, or anything else, with the request. Make sure
> you add a \r\n to each of the headers and just concatenate them.
>
> If you are doing this in a loop, then I'd recommend creating a default
> stream context and then the request would just be ...
>
> $s_Contents = file_get_contents($s_URL);
>
> As the default stream context would be applied.
>
> I had to use a default stream context to route all http requests
> through an NTLM authentication proxy server because PHP doesn't deal
> with NTLM authentication.
>
> See my user notes on
> http://docs.php.net/manual/en/function.stream-context-get-default.php.
> Don't bother with the link at the bottom of the user note; it's not
> live.
>
> Richard.
>
> --
> Richard Quadling
> Twitter : EE : Zend
> @RQuadling : e-e.com/M_248814.html : bit.ly/9O8vFY
>

http://en.wikipedia.org/wiki/User_agent

"... the identity is transmitted via the User-Agent request header, ... "

--
Richard Quadling
Twitter : EE : Zend
@RQuadling : e-e.com/M_248814.html : bit.ly/9O8vFY

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
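
For anyone reading this thread later: the default stream context suggested for looped requests might be set up roughly as follows. This is only a sketch under assumptions, not code from the thread. It assumes PHP 5.3 or later (for stream_context_set_default()); the build_header_block() helper and the extra Accept header are hypothetical and exist purely to show the "\r\n after each header, then concatenate" rule.

```php
<?php
// Hypothetical helper (not from the thread): turns name => value pairs
// into one header string, ending every header line with "\r\n".
function build_header_block(array $headers)
{
    $block = '';
    foreach ($headers as $name => $value) {
        $block .= $name . ': ' . $value . "\r\n";
    }
    return $block;
}

$headers = array(
    'User-Agent' => 'RonBot (http://www.example.com)',
    'Accept'     => 'text/html', // illustrative second header, an assumption
);

// Install the options as the default context for the http:// wrapper.
stream_context_set_default(array(
    'http' => array(
        'method' => 'GET',
        'header' => build_header_block($headers),
    ),
));

// Inside the crawl loop the call can then stay as short as:
//     $s_Contents = file_get_contents($s_URL);

// Read the default context back to check what would be sent
// (no network access is needed for this check).
$options = stream_context_get_options(stream_context_get_default());
echo $options['http']['header'];
```

Setting the context once keeps the per-URL call unchanged, which is convenient when the crawler fetches thousands of pages in one run.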