Re: Fw: Spoofing user_agent

On 25 November 2010 11:32, Deva <devendra.in@xxxxxxxxx> wrote:
> Use curl
> http://php.net/manual/en/book.curl.php
>
>
> On Thu, Nov 25, 2010 at 4:41 PM, Shreyas Agasthya <shreyasbr@xxxxxxxxx> wrote:
>
>> I feel you should use more of the 4th method here as you are not trying to
>> read the file but the header level (7th layer) information of the HTTP
>> protocol.
>>
>> http://php.net/manual/en/function.file-get-contents.php
>>
>>
>> --Shreyas
>>
>> On Thu, Nov 25, 2010 at 4:11 PM, Ron Piggott <ron.piggott@xxxxxxxxxxxxxxxxxx> wrote:
>>
>> > Will the header pass when using file_get_contents, or should I be using
>> > another command, and if so, which one? Ron
>> >
>> > <?php
>> >
>> >     header('User Agent: RonBot (http://www.example.com)');
>> >     $url = "http://www.example.com";
>> >
>> >     $input = file_get_contents($url);
>> >
>> >
>> >
>> > The Verse of the Day
>> > “Encouragement from God’s Word”
>> > http://www.TheVerseOfTheDay.info
>> >
>> > *From:* Shreyas Agasthya <shreyasbr@xxxxxxxxx>
>> > *Sent:* Thursday, November 25, 2010 4:21 AM
>> > *To:* Ron Piggott <ron.piggott@xxxxxxxxxxxxxxxxxx>
>> > *Cc:* php-general@xxxxxxxxxxxxx ; ash@xxxxxxxxxxxxxxxxxxxx
>> > *Subject:* Re:  Fw: Spoofing user_agent
>> >
>> > A standard HTTP Request headers is : User Agent (without the underscore).
>> >
>> > --Shreyas
>> >
>> > On Thu, Nov 25, 2010 at 2:36 PM, Ron Piggott <
>> > ron.piggott@xxxxxxxxxxxxxxxxxx> wrote:
>> >
>> >>
>> >> Is this what you are telling me to do:
>> >>
>> >> header('user_agent: RonBot (http://www.theverseoftheday.info)');
>> >>
>> >> Ron
>> >>
>> >> The Verse of the Day
>> >> “Encouragement from God’s Word”
>> >> http://www.TheVerseOfTheDay.info
>> >>
>> >> From: ash@xxxxxxxxxxxxxxxxxxxx
>> >> Sent: Thursday, November 25, 2010 3:34 AM
>> >> To: Ron Piggott ; php-general@xxxxxxxxxxxxx
>> >> Subject: Re:  Fw: Spoofing user_agent
>> >>
>> >> You need to set it in the header request you make. Putting it in the
>> >> script you're using as a spider with ini_set won't do anything because the
>> >> target site doesn't know anything about it.
>> >>
>> >> Thanks,
>> >> Ash
>> >> http://www.ashleysheridan.co.uk
>> >>
>> >> ----- Reply message -----
>> >> From: "Ron Piggott" <ron.piggott@xxxxxxxxxxxxxxxxxx>
>> >> Date: Thu, Nov 25, 2010 08:25
>> >> Subject:  Fw: Spoofing user_agent
>> >> To: <php-general@xxxxxxxxxxxxx>
>> >>
>> >> I have written a script to generate a sitemap of my web site. It crawls all
>> >> of the site's web pages (about 30,000).
>> >>
>> >> I need help to spoof the user_agent variable so the stats program running
>> >> in the background (“AWSTATS”) will treat the crawl as a bot, not browsing
>> >> usage.
>> >>
>> >> The sitemap generator is a cron job. I tried the syntax:
>> >> ini_set('user_agent', 'RonBot (http://www.theverseoftheday.info)');
>> >>
>> >> This didn’t work. The browsing was attributed to the dedicated IP
>> >> address.
>> >>
>> >> How do I get AWSTATS to access this, such as other entries under the
>> >> “Robots/Spiders visitors” heading:
>> >> Unknown robot (identified by 'bot*')
>> >>
>> >> I don’t mean any ill will by changing this setting. Thanks for the help.
>> >>
>> >> Ron
>> >>
>> >> The Verse of the Day
>> >> “Encouragement from God’s Word”
>> >> http://www.TheVerseOfTheDay.info
>> >>
>> >>
>> >
>> >
>> > --
>> > Regards,
>> > Shreyas Agasthya
>> >
>>
>>
>>
>> --
>> Regards,
>> Shreyas Agasthya
>>
>
>
>
> --
> :DJ
>

header() is no use here. It sets a header on the response your script
sends to its own client, not on the request that file_get_contents()
makes to the remote server.

I use stream_contexts.

// $s_URL is the page to fetch; the third argument carries the request headers.
$s_Contents = file_get_contents(
  $s_URL,
  false,
  stream_context_create(
    array(
      'http' => array(
        'method' => 'GET',
        'header' => "User-Agent: RonBot (http://www.example.com)\r\n"
      ),
    )
  )
);

You can supply cookies, or any other header, with the request. Make sure
you end each header with \r\n and just concatenate them into one string.
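
To illustrate the concatenation, here is a minimal sketch. The helper
build_header_string() and the cookie value are made up for the example and
are not part of PHP; only stream_context_create() and file_get_contents()
are real functions. The actual request is left commented out so the
snippet runs without touching the network.

```php
<?php
// Hypothetical helper: turns name => value pairs into the single string
// that the 'header' option expects, one "\r\n"-terminated line per header.
function build_header_string(array $a_Headers)
{
    $s_Lines = '';
    foreach ($a_Headers as $s_Name => $s_Value) {
        $s_Lines .= $s_Name . ': ' . $s_Value . "\r\n";
    }
    return $s_Lines;
}

$s_Headers = build_header_string(array(
    'User-Agent' => 'RonBot (http://www.example.com)',
    'Cookie'     => 'session=abc123', // illustrative value only
));

$r_Context = stream_context_create(array(
    'http' => array(
        'method' => 'GET',
        'header' => $s_Headers,
    ),
));

// The real request would then be:
// $s_Contents = file_get_contents('http://www.example.com', false, $r_Context);
```

Building the string in one place makes it harder to forget the trailing
\r\n on one of the headers.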

If you are doing this in a loop, then I'd recommend creating a default
stream context, and then the request would just be ...

$s_Contents = file_get_contents($s_URL);

as the default stream context would be applied.
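
A rough sketch of the default-context approach, assuming PHP 5.3 or later
for stream_context_set_default(). The function fetch_all() and the URLs in
the usage note are hypothetical names for illustration. The function is
only defined here, not called, so nothing is fetched until you call it.

```php
<?php
// Set once, before the loop. Later http:// requests made without an
// explicit context should pick these options up.
stream_context_set_default(array(
    'http' => array(
        'method' => 'GET',
        'header' => "User-Agent: RonBot (http://www.example.com)\r\n",
    ),
));

// Hypothetical crawler loop: no context argument is needed per request.
function fetch_all(array $a_URLs)
{
    $a_Pages = array();
    foreach ($a_URLs as $s_URL) {
        $a_Pages[$s_URL] = file_get_contents($s_URL);
    }
    return $a_Pages;
}
```

Usage would be something like
fetch_all(array('http://www.example.com/', 'http://www.example.com/about')).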

I had to use a default stream context to route all http requests
through an NTLM authentication proxy server because PHP doesn't deal
with NTLM authentication.

See my user notes on
http://docs.php.net/manual/en/function.stream-context-get-default.php.
Don't bother with the link at the bottom of the user note; it's not
live.

Richard.

-- 
Richard Quadling
Twitter : EE : Zend
@RQuadling : e-e.com/M_248814.html : bit.ly/9O8vFY

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



