Re: is_readable(http://.... text file) says not, but I can in browser

On Sun, Jan 11, 2009 at 13:42, Nathan Rixham <nrixham@xxxxxxxxx> wrote:
>
> pedantic: really you'd want to check the http status header returned and
> take the appropriate action depending on what header was returned.
>
> I'm sure there's a way to get the headers from a file_get_contents using one
> of the stream_get_* functions but can't remember off hand, going to have to
> look into that at some point. (stream_get_metadata maybe?)

    You're right, Nate.  I should've included that as well rather than
suggesting that parsing the page is the only way.  That wasn't how I
intended it, but by leaving that info out, I can see how it could be
read that way.  So here it is, updated:

<?php
    $myFileLast = "http://www.isawit.com/index.php";
    // $http_response_header is populated automatically by the http://
    // wrapper; element 0 is the status line, e.g. "HTTP/1.1 200 OK".
    $tmpData = @file_get_contents($myFileLast);
    $theDataLast = (strlen($tmpData) > 0 && !strstr($http_response_header[0], '404'))
        ? $tmpData
        : "File not found.";
?>
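
    And for the stream_get_* route you mentioned, stream_get_meta_data()
does expose the raw response headers when the http:// wrapper is used -
something along these lines (the URL is just an example):

<?php
    // 'wrapper_data' holds the response headers as an array, e.g.
    // array('HTTP/1.1 200 OK', 'Content-Type: text/html', ...).
    $fp = @fopen('http://www.example.com/index.php', 'r');
    if ($fp) {
        $meta = stream_get_meta_data($fp);
        print_r($meta['wrapper_data']);
        fclose($fp);
    }
?>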

> dan raises a good point though; there are loads of responses you can get,
> and not all of them will be the page you expect; worth giving some thought
> to.

    There's also a strong possibility that the data returned will be
different - or 404'ed even though it exists on the server - as a means
of dropping automated requests, which could be seen as data scraping.
So if you go the cURL route, you may want to consider client (browser)
spoofing, sending the User-Agent of MSIE, Firefox, Safari, or even
Googlebot.
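
    A rough sketch of what that could look like with cURL (the URL and
User-Agent string below are only examples):

<?php
    // Fetch a page while presenting a browser-style User-Agent header.
    $ch = curl_init('http://www.example.com/index.php');   // example URL
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the body instead of printing it
    curl_setopt($ch, CURLOPT_USERAGENT,
        'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.5) Gecko/2008120122 Firefox/3.0.5');
    $body   = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);  // check the status code, not just the body
    curl_close($ch);
    echo ($body !== false && $status == 200) ? $body : "File not found.";
?>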

    I had written a paper several years ago for a security consortium,
and one of the things I included in the website security section was
how some webmasters were adopting the practice of allowing Google and
other bots to spider private content in order to build searchable
terms and increase their search results.  To Google, Yahoo!, etc., the
page appeared to be a normal, unrestricted web page, because the site
decided what to serve based only upon the headers the requesting bot
sent; an unmodified browser identified itself in its headers as a
standard desktop browser and was kept out.  By simply changing the
headers my browser sent, I was able to surf those websites without
restriction - some even allowed me unchallenged into administrative
areas that contained personal information.

-- 
</Daniel P. Brown>
daniel.brown@xxxxxxxxxxxx || danbrown@xxxxxxx
http://www.parasane.net/ || http://www.pilotpig.net/
Unadvertised dedicated server deals, too low to print - email me to find out!

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

