"Dale H. Cook" wrote: > At 05:04 PM 3/13/2013, Dan McCullough wrote > : >>Web bots can ignore the robots.txt file, most scrapers would. > > and at 05:06 PM 3/13/2013, Marc Guay wrote: > >>These don't sound like robots that would respect a txt file to me. > > Dan and Marc are correct. Although I used the terms "spiders" and > "pirates" I believe that the correct term, as employed by Dan, is > "scrapers," and that twerm might be applied to either the robot or the > site which displays its results. One blogger has called scrapers "the > arterial plaque of the Internet." I need to implement a solution that > allows humans to access my files but prevents scrapers from accessing > them. I will undoubtedly have to implement some type of > challenge-and-response in the system (such as a captcha), but as long as > those files are stored below the web root a scraper that has a valid URL > can probably grab them. That is part of what the "public" in public_html > implies. > > One of the reasons why this irks me is that the scrapers are all > commercial sites, but they haven't offered me a piece of the action for > the use of my files. My domain is an entirely non-commercial domain, and I > provide free hosting for other non-commercial genealogical works, > primarily pages that are part of the USGenWeb Project, which is perhaps > the largest of all non-commercial genealogical projects. > readfile() is probably where you want to start, in conjunction with a captcha or similar -- Cheers David Robley Catholic (n.) A cat with a drinking problem. -- PHP General Mailing List (http://www.php.net/) To unsubscribe, visit: http://www.php.net/unsub.php