"Dale H. Cook" wrote: > At 05:04 PM 3/13/2013, Dan McCullough wrote > : >>Web bots can ignore the robots.txt file, most scrapers would. > > and at 05:06 PM 3/13/2013, Marc Guay wrote: > >>These don't sound like robots that would respect a txt file to me. > > Dan and Marc are correct. Although I used the terms "spiders" and > "pirates" I believe that the correct term, as employed by Dan, is > "scrapers," and that twerm might be applied to either the robot or the > site which displays its results. One blogger has called scrapers "the > arterial plaque of the Internet." I need to implement a solution that > allows humans to access my files but prevents scrapers from accessing > them. I will undoubtedly have to implement some type of > challenge-and-response in the system (such as a captcha), but as long as > those files are stored below the web root a scraper that has a valid URL > can probably grab them. That is part of what the "public" in public_html > implies. > > One of the reasons why this irks me is that the scrapers are all > commercial sites, but they haven't offered me a piece of the action for > the use of my files. My domain is an entirely non-commercial domain, and I > provide free hosting for other non-commercial genealogical works, > primarily pages that are part of the USGenWeb Project, which is perhaps > the largest of all non-commercial genealogical projects. > readfile() is probably where you want to start, in conjunction with a captcha or similar -- Cheers David Robley Catholic (n.) A cat with a drinking problem. -- PHP General Mailing List (http://www.php.net/) To unsubscribe, visit: http://www.php.net/unsub.php