Re: wget

Andrew Bacchi <bacchi@xxxxxxx> · Mon, 30 Jun 2008 08:45:38 -0400

I've already sent you a link that provides an explanation and examples.  I 
don't mind pointing someone in the right direction, but I won't sit here 
and solve all your problems for you.  Try searching Google.

Joy Methew wrote:
Bacchi,

How do I use "robots.txt"? Please explain with an example.

Daniel,

It's working for "wget", but the site can still be downloaded with other
utilities, such as "DownloadStudio".

On 6/27/08, Daniel Carrillo <daniel.carrillo@xxxxxxxxx> wrote:

2008/6/27 Joy Methew <ml4joy@xxxxxxxxx>:

Hi all,

We can download any site with the "wget -r" option.
If I want to stop my site from being downloaded from my web server, how
can I do this?

You can configure Apache to refuse connections from the User-Agent "wget",
but note that wget can send any User-Agent (--user-agent option).

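# Flag any request whose User-Agent starts with "wget" (case-insensitive)
SetEnvIfNoCase User-Agent "^wget" blacklist
<Location />
  # ...
  # your options
  # ...
  # Allow everyone, then deny flagged requests (Deny wins under allow,deny)
  Order allow,deny
  Allow from all
  Deny from env=blacklist
</Location>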
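
To see which clients a rule like this would catch, here is a rough Python
sketch of the same test: an anchored, case-insensitive match on the
User-Agent header. It only mimics the pattern; it is not how Apache
evaluates it internally. The function name and the sample header values
(including the "DownloadStudio" string) are made up for illustration.

```python
import re

# Same pattern as the SetEnvIfNoCase line: anchored at the start,
# matched without regard to case.
BLACKLIST_PATTERN = re.compile(r"^wget", re.IGNORECASE)


def is_blacklisted(user_agent: str) -> bool:
    """Return True if this header value would set the 'blacklist' variable."""
    return BLACKLIST_PATTERN.search(user_agent) is not None


if __name__ == "__main__":
    # Hypothetical header values, not taken from real logs.
    samples = [
        "Wget/1.11.4",
        "wget",
        "Mozilla/5.0 (X11; Linux i686)",
        "DownloadStudio",
    ]
    for ua in samples:
        verdict = "denied" if is_blacklisted(ua) else "allowed"
        print(f"{ua!r}: {verdict}")
```

Anything that does not start with "wget" gets through, which is why other
download tools, or wget itself with --user-agent set to something else,
are not stopped by this rule.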

BTW: robots.txt can only stop crawling by "good" crawlers, such as
Google, Yahoo, Alexa, etc.
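
As a minimal illustration of what a "good" crawler does with robots.txt,
the sketch below feeds two small example files to Python's standard
urllib.robotparser and asks whether a URL may be fetched. The file
contents, the www.example.com URLs and the helper name are assumptions
made for the example. The check is voluntary: a client that never asks
is not stopped by it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt asking every crawler to stay out of the whole site.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

# Hypothetical robots.txt that only closes off one directory.
BLOCK_PRIVATE = """\
User-agent: *
Disallow: /private/
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Answer the question a polite crawler asks before requesting a URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    base = "http://www.example.com"
    print(may_fetch(BLOCK_ALL, "Googlebot", base + "/index.html"))
    print(may_fetch(BLOCK_PRIVATE, "Googlebot", base + "/index.html"))
    print(may_fetch(BLOCK_PRIVATE, "Googlebot", base + "/private/report.html"))
```

The file itself goes in the document root of the site, so that it is
served as /robots.txt.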

--
redhat-list mailing list
unsubscribe mailto:redhat-list-request@xxxxxxxxxx?subject=unsubscribe
https://www.redhat.com/mailman/listinfo/redhat-list

--
veritatis simplex oratio est
       -Seneca

Andrew Bacchi
Systems Programmer
Rensselaer Polytechnic Institute
phone: 518.276.6415  fax: 518.276.2809

http://www.rpi.edu/~bacchi/
