On Tue, Oct 26, 2010 at 10:15 PM, Amos Jeffries <squid3@xxxxxxxxxxxxx> wrote:
> On Tue, 26 Oct 2010 16:34:52 -0400, alexus <alexus@xxxxxxxxx> wrote:
>> On Mon, Oct 25, 2010 at 6:38 PM, Amos Jeffries <squid3@xxxxxxxxxxxxx>
>> wrote:
>>> On Mon, 25 Oct 2010 12:38:49 -0400, alexus <alexus@xxxxxxxxx> wrote:
>>>> Is there a way to disallow serving of pages based on browser (user
>>>> agent)? I'm getting a lot of these:
>>>>
>>>> XX.XX.XX.XX - - [25/Oct/2010:16:37:44 +0000] "GET
>>>> http://www.google.com/gwt/x? HTTP/1.1" 200 2232 "-"
>>>> "SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1
>>>> UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible;
>>>> Googlebot-Mobile/2.1; +http://www.google.com/bot.html)"
>>>> TCP_MISS:DIRECT
>>>
>>> Of course, you may:
>>>
>>> http://www.squid-cache.org/Doc/config/cache
>>>
>>> Although you need to be aware that preventing an object from being
>>> cached works by removing the records for it after the transaction has
>>> finished. The effect you can expect from doing this is that a visit by
>>> GoogleBot will empty your cache of most content.
>>>
>>> Amos
>>
>> I'm not sure what you mean by that. Somehow my Squid gets hit by
>> different bots, and I was thinking of disallowing access to them so
>> they don't hit me as hard... maybe it's a stupid way of dealing with
>> things...
>
> Ah. Not caching will make the impact worse. One of the things Squid
> offers is reduced web server impact from visitors. Squid is front-line
> software.
>
> * Start by creating a robots.txt. The major bots will obey it, and you
> can restrict where they go and sometimes how often.
>
> * Allow caching of dynamic pages where possible with squid-2.6 and
> later (http://wiki.squid-cache.org/ConfigExamples/DynamicContent).
> Squid will handle the bots and normal visitors faster if it has cached
> content to serve out immediately instead of waiting.
>
> * Check your squid.conf for performance killers (regex, external
> helpers) and reduce the number of requests reaching those ACL tests as
> much as possible. Squid routinely handles thousands of concurrent
> connections for ISPs, so a visit by several bots at once should not
> really be any visible load.
>
> Amos

I'm a little confused... what does robots.txt have to do with Squid?
Where exactly should I place this robots.txt?

--
http://alexus.org/
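
For the cache directive Amos links to in his first reply, a minimal
squid.conf sketch might look like the following. The ACL name "bots" and
the regex values are illustrative placeholders, not from the thread:

    # Hypothetical example: match requests by User-Agent with a
    # "browser" ACL, then exclude those replies from the cache.
    acl bots browser -i Googlebot UP\.Browser
    cache deny bots

    # To refuse such clients outright instead of merely not caching:
    # http_access deny bots

Note Amos's warning about this approach: a "cache deny" also drops the
stored copy of each object the matching client touches, so a bot crawl
can empty the cache for everyone else.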
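
On the robots.txt suggestion: crawlers fetch robots.txt from the root of
each site (e.g. http://www.example.com/robots.txt), so it belongs in the
origin web server's document root, not anywhere in Squid. A minimal
sketch, with placeholder paths:

    # Placeholder paths; adjust to the site's layout.
    User-agent: Googlebot-Mobile
    Disallow: /search/

    User-agent: *
    Crawl-delay: 30    # honored by some crawlers, ignored by others
    Disallow: /private/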
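
The dynamic-content change Amos mentions is spelled out on the wiki page
he links to; the gist is to drop the old blanket "never cache cgi-bin or
query URLs" rules and let refresh_pattern decide instead, roughly:

    # Remove (or comment out) the old defaults if present:
    #   hierarchy_stoplist cgi-bin ?
    #   acl QUERY urlpath_regex cgi-bin \?
    #   cache deny QUERY

    # and make sure this line comes before the catch-all "." pattern:
    refresh_pattern -i (/cgi-bin/|\?) 0 0% 0
    refresh_pattern . 0 20% 4320

With this in place, dynamic responses that carry valid expiry or
validation headers can be cached and served to repeat bot visits
without touching the origin server.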
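
For the last point, squid.conf access rules are evaluated top to bottom
with the first match winning, so placing cheap ACL checks first keeps
most traffic away from expensive regex or external-helper tests. A
hedged sketch with made-up names:

    # Fast checks first: src matching is cheap.
    acl localnet src 192.168.0.0/16
    http_access allow localnet

    # Expensive checks last: only requests that got past the cheap
    # rules above ever reach this regex test. The ACL name and the
    # patterns file are hypothetical.
    acl blocked_urls url_regex -i "/etc/squid/blocked-patterns.txt"
    http_access deny blocked_urls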