On Tue, 26 Oct 2010 16:34:52 -0400, alexus <alexus@xxxxxxxxx> wrote:
> On Mon, Oct 25, 2010 at 6:38 PM, Amos Jeffries <squid3@xxxxxxxxxxxxx>
> wrote:
>> On Mon, 25 Oct 2010 12:38:49 -0400, alexus <alexus@xxxxxxxxx> wrote:
>>> is there a way to disallow serving of pages based on browser (agent)?
>>> I'm getting a lot of these:
>>>
>>> XX.XX.XX.XX - - [25/Oct/2010:16:37:44 +0000] "GET
>>> http://www.google.com/gwt/x? HTTP/1.1" 200 2232 "-"
>>> "SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1
>>> UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible;
>>> Googlebot-Mobile/2.1; +http://www.google.com/bot.html)"
>>> TCP_MISS:DIRECT
>>
>> Of course. you may...
>>
>> http://www.squid-cache.org/Doc/config/cache
>>
>> Although you need to be aware that preventing one object caching operates
>> by removing records to it after the transaction has finished. The effect
>> of doing this which you can expect is that a visit by GoogleBot will empty
>> your cache of most content.
>>
>> Amos
>>
>
> I'm not sure what do you mean by that, it seems like I dont know how
> but my SQUID gets hit by different bots and I was thinking to somehow
> disallow access to them, so they dont hit me as hard... maybe it's a
> stupid way of dealing with things...

Ah. Not caching will make the impact worse. One of the things Squid offers
is reduced web server impact from visitors. Squid is front-line software.

 * Start by creating a robots.txt file. The major bots will obey it, and
   you can restrict where they go and sometimes how often.

 * Allow caching of dynamic pages where possible; this works with
   Squid-2.6 and later
   (http://wiki.squid-cache.org/ConfigExamples/DynamicContent). Squid
   will handle both bots and normal visitors faster if it has cached
   content to serve immediately instead of waiting on the web server.

 * Check your squid.conf for performance killers (regex ACLs, external
   helpers) and reduce the number of requests reaching those ACL tests
   as much as possible.
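A minimal sketch of the robots.txt suggestion, assuming a hypothetical site layout: the paths (/search, /cgi-bin/) and the Crawl-delay value are illustrative only, not taken from this thread. It uses Python's standard urllib.robotparser to show how a compliant crawler reads the rules. Crawl-delay is a non-standard extension that not every crawler honours.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: paths and delay are illustrative assumptions.
# Crawl-delay is a non-standard extension; not every crawler honours it.
ROBOTS_TXT = """\
User-agent: Googlebot-Mobile
Disallow: /search

User-agent: *
Disallow: /cgi-bin/
Crawl-delay: 5
"""


def load_rules(text: str) -> RobotFileParser:
    """Parse robots.txt text the way a compliant crawler would."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    rules = load_rules(ROBOTS_TXT)
    checks = [
        ("Googlebot-Mobile", "http://www.example.com/search?q=squid"),
        ("Googlebot-Mobile", "http://www.example.com/cgi-bin/test.cgi"),
        ("SomeOtherBot", "http://www.example.com/cgi-bin/test.cgi"),
        ("SomeOtherBot", "http://www.example.com/index.html"),
    ]
    # Report whether each (crawler, URL) pair is permitted by the rules.
    for agent, url in checks:
        print(f"{agent:17} {url:45} allowed={rules.can_fetch(agent, url)}")
    # Crawl-delay comes only from the group that matched the crawler.
    print("Crawl-delay for SomeOtherBot:", rules.crawl_delay("SomeOtherBot"))
    print("Crawl-delay for Googlebot-Mobile:", rules.crawl_delay("Googlebot-Mobile"))
```

Note that a crawler applies only one group of rules: here Googlebot-Mobile matches its own group, so the `*` group's Disallow: /cgi-bin/ and Crawl-delay do not apply to it. Repeat any shared rules inside each named group.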
Squid routinely handles thousands of concurrent connections for ISPs, so a
visit by several bots at once should not be any visible load.

Amos