Hi,

Thanks for your answers.

At the moment we have 4 "monster" servers with no indication of any
performance issues (there is extensive Munin monitoring):
TCP states: http://prntscr.com/19qle2
CPU: http://prntscr.com/19qltm
Load: http://prntscr.com/19qlwe
Vmstat: http://prntscr.com/19qm3v
Bandwidth: http://prntscr.com/19qmc4

We run 4 Squid instances per server on 4 servers, together handling
approx. 2000 rps without hard-disk caching. Half of the instances do
Kerberos authentication, the other half LDAP authentication. Content
scanning is done by a couple of Webwasher appliances (6 at the moment).

These are my cache settings per instance:

# cache specific settings
cache_replacement_policy heap LFUDA
cache_mem 1600 MB
memory_replacement_policy heap LFUDA
maximum_object_size_in_memory 2048 KB
memory_pools off
cache_swap_low 85
cache_swap_high 90

My plan is to adjust a couple of ICAP timers and increase ICAP
debugging to 93,4 or 93,5. I found these messages:

2013/06/13 03:49:42| essential ICAP service is down after an options fetch failure: icap://10.122.125.48:1344/wwreqmod [down,!opt]
2013/06/13 11:09:33.530| essential ICAP service is suspended: icap://10.122.125.48:1344/wwreqmod [down,susp,fail11]

What do down,!opt and down,susp,fail11 mean?

thanks!
Peter

On Thu, Jun 13, 2013 at 2:41 AM, Eliezer Croitoru <eliezer@xxxxxxxxxxxx> wrote:
> Hey,
>
> There was a bug related to load on a server.
> Your server is a monster!!
> As far as I can tell, Squid 3.1.12 cannot even use the amount of CPU
> you have on this machine, unless you have a couple of clever ideas up
> your sleeve (routing, marking, etc.).
>
> To make sure what the problem is, I would also recommend verifying
> the load on the server in terms of open and half-open
> sessions/connections to Squid and to the ICAP service/server.
> Are you using this Squid server for filtering only, or also as a
> cache? If so, what is the cache size?
>
> The above questions can help us determine your situation and help you
> verify that the culprit is a specific bug which, from my testing on
> 3.3.5, no longer exists.
> If you are up for the task of verifying the loads on the server, I
> can tell you it's a 90% go on the bug.
> What I had was a problem where, once Squid went over 900 RPS, the
> ICAP service would go into a mode in which it stopped responding to
> requests (and showed the mentioned screen).
> This bug was tested on a very slow machine compared to yours.
> On a monster like yours, the effect I tested might not appear with
> the same side effect of a "denial of service" but rather an
> "interruption of service", which your monster recovers from very
> quickly.
>
> I'm here if you need any assistance,
> Eliezer
>
>
> On 6/12/2013 4:57 PM, guest01 wrote:
>>
>> Hi guys,
>>
>> We are currently using Squid 3.1.12 (old, I know) on RHEL 5.8 64bit
>> (HP ProLiant DL380 G7 with 16 CPUs and 28GB RAM):
>> Squid Cache: Version 3.1.12
>> configure options: '--enable-ssl' '--enable-icap-client'
>> '--sysconfdir=/etc/squid' '--enable-async-io' '--enable-snmp'
>> '--enable-poll' '--with-maxfd=32768' '--enable-storeio=aufs'
>> '--enable-removal-policies=heap,lru' '--enable-epoll'
>> '--disable-ident-lookups' '--enable-truncate'
>> '--with-logdir=/var/log/squid' '--with-pidfile=/var/run/squid.pid'
>> '--with-default-user=squid' '--prefix=/opt/squid' '--enable-auth=basic
>> digest ntlm negotiate'
>> '-enable-negotiate-auth-helpers=squid_kerb_auth'
>> --with-squid=/home/squid/squid-3.1.12 --enable-ltdl-convenience
>>
>> As ICAP server, we are using McAfee Webwasher 6.9 (old too, I know).
>> Up until recently we hardly had any problems with this environment.
>> Squid does authentication via Kerberos and passes the username to
>> the Webwasher, which does an LDAP lookup to find the user's groups
>> and assigns a policy based on group membership.
>> We have multiple Squids and multiple Webwashers behind a hardware
>> loadbalancer, approx. 15k users.
>>
>> For a couple of weeks now, we have been getting an ICAP server error
>> message almost daily, similar to: http://support.kaspersky.com/2723
>> Unfortunately, I cannot figure out why. I blame the Webwasher, but I
>> am not 100% sure.
>>
>> This is my ICAP configuration:
>>
>> #ICAP
>> icap_enable on
>> icap_send_client_ip on
>> icap_send_client_username on
>> icap_preview_enable on
>> icap_preview_size 30
>> icap_uses_indirect_client off
>> icap_persistent_connections on
>> icap_client_username_encode on
>> icap_client_username_header X-Authenticated-User
>> icap_service service_req reqmod_precache bypass=0 icap://10.122.125.48:1344/wwreqmod
>> adaptation_access service_req deny favicon
>> adaptation_access service_req deny to_localhost
>> adaptation_access service_req deny from_localnet
>> adaptation_access service_req deny whitelist
>> adaptation_access service_req deny dst_whitelist
>> adaptation_access service_req deny icap_bypass_src
>> adaptation_access service_req deny icap_bypass_dst
>> adaptation_access service_req allow all
>> icap_service service_resp respmod_precache bypass=0 icap://10.122.125.48:1344/wwrespmod
>> adaptation_access service_resp deny favicon
>> adaptation_access service_resp deny to_localhost
>> adaptation_access service_resp deny from_localnet
>> adaptation_access service_resp deny whitelist
>> adaptation_access service_resp deny dst_whitelist
>> adaptation_access service_resp deny icap_bypass_src
>> adaptation_access service_resp deny icap_bypass_dst
>> adaptation_access service_resp allow all
>>
>> Could an upgrade (either to 3.2 or to 3.3) solve this problem?
>> (There are more ICAP options available in recent Squid versions.)
>> Unfortunately, this is a rather complex organisational process;
>> that's why I have not done it yet.
>> I do have a test machine, but this ICAP error is not reproducible
>> there, only in production.
>> Server load and IO-throughput are OK; there is nothing suspicious
>> on the server. I recently activated ICAP debug option 93 and found
>> the following messages:
>>
>> 2013/06/12 15:32:15| suspending ICAP service for too many failures
>> 2013/06/12 15:32:15| essential ICAP service is suspended:
>> icap://10.122.125.48:1344/wwrespmod [down,susp,fail11]
>> 2013/06/12 15:35:15| essential ICAP service is up:
>> icap://10.122.125.48:1344/wwreqmod [up]
>> 2013/06/12 15:35:15| essential ICAP service is up:
>> icap://10.122.125.48:1344/wwrespmod [up]
>>
>> I don't know why this check failed, but it usually does not occur
>> when clients are getting the ICAP protocol error page.
>>
>> Another possibility would be the ICAP bypass, but our ICAP server
>> does anti-malware checking, which is why I don't want to activate
>> that feature.
>>
>> Does anybody have other ideas?
>>
>> Thanks!
>> Peter
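PS, on the state flags in those log lines: reading the messages
themselves, !opt marks a failed OPTIONS fetch, susp a suspended
service, and failN the consecutive-failure count. The 3-minute gap
between "suspended" at 15:32:15 and "up" at 15:35:15 matches Squid's
default revival delay of 180 seconds, and fail11 is one past the
default failure limit of 10. Assuming these documented directives are
available in our 3.1 build, they could be tuned per squid.conf, for
example:

```
# Sketch only; values are illustrative, not recommendations.
# fail11 in the log suggests the count passed the default limit of 10.
icap_service_failure_limit 30
# Default is 180 seconds, matching the 3-minute suspension seen above.
icap_service_revival_delay 60
```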
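One way to check the wwreqmod/wwrespmod services independently of
Squid is to issue the same kind of OPTIONS request Squid sends
periodically. Below is a minimal probe sketched in Python, based on
the generic ICAP protocol (RFC 3507); the host, port, and service
names are taken from the log lines above, and the exact headers
Webwasher expects may differ. A timeout or non-200 reply here
corresponds to what Squid logs as an options fetch failure:

```python
import socket

def build_options_request(host, port, service):
    # Build the ICAP OPTIONS request a client sends to query a service;
    # "Encapsulated: null-body=0" declares that the request has no body.
    return (
        f"OPTIONS icap://{host}:{port}/{service} ICAP/1.0\r\n"
        f"Host: {host}:{port}\r\n"
        "Encapsulated: null-body=0\r\n"
        "\r\n"
    ).encode("ascii")

def parse_status_line(reply):
    # Return (version, status code, reason) from the first reply line,
    # e.g. b"ICAP/1.0 200 OK\r\n..." -> ("ICAP/1.0", 200, "OK").
    line = reply.split(b"\r\n", 1)[0].decode("ascii")
    version, code, reason = line.split(" ", 2)
    return version, int(code), reason

def probe(host, port, service, timeout=5.0):
    # Connect, send OPTIONS, and return the ICAP status code (200 = healthy).
    with socket.create_connection((host, port), timeout=timeout) as s:
        s.sendall(build_options_request(host, port, service))
        reply = s.recv(4096)
    return parse_status_line(reply)[1]
```

Running something like probe("10.122.125.48", 1344, "wwreqmod") in a
loop against each Webwasher during a failure window would show whether
the appliance itself stops answering OPTIONS, or only Squid thinks so.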