Hey,
There was a bug that is related to load on the server.
Your server is a monster!!
As far as I know, Squid 3.1.12 cannot even use the amount of CPU you
have on this machine, unless you have a couple of clever ideas up your
sleeve (routing, marking, etc.).
To make sure what the problem is, I would also recommend checking the
load on the server in terms of open and half-open
sessions/connections to Squid and to the ICAP service/server.
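One way to get those numbers is to count TCP socket states per port. The
following is only a rough sketch in Python, assuming a Linux host, IPv4
sockets (it reads the /proc/net/tcp format, not /proc/net/tcp6), and Squid
listening on its default port 3128, which is an assumption since the
http_port was not posted. Port 1344 is taken from the icap:// URLs in the
posted configuration. The names count_states and report are made up for
this example. SYN_RECV entries are the half-open connections.

```python
from collections import Counter

# TCP state codes as they appear in the "st" column of /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}

SQUID_PORT = 3128  # assumption: Squid's default http_port
ICAP_PORT = 1344   # taken from the icap:// URLs in the posted config


def count_states(proc_net_tcp_text, port):
    """Count TCP states of sockets whose local or remote port is `port`."""
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) < 4:
            continue
        # Addresses look like "0100007F:0C38" (hex IP, colon, hex port).
        local_port = int(fields[1].rsplit(":", 1)[1], 16)
        remote_port = int(fields[2].rsplit(":", 1)[1], 16)
        if port in (local_port, remote_port):
            counts[TCP_STATES.get(fields[3], fields[3])] += 1
    return counts


def report(path="/proc/net/tcp"):
    """Print per-state counts for the Squid and ICAP ports."""
    with open(path) as handle:
        text = handle.read()
    for name, port in (("squid", SQUID_PORT), ("icap", ICAP_PORT)):
        print(name, dict(count_states(text, port)))
```

Calling report() every few seconds on the Squid box while the error page
shows up should reveal whether SYN_RECV or CLOSE_WAIT counts pile up on
either port.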
Are you using this Squid server for filtering only, or also for caching?
If you are caching, what is the cache size?
The answers to these questions can help us understand your situation and
verify whether the culprit is a specific bug that, in my testing on
3.3.5, no longer exists.
If you are up for the task of verifying the load on the server, I can
tell you it's a 90% go on the bug.
What I had was a problem where, once Squid went over 900 RPS, the ICAP
service would go into a mode in which it stopped responding to requests
(and showed the mentioned screen).
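To see whether a production box ever crosses that rate, the request rate
can be estimated from access.log. This is a small Python sketch, assuming
the default native log format, where the first field of every line is a
UNIX timestamp with milliseconds. The function names are hypothetical,
and the 900 threshold is simply the figure from my test, not a known
limit. Squid writes the log entry when a request finishes, so this
approximates completed requests per second.

```python
from collections import Counter


def requests_per_second(lines):
    """Map each whole UNIX second to the number of log entries stamped in it."""
    per_second = Counter()
    for line in lines:
        fields = line.split(None, 1)
        if not fields:
            continue  # blank line
        try:
            second = int(float(fields[0]))
        except ValueError:
            continue  # first field is not a native-format timestamp
        per_second[second] += 1
    return per_second


def seconds_over(per_second, threshold=900):
    """Return the sorted seconds whose request count exceeds `threshold`."""
    return sorted(s for s, n in per_second.items() if n > threshold)
```

Feeding it an open access.log file object, for example
seconds_over(requests_per_second(open(path))), gives the timestamps worth
comparing against the times the error page was reported.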
This bug was tested on a very slow machine compared to yours.
On a monster like yours, the effect I tested might not show up as a
"denial of service" but rather as an "interruption of service" from
which your monster recovers very quickly.
I'm here if you need any assistance,
Eliezer
On 6/12/2013 4:57 PM, guest01 wrote:
Hi guys,
We are currently using Squid 3.1.12 (old, I know) on RHEL 5.8 64bit
(HP ProLiant DL380 G7 with 16 CPU and 28GB RAM)
Squid Cache: Version 3.1.12
configure options: '--enable-ssl' '--enable-icap-client'
'--sysconfdir=/etc/squid' '--enable-async-io' '--enable-snmp'
'--enable-poll' '--with-maxfd=32768' '--enable-storeio=aufs'
'--enable-removal-policies=heap,lru' '--enable-epoll'
'--disable-ident-lookups' '--enable-truncate'
'--with-logdir=/var/log/squid' '--with-pidfile=/var/run/squid.pid'
'--with-default-user=squid' '--prefix=/opt/squid' '--enable-auth=basic
digest ntlm negotiate'
'-enable-negotiate-auth-helpers=squid_kerb_auth'
--with-squid=/home/squid/squid-3.1.12 --enable-ltdl-convenience
As the ICAP server, we are using McAfee Webwasher 6.9 (old too, I know).
Until recently we hardly had any problems with this environment.
Squid is doing authentication via Kerberos and passing the username to
the Webwasher, which is doing an LDAP lookup to find the user's groups
and assign a policy based on group membership.
We have multiple Squids and multiple Webwashers behind a hardware
load balancer, approx. 15k users.
For a couple of weeks now, we have been getting an ICAP server error
message almost daily, similar to:
http://support.kaspersky.com/2723
Unfortunately, I cannot figure out why. I blame the Webwasher, but I
am not 100% sure.
This is my ICAP configuration:
#ICAP
icap_enable on
icap_send_client_ip on
icap_send_client_username on
icap_preview_enable on
icap_preview_size 30
icap_uses_indirect_client off
icap_persistent_connections on
icap_client_username_encode on
icap_client_username_header X-Authenticated-User
icap_service service_req reqmod_precache bypass=0 icap://10.122.125.48:1344/wwreqmod
adaptation_access service_req deny favicon
adaptation_access service_req deny to_localhost
adaptation_access service_req deny from_localnet
adaptation_access service_req deny whitelist
adaptation_access service_req deny dst_whitelist
adaptation_access service_req deny icap_bypass_src
adaptation_access service_req deny icap_bypass_dst
adaptation_access service_req allow all
icap_service service_resp respmod_precache bypass=0 icap://10.122.125.48:1344/wwrespmod
adaptation_access service_resp deny favicon
adaptation_access service_resp deny to_localhost
adaptation_access service_resp deny from_localnet
adaptation_access service_resp deny whitelist
adaptation_access service_resp deny dst_whitelist
adaptation_access service_resp deny icap_bypass_src
adaptation_access service_resp deny icap_bypass_dst
adaptation_access service_resp allow all
Could an upgrade (to either 3.2 or 3.3) solve this problem? There are
more ICAP options available in recent Squid versions.
Unfortunately, upgrading is a rather complex organisational process,
which is why I have not done it yet.
I do have a test machine, but this ICAP error is not reproducible there,
only in production. Server load and I/O throughput are OK; there is
nothing suspicious on the server. I recently activated the ICAP debug
option (93) and found the following messages:
2013/06/12 15:32:15| suspending ICAP service for too many failures
2013/06/12 15:32:15| essential ICAP service is suspended: icap://10.122.125.48:1344/wwrespmod [down,susp,fail11]
2013/06/12 15:35:15| essential ICAP service is up: icap://10.122.125.48:1344/wwreqmod [up]
2013/06/12 15:35:15| essential ICAP service is up: icap://10.122.125.48:1344/wwrespmod [up]
I don't know why this check failed, but these messages usually do not
appear at the times when clients are getting the ICAP protocol error page.
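To compare suspension times against the times users report the error
page, the suspension windows can be extracted from cache.log. What
follows is only a sketch in Python, assuming each log entry sits on one
line (unwrapped) and uses exactly the wording of the messages quoted
above. The name suspension_windows is made up for this example.

```python
import re
from datetime import datetime

# Matches the "suspended" / "up" messages quoted from cache.log above.
EVENT = re.compile(
    r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})\|\s+"
    r"essential ICAP service is (suspended|up):\s+(\S+)"
)


def suspension_windows(lines):
    """Pair each 'suspended' event with the next 'up' event for the same URL.

    Returns a list of (service_url, suspended_at, seconds_down) tuples.
    """
    down_since = {}
    windows = []
    for line in lines:
        match = EVENT.match(line)
        if not match:
            continue
        when = datetime.strptime(match.group(1), "%Y/%m/%d %H:%M:%S")
        state, url = match.group(2), match.group(3)
        if state == "suspended":
            down_since.setdefault(url, when)  # keep the earliest timestamp
        elif url in down_since:
            start = down_since.pop(url)
            windows.append((url, start, int((when - start).total_seconds())))
    return windows
```

Applied to the excerpt above, this reports a single window: wwrespmod was
suspended at 15:32:15 and came back 180 seconds later. The "up" line for
wwreqmod is ignored because no suspension was logged for it.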
Another possibility would be the ICAP bypass, but our ICAP server is
doing anti-malware checking, which is why I don't want to activate
this feature.
Does anybody have other ideas?
Thanks!
Peter