Hi,

Thanks for your answers.

At the moment we have 4 "monster" servers with no indication of any
performance issues (there is extensive Munin monitoring):
TCP states: http://prntscr.com/19qle2
CPU: http://prntscr.com/19qltm
Load: http://prntscr.com/19qlwe
Vmstat: http://prntscr.com/19qm3v
Bandwidth: http://prntscr.com/19qmc4

We run 4 Squid instances per server on 4 servers, together handling
approx. 2000 rps without hard-disk caching. Half of the instances do
Kerberos authentication, the other half LDAP authentication. Content
scanning is done by a couple of Webwasher appliances (6 at the moment).

These are my cache settings per instance:

# cache specific settings
cache_replacement_policy heap LFUDA
cache_mem 1600 MB
memory_replacement_policy heap LFUDA
maximum_object_size_in_memory 2048 KB
memory_pools off
cache_swap_low 85
cache_swap_high 90

My plan is to adjust a couple of ICAP timers and increase ICAP
debugging to 93,4 or 93,5. I found these messages:

2013/06/13 03:49:42| essential ICAP service is down after an options fetch failure: icap://10.122.125.48:1344/wwreqmod [down,!opt]
2013/06/13 11:09:33.530| essential ICAP service is suspended: icap://10.122.125.48:1344/wwreqmod [down,susp,fail11]

What do down,!opt and down,susp,fail11 mean?

thanks!
Peter

On Thu, Jun 13, 2013 at 2:41 AM, Eliezer Croitoru <eliezer@xxxxxxxxxxxx> wrote:
> Hey,
>
> There was a bug related to load on a server.
> Your server is a monster!!
> As far as I can tell, Squid 3.1.12 cannot even use the amount of CPU
> you have on this machine, unless you have a couple of clever ideas up
> your sleeve (routing, marking, etc.).
>
> To make sure what the problem is, I would also recommend verifying
> the load on the server in terms of open and half-open
> sessions/connections to Squid and to the ICAP service/server.
> Are you using this Squid server for filtering only, or also as a
> cache? If so, what is the cache size?
>
> The above questions can help us determine your situation and help you
> verify that the culprit is a specific bug which, from my testing on
> 3.3.5, no longer exists.
> If you are up for the task of verifying the loads on the server, I
> can tell you it's a 90% go on the bug.
> What I had was a problem where, once Squid went over 900 RPS, the
> ICAP service would go into a mode in which it stopped responding to
> requests (and showed the mentioned screen).
> This bug was tested on a very slow machine compared to yours.
> On a monster like yours, the effect I tested might not appear with
> the same side effect of a "denial of service" but rather an
> "interruption of service", which your monster recovers from very
> quickly.
>
> I'm here if you need any assistance,
> Eliezer
>
>
> On 6/12/2013 4:57 PM, guest01 wrote:
>>
>> Hi guys,
>>
>> We are currently using Squid 3.1.12 (old, I know) on RHEL 5.8 64bit
>> (HP ProLiant DL380 G7 with 16 CPUs and 28GB RAM):
>> Squid Cache: Version 3.1.12
>> configure options: '--enable-ssl' '--enable-icap-client'
>> '--sysconfdir=/etc/squid' '--enable-async-io' '--enable-snmp'
>> '--enable-poll' '--with-maxfd=32768' '--enable-storeio=aufs'
>> '--enable-removal-policies=heap,lru' '--enable-epoll'
>> '--disable-ident-lookups' '--enable-truncate'
>> '--with-logdir=/var/log/squid' '--with-pidfile=/var/run/squid.pid'
>> '--with-default-user=squid' '--prefix=/opt/squid' '--enable-auth=basic
>> digest ntlm negotiate'
>> '-enable-negotiate-auth-helpers=squid_kerb_auth'
>> --with-squid=/home/squid/squid-3.1.12 --enable-ltdl-convenience
>>
>> As ICAP server, we are using McAfee Webwasher 6.9 (old too, I know).
>> Up until recently we hardly had any problems with this environment.
>> Squid does authentication via Kerberos and passes the username to
>> the Webwasher, which does an LDAP lookup to find the user's groups
>> and assigns a policy based on group membership.
>> We have multiple Squids and multiple Webwashers behind a hardware
>> loadbalancer, approx. 15k users.
>>
>> For a couple of weeks now, we have been getting an ICAP server error
>> message almost daily, similar to: http://support.kaspersky.com/2723
>> Unfortunately, I cannot figure out why. I blame the Webwasher, but I
>> am not 100% sure.
>>
>> This is my ICAP configuration:
>>
>> #ICAP
>> icap_enable on
>> icap_send_client_ip on
>> icap_send_client_username on
>> icap_preview_enable on
>> icap_preview_size 30
>> icap_uses_indirect_client off
>> icap_persistent_connections on
>> icap_client_username_encode on
>> icap_client_username_header X-Authenticated-User
>> icap_service service_req reqmod_precache bypass=0 icap://10.122.125.48:1344/wwreqmod
>> adaptation_access service_req deny favicon
>> adaptation_access service_req deny to_localhost
>> adaptation_access service_req deny from_localnet
>> adaptation_access service_req deny whitelist
>> adaptation_access service_req deny dst_whitelist
>> adaptation_access service_req deny icap_bypass_src
>> adaptation_access service_req deny icap_bypass_dst
>> adaptation_access service_req allow all
>> icap_service service_resp respmod_precache bypass=0 icap://10.122.125.48:1344/wwrespmod
>> adaptation_access service_resp deny favicon
>> adaptation_access service_resp deny to_localhost
>> adaptation_access service_resp deny from_localnet
>> adaptation_access service_resp deny whitelist
>> adaptation_access service_resp deny dst_whitelist
>> adaptation_access service_resp deny icap_bypass_src
>> adaptation_access service_resp deny icap_bypass_dst
>> adaptation_access service_resp allow all
>>
>> Could an upgrade (either to 3.2 or to 3.3) solve this problem?
>> (There are more ICAP options available in recent Squid versions.)
>> Unfortunately, this is a rather complex organisational process;
>> that's why I have not done it yet.
>> I do have a test machine, but this ICAP error is not reproducible
>> there, only in production.
>> Server load and IO-throughput are OK; there is nothing suspicious
>> on the server. I recently activated ICAP debug option 93 and found
>> the following messages:
>>
>> 2013/06/12 15:32:15| suspending ICAP service for too many failures
>> 2013/06/12 15:32:15| essential ICAP service is suspended:
>> icap://10.122.125.48:1344/wwrespmod [down,susp,fail11]
>> 2013/06/12 15:35:15| essential ICAP service is up:
>> icap://10.122.125.48:1344/wwreqmod [up]
>> 2013/06/12 15:35:15| essential ICAP service is up:
>> icap://10.122.125.48:1344/wwrespmod [up]
>>
>> I don't know why this check failed, but it usually does not occur
>> when clients are getting the ICAP protocol error page.
>>
>> Another possibility would be the ICAP bypass, but our ICAP server
>> does anti-malware checking, which is why I don't want to activate
>> that feature.
>>
>> Does anybody have other ideas?
>>
>> Thanks!
>> Peter
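PS, on the state flags in those log lines: reading the messages
themselves, !opt marks a failed OPTIONS fetch, susp a suspended
service, and failN the consecutive-failure count. The 3-minute gap
between "suspended" at 15:32:15 and "up" at 15:35:15 matches Squid's
default revival delay of 180 seconds, and fail11 is one past the
default failure limit of 10. Assuming these documented directives are
available in our 3.1 build, they could be tuned per squid.conf, for
example:

```
# Sketch only; values are illustrative, not recommendations.
# fail11 in the log suggests the count passed the default limit of 10.
icap_service_failure_limit 30
# Default is 180 seconds, matching the 3-minute suspension seen above.
icap_service_revival_delay 60
```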
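One way to check the wwreqmod/wwrespmod services independently of
Squid is to issue the same kind of OPTIONS request Squid sends
periodically. Below is a minimal probe sketched in Python, based on
the generic ICAP protocol (RFC 3507); the host, port, and service
names are taken from the log lines above, and the exact headers
Webwasher expects may differ. A timeout or non-200 reply here
corresponds to what Squid logs as an options fetch failure:

```python
import socket

def build_options_request(host, port, service):
    # Build the ICAP OPTIONS request a client sends to query a service;
    # "Encapsulated: null-body=0" declares that the request has no body.
    return (
        f"OPTIONS icap://{host}:{port}/{service} ICAP/1.0\r\n"
        f"Host: {host}:{port}\r\n"
        "Encapsulated: null-body=0\r\n"
        "\r\n"
    ).encode("ascii")

def parse_status_line(reply):
    # Return (version, status code, reason) from the first reply line,
    # e.g. b"ICAP/1.0 200 OK\r\n..." -> ("ICAP/1.0", 200, "OK").
    line = reply.split(b"\r\n", 1)[0].decode("ascii")
    version, code, reason = line.split(" ", 2)
    return version, int(code), reason

def probe(host, port, service, timeout=5.0):
    # Connect, send OPTIONS, and return the ICAP status code (200 = healthy).
    with socket.create_connection((host, port), timeout=timeout) as s:
        s.sendall(build_options_request(host, port, service))
        reply = s.recv(4096)
    return parse_status_line(reply)[1]
```

Running something like probe("10.122.125.48", 1344, "wwreqmod") in a
loop against each Webwasher during a failure window would show whether
the appliance itself stops answering OPTIONS, or only Squid thinks so.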