Hi,
I am running squid 3.2.1 on CentOS 6.3 with kerberos authentication
using negotiate_kerberos_auth.
Generally this is working fine, but after some time more and more
helper instances stay busy and never finish the request they were given.
Squid therefore starts new helper processes to keep enough working
helpers available for Kerberos authentication.
This release of Squid is quite old now. Are you able to upgrade your
proxy to the current stable release and see if the problem disappears?
(today that would be 3.3.8)
You can find packages of recent versions for CentOS at
http://wiki.squid-cache.org/KnowledgeBase/CentOS.
I would like to update Squid to version 3.3.8, but there is still
the problem that URLs containing an IP address are not forwarded
to cache peers (see Bug 3848). That makes it unusable for me.
But I did test Squid 3.3.8 on my test system, and it showed exactly the
same problem with Kerberos authentication.
This continues until Squid has no more memory for the helpers:
2013/07/30 08:48:04 kid1| Starting new negotiateauthenticator helpers...
2013/07/30 08:48:04 kid1| helperOpenServers: Starting 1/500
'negotiate_kerberos_auth' processes
2013/07/30 08:48:04 kid1| ipcCreate: fork: (12) Cannot allocate memory
That is bad, but it is unrelated to the helpers getting locked up.
How much RAM is the Squid worker process using at the time this appears?
Starting helpers with fork() requires Squid to be allocated virtual
memory 2x that being used at the time by the worker process.
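On Linux you can read the worker's virtual memory size straight from /proc
to see how much fork() would momentarily need. A minimal sketch, assuming a
Linux /proc filesystem (it uses its own PID only as a stand-in; substitute
the Squid worker's PID):

```python
import os

def vm_size_kb(pid):
    """Return the VmSize of `pid` in kilobytes, as reported by /proc."""
    with open("/proc/%d/status" % pid) as f:
        for line in f:
            if line.startswith("VmSize:"):
                return int(line.split()[1])  # field is reported in kB
    raise RuntimeError("VmSize not found")

size = vm_size_kb(os.getpid())  # replace with the Squid worker's PID
# fork() briefly duplicates the address space, so starting a helper
# needs roughly twice the worker's current virtual memory available.
print("VmSize: %d kB; fork needs roughly %d kB" % (size, 2 * size))
```

If the worker is large, raising vm.overcommit or adding swap lets the
short-lived fork() succeed even though the child immediately exec()s.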
And how much memory is currently in use by each of those 8 BUSY helpers?
Negotiate Authenticator Statistics:
program: /usr/lib64/squid/negotiate_kerberos_auth
number active: 39 of 500 (0 shutting down)
requests sent: 11141
replies received: 11133
queue length: 0
avg service time: 4 msec
# FD PID # Requests Flags Time Offset Request
1 19 31373 753 B R 3887.019 0
1 37 31390 755 B R 3637.061 0
1 39 31391 2539 B R 2053.518 0
1 41 31392 78 B R 3859.365 0
1 43 31393 807 B R 2008.036 0
1 57 31396 415 B R 2003.899 0
1 63 31397 363 B R 1975.126 0
1 95 31401 329 B R 1944.980 0
1 29 31491 1891 0.009 0 (none)
1 77 31492 813 0.011 0 (none)
1 88 31493 578 0.009 0 (none)
The first eight helper processes are busy and will never return to a
normal state until Squid is restarted.
Gradually, more and more helpers get stuck in the busy state.
strace shows me that these helpers are blocked in a read call:
....
read(0, "r", 1) = 1
read(0, "r", 1) = 1
read(0, "7", 1) = 1
read(0, "+", 1) = 1
read(0, "a", 1) = 1
read(0, "G", 1) = 1
read(0, <unfinished ...>
After this the process never continues.
That does not look blocked to me. The value arriving is changing, just
vvvveeerrrryyyy ssslloowwwlllyyy, one byte per I/O cycle to be exact.
Since Kerberos credentials can be up to 32 KB large it's easy to see why
they are stuck in BUSY state for such long times.
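The strace pattern above — read(0, ..., 1) returning one byte per call —
matches a helper that scans stdin byte by byte looking for the terminating
newline of the request line. A minimal sketch of that read loop
(illustrative only, not the actual negotiate_kerberos_auth code):

```python
import os

def read_request_line(fd):
    """Read one helper request byte-by-byte until newline, as strace shows."""
    buf = bytearray()
    while True:
        # One read() syscall per byte; if the newline never arrives and
        # the writer sends nothing more, this call blocks forever.
        b = os.read(fd, 1)
        if not b:            # EOF: Squid closed the pipe
            return None
        if b == b"\n":       # end of the request line
            return bytes(buf)
        buf += b

# Demonstrate with a pipe standing in for the helper's stdin:
r, w = os.pipe()
os.write(w, b"YR base64token...\n")
print(read_request_line(r))
```

With a complete line the loop returns promptly; with a truncated line it
sits in read() exactly as the strace output shows.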
But Microsoft says that Kerberos credentials can be up to 64 KB
(http://support.microsoft.com/kb/327825/en-us).
And I think that's the problem. Our Active Directory actually issues
Kerberos credentials of up to 40 KB, and I found that exactly these
credentials put the Kerberos helper into the busy state.
strace shows me that the helper process reads credential bytes from
Squid up to 32 KB, then waits at the socket for the rest and stays busy,
but Squid never transmits more than 32 KB. Therefore the helper never
finishes the authentication and is no longer usable.
Users with Kerberos credentials greater than 32 KB cannot use Squid and
cause blocked helper processes.
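Note that the raw token which fits is even smaller than 32 KB, because the
Negotiate token is base64-encoded on the helper request line (4 output
bytes for every 3 input bytes). A rough calculation — the 32 KB line limit
here is taken from the observation above, not from the Squid sources:

```python
def b64_len(raw_bytes):
    """Length of the base64 encoding of raw_bytes bytes (with padding)."""
    return 4 * ((raw_bytes + 2) // 3)

BUFFER = 32 * 1024  # observed limit: Squid stops sending at 32 KB

for kb in (24, 32, 40, 64):
    raw = kb * 1024
    line = b64_len(raw)  # token portion of the "YR <token>" request line
    print("%2d KB token -> %5d byte base64 string, fits: %s"
          % (kb, line, line <= BUFFER))
```

So a raw token of about 24 KB already fills a 32 KB line, which is
consistent with our 40 KB credentials being the ones that wedge helpers.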
Is there a solution for such long kerberos credentials?
I cannot find any error messages in cache.log, even when I switch on
debugging in the helper.
At this rate of I/O in the helper it is unlikely that they will be able
to send a message to cache.log in any reasonable time.
Thank you for help!
Klaus