Re: Re: Squid Ldap Authenticators

Amos Jeffries <squid3@xxxxxxxxxxxxx> · Wed, 14 Mar 2012 15:15:06 +1300

On 14.03.2012 03:54, guest01 wrote:
Hi,

Sorry, I pressed the send button by mistake ...

We are having strange Squid troubles. First, let me describe our setup:

- 4 HP G6/G7 DL380 servers with 16CPUs and 28GB RAM with RHEL 5.4-5.8
64bit and Squid 3.1.12 (custom compiled)
Squid Cache: Version 3.1.12
configure options:  '--enable-ssl' '--enable-icap-client'
'--sysconfdir=/etc/squid' '--enable-async-io' '--enable-snmp'
'--enable-poll' '--with-maxfd=32768' '--enable-storeio=aufs'
'--enable-removal-policies=heap,lru' '--enable-epoll'
'--disable-ident-lookups' '--enable-truncate'
'--with-logdir=/var/log/squid' '--with-pidfile=/var/run/squid.pid'
'--with-default-user=squid' '--prefix=/opt/squid' 
'--enable-auth=basic digest ntlm negotiate'
'-enable-negotiate-auth-helpers=squid_kerb_auth'
--with-squid=/home/squid/squid-3.1.12 --enable-ltdl-convenience

- Each server has two instances for kerberos/ntlm authentication and
two instances for LDAP authentication (different customers)
- we have a hardware load balancer which balances requests for our
kerberos customers (4x2 instances) and ldap customers (4x2 instances),
each has a different IP address.
- average load values are approx 0.5 (5min values)
- approx 60RPS per instance (equally distributed -> 16 * 60 => 960 RPS)
- up to 150Mbit/s traffic per server
- ICAP servers for content adaptation (multiple servers with a hardware
load balancer in front of them)

From time to time we have trouble with our Squid servers. It may not
be a Squid problem at all; I suspect an OS issue. Nevertheless,
sometimes the servers don't respond to requests (even SSH requests),
or logging in takes forever (reverse lookup failure?), or, even worse,
the server interface is simply down (with no indication of any problem
at the switch port level). If we check the squidclient output, we can
see some hanging LDAP authenticators:

squid@xlsqit01 /opt/squid/bin $ ./squidclient -h 10.122.125.23
cache_object://10.122.125.23/basicauthenticator
HTTP/1.0 200 OK
Server: squid/3.1.12
Mime-Version: 1.0
Date: Tue, 13 Mar 2012 13:34:07 GMT
Content-Type: text/plain
Expires: Tue, 13 Mar 2012 13:34:07 GMT
Last-Modified: Tue, 13 Mar 2012 13:34:07 GMT
X-Cache: MISS from xlsqip02_3
Via: 1.0 xlsqip02_3 (squid/3.1.12)
Connection: close

Basic Authenticator Statistics:
program: /opt/squid/libexec/squid_ldap_auth
number active: 20 of 20 (0 shutting down)
requests sent: 13316
replies received: 13312
queue length: 0
avg service time: 4741 msec

      #      FD     PID  # Requests     Flags      Time  Offset Request
      1      12   16038        2150     B       125.885       0 user1 pw1\n
      2      24   16043          85     B       119.562       0 user2 pw2\n
      3      32   16049          63     B        13.639       0 user3 pw3\n
      4      43   16055          21     B       116.143       0 user4 pw4\n
      5      46   16059          12             189.002       0 (none)
      6      50   16064           1             189.003       0 (none)
      7      56   16069           2               0.079       0 (none)
      8      60   16074           0               0.000       0 (none)
      9      65   16079           0               0.000       0 (none)
     10      86   16084           0               0.000       0 (none)
     11      88   16095           0               0.000       0 (none)
     12      90   16101           0               0.000       0 (none)
     13      92   16117           0               0.000       0 (none)
     14      95   16122           0               0.000       0 (none)
     15      97   16130           0               0.000       0 (none)
     16      99   16138           0               0.000       0 (none)
     17     101   16144           0               0.000       0 (none)
     18     104   16150           0               0.000       0 (none)
     19     107   16162           0               0.000       0 (none)
     20     109   16173           0               0.000       0 (none)

Looks like you can save some resources by dropping that down to 10 
helpers. But re-evaluate once the helpers are fixed, in case the load 
goes up afterwards.

Flags key:

   B = BUSY
   W = WRITING
   C = CLOSING
   S = SHUTDOWN PENDING
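
To put numbers behind the suggestion of running fewer helpers, the helper table above can be summarised with a few lines of Python. This is only a sketch: the rows are copied from the report quoted above (with the wrapped lines rejoined), and the names `REPORT`, `ROW` and `summarize` are made up for illustration, not part of Squid.

```python
import re

# Helper rows copied from the basicauthenticator report quoted above.
# Raw string so the literal "\n" in the Request column is kept as-is.
REPORT = r"""
      1      12   16038        2150     B       125.885       0 user1 pw1\n
      2      24   16043          85     B       119.562       0 user2 pw2\n
      3      32   16049          63     B        13.639       0 user3 pw3\n
      4      43   16055          21     B       116.143       0 user4 pw4\n
      5      46   16059          12             189.002       0 (none)
      6      50   16064           1             189.003       0 (none)
      7      56   16069           2               0.079       0 (none)
      8      60   16074           0               0.000       0 (none)
      9      65   16079           0               0.000       0 (none)
     10      86   16084           0               0.000       0 (none)
     11      88   16095           0               0.000       0 (none)
     12      90   16101           0               0.000       0 (none)
     13      92   16117           0               0.000       0 (none)
     14      95   16122           0               0.000       0 (none)
     15      97   16130           0               0.000       0 (none)
     16      99   16138           0               0.000       0 (none)
     17     101   16144           0               0.000       0 (none)
     18     104   16150           0               0.000       0 (none)
     19     107   16162           0               0.000       0 (none)
     20     109   16173           0               0.000       0 (none)
"""

# Columns: #, FD, PID, # Requests, Flags (may be empty), Time, Offset, Request
ROW = re.compile(
    r"^\s*(?P<idx>\d+)\s+(?P<fd>\d+)\s+(?P<pid>\d+)\s+(?P<requests>\d+)"
    r"\s+(?P<flags>[BWCS]*)\s*(?P<time>\d+\.\d+)\s+(?P<offset>\d+)\s+(?P<request>.*)$"
)


def summarize(report):
    """Count helpers, busy helpers, and helpers that ever served a request."""
    rows = [m.groupdict() for m in map(ROW.match, report.splitlines()) if m]
    return {
        "helpers": len(rows),
        "busy": sum(1 for r in rows if "B" in r["flags"]),
        "ever_used": sum(1 for r in rows if int(r["requests"]) > 0),
    }


print(summarize(REPORT))
```

On this snapshot only 7 of the 20 helpers have ever handled a request, and 4 of them are stuck busy, which is consistent with 10 helpers being enough for now.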

2012/03/13 03:00:04| Ready to serve requests.
squid_ldap_auth: WARNING, could not bind to binddn 'Can't contact LDAP server'
squid_ldap_auth: WARNING, could not bind to binddn 'Can't contact LDAP server'
squid_ldap_auth: WARNING, could not bind to binddn 'Can't contact LDAP server'
squid_ldap_auth: WARNING, could not bind to binddn 'Can't contact LDAP server'
squid_ldap_auth: WARNING, could not bind to binddn 'Can't contact LDAP server'
squid_ldap_auth: WARNING, could not bind to binddn 'Can't contact LDAP server'
squid_ldap_auth: WARNING, could not bind to binddn 'Can't contact LDAP server'
squid_ldap_auth: WARNING, could not bind to binddn 'Can't contact LDAP server'
squid_ldap_auth: WARNING, could not bind to binddn 'Can't contact LDAP server'
squid_ldap_auth: WARNING, could not bind to binddn 'Can't contact LDAP server'

Testing the ldap authentication at CLI level, it is working without
any problems:

root@xlsqip02 ~ # /opt/squid/libexec/squid_ldap_auth -b "dc=squid-proxy" -D "uid=...." -w xxx -h ldaphost -f "(uid=%s)"
user1 pw1
OK

Unfortunately, there is nothing helpful in syslog, e.g.
Mar 13 15:05:19 xlsqip02 last message repeated 2 times
Mar 13 15:05:25 xlsqip02 winbindd[4283]: [2012/03/13 15:05:25, 0] libsmb/clientgen.c:cli_receive_smb(111)
Mar 13 15:05:25 xlsqip02 winbindd[4283]:   Receiving SMB: Server stopped responding
Mar 13 15:05:25 xlsqip02 winbindd[4283]: [2012/03/13 15:05:25, 0] rpc_client/cli_pipe.c:rpc_api_pipe(790)
Mar 13 15:05:25 xlsqip02 winbindd[4283]:   rpc_api_pipe: Remote machine wienroot1.wien.rbgat.net pipe \lsarpc fnum 0x4008returned critical error. Error was Call timed out: server did not respond after 10000 milliseconds

What does the domain "wienroot1.wien.rbgat.net" resolve to?
Is connectivity to all its IPs working?
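
One way to answer both questions at once is to resolve the name and try a TCP connection to every address it returns. The sketch below is an illustration only: the function name `check_connectivity` is made up, and port 445 (SMB over TCP) is an assumption about which port winbind is talking to, so adjust it to whatever the domain controller actually serves.

```python
import socket


def check_connectivity(host, port, timeout=3.0,
                       resolve=socket.getaddrinfo,
                       connect=socket.create_connection):
    """Return {ip: "ok" or "failed: <reason>"} for every address of host.

    resolve and connect are injectable so the logic can be exercised
    without touching the network.
    """
    results = {}
    # getaddrinfo returns both IPv4 and IPv6 addresses where available.
    for _family, _type, _proto, _canon, sockaddr in resolve(
            host, port, type=socket.SOCK_STREAM):
        ip = sockaddr[0]
        if ip in results:
            continue  # the same address can be listed more than once
        try:
            conn = connect((ip, port), timeout)
            conn.close()
            results[ip] = "ok"
        except OSError as exc:
            results[ip] = "failed: %s" % exc
    return results


# Example call (assumed port, run on the proxy host itself):
# print(check_connectivity("wienroot1.wien.rbgat.net", 445))
```

Any address that reports a failure while others succeed would point at a routing or per-address reachability problem rather than at winbind itself.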

This looks a lot like network congestion affecting SMB, or possibly 
routes going up and down and breaking IP connectivity (v4? v6?).

Winbind has some nasty limitations, but should not be hitting this type 
of problem.

Mar 13 15:05:48 xlsqip02 sockd[4235]: warning: accept(2) failed: Resource temporarily unavailable (errno = 11)
Mar 13 15:06:20 xlsqip02 last message repeated 7 times
Mar 13 15:07:26 xlsqip02 last message repeated 4 times
Mar 13 15:08:27 xlsqip02 last message repeated 4 times
Mar 13 15:09:30 xlsqip02 last message repeated 10 times
Mar 13 15:10:37 xlsqip02 last message repeated 7 times
Mar 13 15:11:39 xlsqip02 last message repeated 11 times
Mar 13 15:12:55 xlsqip02 last message repeated 9 times
Mar 13 15:12:57 xlsqip02 winbindd[4331]: [2012/03/13 15:12:57, 0] libsmb/credentials.c:creds_client_check(324)
Mar 13 15:12:57 xlsqip02 winbindd[4331]:   creds_client_check: credentials check failed.
Mar 13 15:12:57 xlsqip02 winbindd[4331]: [2012/03/13 15:12:57, 0] rpc_client/cli_netlogon.c:rpccli_netlogon_sam_network_logon(1030)
Mar 13 15:12:57 xlsqip02 winbindd[4331]:   rpccli_netlogon_sam_network_logon: credentials chain check failed
Mar 13 15:13:05 xlsqip02 sockd[4235]: warning: accept(2) failed: Resource temporarily unavailable (errno = 11)

btw, winbind just sucks ... But I doubt that winbind is the root cause ...

Right. Something underneath it is, and it is affecting both winbind and 
squid_ldap_auth connectivity. Possibly routing related.

Anyway, we had some NIC issues before (packet drops); at the moment we
have disabled all the TSO stuff:

root@xlsqip02 ~ # ethtool -k eth0
Offload parameters for eth0:
Cannot get device udp large send offload settings: Operation not supported
rx-checksumming: off
tx-checksumming: off
scatter-gather: off
tcp segmentation offload: off
udp fragmentation offload: off
generic segmentation offload: off
generic-receive-offload: off

root@xlsqip02 ~ # ethtool -i eth0
driver: bnx2
version: 1.9.3
firmware-version: 4.6.4 NCSI 1.0.3
bus-info: 0000:02:00.0

root@xlsqip02 ~ # ethtool -g eth0
Ring parameters for eth0:
Pre-set maximums:
RX:             1020
RX Mini:        0
RX Jumbo:       4080
TX:             255
Current hardware settings:
RX:             1020
RX Mini:        0
RX Jumbo:       0
TX:             255

netstat output, if interesting:
root@xlsqip02 ~ # netstat -s
Ip:
    1031106057 total packets received
    32 with invalid addresses
    0 forwarded
    0 incoming packets discarded
    1031105815 incoming packets delivered
    943692708 requests sent out
    214 dropped because of missing route

Possibly related.

    34 reassemblies required
    17 packets reassembled ok
Icmp:
    77877 ICMP messages received
    339 input ICMP message failed.
    ICMP input histogram:
        destination unreachable: 31124

The destination unreachable count is way too high. Either the NIC is 
going down intermittently or a route has disappeared for some 
destinations.

        timeout in transit: 3011
        echo requests: 43271
        echo replies: 467
    43804 ICMP messages sent
    0 ICMP messages failed
    ICMP output histogram:
        destination unreachable: 66
        echo request: 467
        echo replies: 43271
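
To see how the ICMP input counters above relate to each other, they can be expressed as shares of all received ICMP messages. The following is a rough sketch: the text is copied from the netstat output quoted above, and the names `NETSTAT_ICMP` and `icmp_input_shares` are made up for illustration.

```python
import re

# ICMP input section copied from the "netstat -s" output quoted above.
NETSTAT_ICMP = """\
Icmp:
    77877 ICMP messages received
    339 input ICMP message failed.
    ICMP input histogram:
        destination unreachable: 31124
        timeout in transit: 3011
        echo requests: 43271
        echo replies: 467
"""


def icmp_input_shares(text):
    """Return each histogram entry as a fraction of ICMP messages received."""
    total = int(re.search(r"(\d+) ICMP messages received", text).group(1))
    # Histogram lines look like "<indent><lowercase name>: <count>".
    entries = re.findall(r"^[ \t]+([a-z ]+): (\d+)$", text, re.M)
    return {name.strip(): int(count) / total for name, count in entries}


for name, share in icmp_input_shares(NETSTAT_ICMP).items():
    print("%-25s %5.1f%%" % (name, share * 100))
```

On these figures about 40% of all received ICMP messages are destination unreachable (31124 of 77877). Tracking that share over time, rather than the raw counter, makes it easier to spot when the problem is actually occurring.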

Amos