Re: Re: Squid Ldap Authenticators


On 14.03.2012 03:54, guest01 wrote:
Hi,

Sorry, I pressed the send button by mistake ...

We are having strange Squid troubles. First, let me describe our setup:

- 4 HP G6/G7 DL380 servers with 16CPUs and 28GB RAM with RHEL 5.4-5.8
64bit and Squid 3.1.12 (custom compiled)
Squid Cache: Version 3.1.12
configure options:  '--enable-ssl' '--enable-icap-client'
'--sysconfdir=/etc/squid' '--enable-async-io' '--enable-snmp'
'--enable-poll' '--with-maxfd=32768' '--enable-storeio=aufs'
'--enable-removal-policies=heap,lru' '--enable-epoll'
'--disable-ident-lookups' '--enable-truncate'
'--with-logdir=/var/log/squid' '--with-pidfile=/var/run/squid.pid'
'--with-default-user=squid' '--prefix=/opt/squid' '--enable-auth=basic
digest ntlm negotiate'
'-enable-negotiate-auth-helpers=squid_kerb_auth'
--with-squid=/home/squid/squid-3.1.12 --enable-ltdl-convenience

- Each server has two instances for kerberos/ntlm authentication and
two instances for LDAP authentication (different customers)
- we have a hardware loadbalancer which balances requests for our
kerberos-customers (4x2 instances) and ldap-customers (4x2 instances),
each has a different IP address.
- average load values are approx 0.5 (5min values)
- approx 60RPS per instance (equally distributed -> 16 * 60 => 960 RPS)
- up to 150Mbit/s traffic per server
- ICAP servers for content adaptation (multiple servers with a hardware
loadbalancer in front of them)

From time to time we are having troubles with our Squid servers which
may not be related to Squid at all; I suspect an OS issue.
Nevertheless, sometimes the servers don't respond to requests (even
SSH requests), or logging in takes forever (reverse lookup failure?), or
even worse, sometimes the server interface is just down (there is no
indication of any problem at the switch port level). If we check the
squidclient output, we can see some hanging ldap authenticators:

squid@xlsqit01 /opt/squid/bin $ ./squidclient -h 10.122.125.23
cache_object://10.122.125.23/basicauthenticator
HTTP/1.0 200 OK
Server: squid/3.1.12
Mime-Version: 1.0
Date: Tue, 13 Mar 2012 13:34:07 GMT
Content-Type: text/plain
Expires: Tue, 13 Mar 2012 13:34:07 GMT
Last-Modified: Tue, 13 Mar 2012 13:34:07 GMT
X-Cache: MISS from xlsqip02_3
Via: 1.0 xlsqip02_3 (squid/3.1.12)
Connection: close

Basic Authenticator Statistics:
program: /opt/squid/libexec/squid_ldap_auth
number active: 20 of 20 (0 shutting down)
requests sent: 13316
replies received: 13312
queue length: 0
avg service time: 4741 msec

#   FD   PID    # Requests  Flags  Time     Offset  Request
1   12   16038  2150        B      125.885  0       user1 pw1\n
2   24   16043  85          B      119.562  0       user2 pw2\n
3   32   16049  63          B      13.639   0       user3 pw3\n
4   43   16055  21          B      116.143  0       user4 pw4\n
5   46   16059  12                 189.002  0       (none)
6   50   16064  1                  189.003  0       (none)
7   56   16069  2                  0.079    0       (none)
8   60   16074  0                  0.000    0       (none)
9   65   16079  0                  0.000    0       (none)
10  86   16084  0                  0.000    0       (none)
11  88   16095  0                  0.000    0       (none)
12  90   16101  0                  0.000    0       (none)
13  92   16117  0                  0.000    0       (none)
14  95   16122  0                  0.000    0       (none)
15  97   16130  0                  0.000    0       (none)
16  99   16138  0                  0.000    0       (none)
17  101  16144  0                  0.000    0       (none)
18  104  16150  0                  0.000    0       (none)
19  107  16162  0                  0.000    0       (none)
20  109  16173  0                  0.000    0       (none)

Looks like you can save some resources by dropping that down to 10 helpers: only 7 of the 20 have ever handled a request. But re-evaluate that after they are fixed, in case the load goes up.
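The same check can be scripted against the helper table. This is only a rough sketch in Python: it assumes the table is available with one helper per line, that the Time column is in seconds, and that 60 seconds is a sensible cut-off for "stuck". The function names and the threshold are made up for illustration and are not part of Squid.

```python
import re

# Sample rows in the column order of the helper statistics table:
# index, FD, PID, request count, optional flags, time, offset, request
SAMPLE = """\
1 12 16038 2150 B 125.885 0 user1 pw1\\n
2 24 16043 85 B 119.562 0 user2 pw2\\n
5 46 16059 12 189.002 0 (none)
7 56 16069 2 0.079 0 (none)
8 60 16074 0 0.000 0 (none)
"""

# The Flags column is empty for idle helpers, so it is optional here.
ROW = re.compile(
    r"^\s*(?P<idx>\d+)\s+(?P<fd>\d+)\s+(?P<pid>\d+)\s+(?P<requests>\d+)"
    r"\s+(?:(?P<flags>[BWCS]+)\s+)?(?P<time>\d+\.\d+)\s+(?P<offset>\d+)"
    r"\s+(?P<request>.*)$"
)


def parse_helpers(text):
    """Return one dict per helper row; lines that do not match are skipped."""
    helpers = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            helpers.append({
                "pid": int(m["pid"]),
                "requests": int(m["requests"]),
                "busy": "B" in (m["flags"] or ""),
                "time": float(m["time"]),
            })
    return helpers


def stuck_helpers(helpers, threshold_sec=60.0):
    """Helpers flagged BUSY whose Time value exceeds the (assumed) threshold."""
    return [h for h in helpers if h["busy"] and h["time"] > threshold_sec]


def used_helpers(helpers):
    """Number of helpers that have handled at least one request."""
    return sum(1 for h in helpers if h["requests"] > 0)


helpers = parse_helpers(SAMPLE)
for h in stuck_helpers(helpers):
    print("possibly stuck helper, pid", h["pid"], "busy for", h["time"], "s")
print("helpers ever used:", used_helpers(helpers), "of", len(helpers))
```

If the count of used helpers stays well below the configured number over a representative period, that supports lowering the helper count.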


Flags key:

   B = BUSY
   W = WRITING
   C = CLOSING
   S = SHUTDOWN PENDING

2012/03/13 03:00:04| Ready to serve requests.
squid_ldap_auth: WARNING, could not bind to binddn 'Can't contact
LDAP server'


squid_ldap_auth: WARNING, could not bind to binddn 'Can't contact
LDAP server'
squid_ldap_auth: WARNING, could not bind to binddn 'Can't contact
LDAP server'
squid_ldap_auth: WARNING, could not bind to binddn 'Can't contact
LDAP server'
squid_ldap_auth: WARNING, could not bind to binddn 'Can't contact
LDAP server'
squid_ldap_auth: WARNING, could not bind to binddn 'Can't contact
LDAP server'
squid_ldap_auth: WARNING, could not bind to binddn 'Can't contact
LDAP server'
squid_ldap_auth: WARNING, could not bind to binddn 'Can't contact
LDAP server'
squid_ldap_auth: WARNING, could not bind to binddn 'Can't contact
LDAP server'
squid_ldap_auth: WARNING, could not bind to binddn 'Can't contact
LDAP server'

Testing the ldap authentication at CLI level, it is working without
any problems:

root@xlsqip02 ~ #  /opt/squid/libexec/squid_ldap_auth -b
"dc=squid-proxy" -D "uid=...." -w xxx -h ldaphost -f "(uid=%s)"
user1 pw1
OK

Unfortunately, there is nothing helpful in syslog, e.g.
Mar 13 15:05:19 xlsqip02 last message repeated 2 times
Mar 13 15:05:25 xlsqip02 winbindd[4283]: [2012/03/13 15:05:25, 0]
libsmb/clientgen.c:cli_receive_smb(111)
Mar 13 15:05:25 xlsqip02 winbindd[4283]:   Receiving SMB: Server
stopped responding
Mar 13 15:05:25 xlsqip02 winbindd[4283]: [2012/03/13 15:05:25, 0]
rpc_client/cli_pipe.c:rpc_api_pipe(790)
Mar 13 15:05:25 xlsqip02 winbindd[4283]:   rpc_api_pipe: Remote
machine wienroot1.wien.rbgat.net pipe \lsarpc fnum 0x4008returned
critical error. Error was Call timed out: server did not respond after
10000 milliseconds

What does the domain "wienroot1.wien.rbgat.net" resolve to?
 Is connectivity to all its IPs working?

Looks a lot like network congestion affecting SMB. Or possibly route up/down connectivity issues for IP (v4? v6?).

Winbind has some nasty limitations, but should not be hitting this type of problem.
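To answer both questions in one go, something like the following Python sketch could be run from the proxy host. It resolves a name to all of its IPv4 and IPv6 addresses and tries a TCP connect to each one. Port 445 (SMB) is an assumption here, as are the function names and the 3 second timeout; adjust them to whatever service winbind actually talks to.

```python
import socket


def resolve_all(host, port, resolver=socket.getaddrinfo):
    """Return the sorted, de-duplicated IP addresses (v4 and v6) for host."""
    infos = resolver(host, port, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})


def can_connect(ip, port, timeout=3.0):
    """True if a TCP connection to ip:port succeeds within the timeout."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_host(host, port, resolver=socket.getaddrinfo, connector=can_connect):
    """Map every resolved address of host to its TCP reachability."""
    return {ip: connector(ip, port) for ip in resolve_all(host, port, resolver)}


# Example usage (performs real DNS lookups and connection attempts):
#   for ip, ok in check_host("wienroot1.wien.rbgat.net", 445).items():
#       print(ip, "reachable" if ok else "NOT reachable")
```

An address that resolves but never connects, especially an IPv6 one on a host without working IPv6 routing, would fit the timeouts seen in the winbind log.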


Mar 13 15:05:48 xlsqip02 sockd[4235]: warning: accept(2) failed:
Resource temporarily unavailable (errno = 11)
Mar 13 15:06:20 xlsqip02 last message repeated 7 times
Mar 13 15:07:26 xlsqip02 last message repeated 4 times
Mar 13 15:08:27 xlsqip02 last message repeated 4 times
Mar 13 15:09:30 xlsqip02 last message repeated 10 times
Mar 13 15:10:37 xlsqip02 last message repeated 7 times
Mar 13 15:11:39 xlsqip02 last message repeated 11 times
Mar 13 15:12:55 xlsqip02 last message repeated 9 times
Mar 13 15:12:57 xlsqip02 winbindd[4331]: [2012/03/13 15:12:57, 0]
libsmb/credentials.c:creds_client_check(324)
Mar 13 15:12:57 xlsqip02 winbindd[4331]:   creds_client_check:
credentials check failed.
Mar 13 15:12:57 xlsqip02 winbindd[4331]: [2012/03/13 15:12:57, 0]
rpc_client/cli_netlogon.c:rpccli_netlogon_sam_network_logon(1030)
Mar 13 15:12:57 xlsqip02 winbindd[4331]:
rpccli_netlogon_sam_network_logon: credentials chain check failed
Mar 13 15:13:05 xlsqip02 sockd[4235]: warning: accept(2) failed:
Resource temporarily unavailable (errno = 11)

btw, winbind just sucks ... But I doubt that winbind is the root cause ...

Right. Something underneath it is. Affecting both winbind and squid_ldap_auth connectivity. Possibly routing related.


Anyway, we had some NIC issues before (packet drops), so for the moment we
have disabled all the offload features (TSO etc.):

root@xlsqip02 ~ # ethtool -k eth0
Offload parameters for eth0:
Cannot get device udp large send offload settings: Operation not supported
rx-checksumming: off
tx-checksumming: off
scatter-gather: off
tcp segmentation offload: off
udp fragmentation offload: off
generic segmentation offload: off
generic-receive-offload: off

root@xlsqip02 ~ # ethtool -i eth0
driver: bnx2
version: 1.9.3
firmware-version: 4.6.4 NCSI 1.0.3
bus-info: 0000:02:00.0

root@xlsqip02 ~ # ethtool -g eth0
Ring parameters for eth0:
Pre-set maximums:
RX:             1020
RX Mini:        0
RX Jumbo:       4080
TX:             255
Current hardware settings:
RX:             1020
RX Mini:        0
RX Jumbo:       0
TX:             255

netstat output, if interesting:
root@xlsqip02 ~ # netstat -s
Ip:
    1031106057 total packets received
    32 with invalid addresses
    0 forwarded
    0 incoming packets discarded
    1031105815 incoming packets delivered
    943692708 requests sent out
    214 dropped because of missing route

Possibly related.

    34 reassemblies required
    17 packets reassembled ok
Icmp:
    77877 ICMP messages received
    339 input ICMP message failed.
    ICMP input histogram:
        destination unreachable: 31124

The destination unreachable count is way too high. The NIC is either going down intermittently or a route has disappeared for some destinations.

        timeout in transit: 3011
        echo requests: 43271
        echo replies: 467
    43804 ICMP messages sent
    0 ICMP messages failed
    ICMP output histogram:
        destination unreachable: 66
        echo request: 467
        echo replies: 43271
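To put a number on "way too high", the ICMP counters can be pulled out of the netstat -s text and compared. The Python sketch below assumes the output layout shown above (it varies between net-tools versions) and uses an abbreviated copy of those counters as sample input. The function names are illustrative only.

```python
import re

# Abbreviated copy of the counters from the netstat -s output above.
SAMPLE = """\
Ip:
    1031106057 total packets received
    214 dropped because of missing route
Icmp:
    77877 ICMP messages received
    339 input ICMP message failed.
    ICMP input histogram:
        destination unreachable: 31124
        timeout in transit: 3011
        echo requests: 43271
        echo replies: 467
    43804 ICMP messages sent
    ICMP output histogram:
        destination unreachable: 66
"""


def icmp_input_histogram(text):
    """Return {message type: count} from the 'ICMP input histogram' block."""
    hist = {}
    in_hist = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == "ICMP input histogram:":
            in_hist = True
            continue
        if in_hist:
            m = re.fullmatch(r"(.+?):\s*(\d+)", stripped)
            if not m:
                break  # first line that is not 'name: count' ends the block
            hist[m.group(1)] = int(m.group(2))
    return hist


def icmp_received(text):
    """Total ICMP messages received, or 0 if the counter is missing."""
    m = re.search(r"(\d+) ICMP messages received", text)
    return int(m.group(1)) if m else 0


def unreachable_share(text):
    """Fraction of received ICMP messages that are 'destination unreachable'."""
    received = icmp_received(text)
    if not received:
        return 0.0
    return icmp_input_histogram(text).get("destination unreachable", 0) / received


print("destination unreachable share: {:.1%}".format(unreachable_share(SAMPLE)))
```

With the counters above, roughly 40% of all received ICMP messages are destination unreachable. Sampling the counters a few minutes apart and comparing the deltas would show whether they climb during the hangs.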


Amos

