On 17/05/2016 6:27 a.m., Eugene M. Zheganin wrote:
> Hi.
>

I don't see any mention of the Squid version. Which one are you having
this issue in?

> I'm using squid for a long time, I'm using it to authenticate/authorize
> users accessing the Internet with LDAP in a Windows corporate
> environment (Basic/NTLM/GSS-SPNEGO) and recently (about several months
> ago) I had to switch to the SMP scheme, because one process started to
> eat the whole core sometimes, thus bottlenecking users on it.

This might be a version-specific problem. We've had a few bugs solved
that could match that description.

> Situation
> with CPU effectiveness improved, however I discovered several issues.
> The first I was aware of, it's the non-functional SNMP (since there's no
> solution, I just had to sacrifice it).

Do you mean it's fully non-functional? Or that you are just getting
randomly different responses from different workers when they share an
SNMP receiving port?

The latter is worked around by configuring per-worker SNMP ports and
querying each worker individually for its details.

> But the second one is more
> disturbing. I discovered that after a several uptime (usually couple of
> weeks, a month at its best) squid somehow degrades and stops
> authorizing users.

Which auth scheme are those users using?

> I have about active 600 users on my biggest site
> (without SNMP I'm not sure how many simultaneous users I got) but

The mgr:client_db report can help give a good ballpark number there if
you have it enabled.

> usually this starts like this: someone (this starts with one person)
> complains that he lost his access to the internet - not entirely, no. At
> first the access is very slow, and the victim has to wait several
> minutes for the page to load. Others are unaffected at this time. From
> time to time the victim is able to load one of two tabs in the browser,
> eventually, but at the end of the day this becomes unusable, and my
> support has to come in.
> Then this got escalated to me. First I was
> debugging various kerberos stuff, NTLM, victim's machine domain
> membership and so on. But today I managed to figure out that all I have
> to do is just restart squid, yeah (sounds silly, but I don't like to
> restart things, like in the "IT Crowd" TV Series, this is kinda last
> resort measure, when I'm desperate).

That could be any one of four bugs I'm aware of:

1) NTLM connection limit to AD. Winbind access to AD cannot make more
than 256 concurrent connections to any given AD. That's the aggregate
across all the NTLM + Negotiate helpers and any other processes also
running on the Squid machine. This can result in an ever growing queue
of pending auth requests until the proxy is treading water just trying
to catch up on which clients have not yet disconnected.

2) NTLM helper limits exceeded. NTLM handshake duration is not limited.
If for any reason it pauses for a long time between the multiple HTTP
requests involved, that helper is blocked from use by any other users.
This can result in both an ever growing queue, and ever fewer helpers
available to service that queue. Don't you just love NTLM?

3) NTLM and Negotiate involve the helper passing Squid a unique token
with every HTTP request made on a new connection. For quite a few
releases the annotations feature in Squid was adding these to each
username's auth state. The number of these unique token Notes could
build up over a few hours to a day or two, depending on the client's
activity rate, to a number big enough to cause noticeable delays on
every request they made, and on others.

4) Recent versions of Firefox are known to begin NTLM handshakes badly.
They work fine for Kerberos handshakes, and sometimes for NTLM. But for
certain requests they advertise keep-alive on the type-1 message then
just hang. Fortunately this is a behaviour seen with MSIE 5.x back in
the day, so the auth_param "keep_alive off" setting is already
available to resolve that.
Though it does mean the NTLM handshakes require a TCP teardown and
reconnect, which can make issue (2) above hurt more.

> If I'm stubborn enough to continue
> the investigation, soon I got 2 users complaining, then 3, then more.
> During previous outages eventually I used to restart squid (to change
> the domain controller in kerberos config, if I blame one; to disable the
> external Kerberos/LDAP helper connection pooling, if I blame one) - so
> each time there was a candidate to blame. But this time I just decided
> to restart squid, since I started to think it's the main reason, et
> voila. I should also mention that I run this AAA scheme in squid for
> years, and I didn't have this issue previously.

Keep in mind that if you have been keeping up with important
patches/updates to Squid, AD and/or Samba, or just client OS updates,
then a lot of things have been changing on all sides of the process
across those years.

> I also have like dozen
> of other squids running same (very similar) config, - same AAA stuff -
> Basic/NTLM/GSS-SpNego, same AD group checking, but only for the
> different groups membership - and none of it has this issue. I'm
> thinking there's SMP involved, really.

Maybe. Each worker does its own auth, with no sharing. So they should be
operating the same as if they were different instances which happened to
have identical config. That itself can make problem (1) happen, as the
Winbind connection count multiplies by the number of workers.

Other than that, each TCP connection might end up going to a different
worker. BUT, re-auth is always needed on new TCP connections anyway. So
if the client is using HTTP properly that should not be causing any
issue. Might be a big "IF" there though.

I have to keep reassuring myself that NTLM can handle the TCP re-connect
going to a different worker. The bits prior to the type-1 handshake
message don't need a helper, so it should not have issues, but I'm not
completely confident about it.

>
> I realize this is a poor problem report.
> "Something degrades, I restart
> squid, please help, I think it's SMP-related". But the thing is - I
> don't know where to start to narrow this stuff. If anyone's having a
> good idea please let me know.

The above might give you ideas. Otherwise I can only suggest turning on
debug for the authentication section and seeing if anything odd shows up:

 debug_options ALL,1 29,4

Amos

_______________________________________________
squid-users mailing list
squid-users@xxxxxxxxxxxxxxxxxxxxx
http://lists.squid-cache.org/listinfo/squid-users
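
For reference, the settings discussed in this reply could be collected
into a squid.conf fragment roughly like the one below. This is only a
sketch under stated assumptions: the 340x SNMP port base is an arbitrary
illustrative choice, the per-worker port relies on the ${process_number}
SMP macro, and keep_alive is shown for the ntlm scheme only. None of it
is taken from the original poster's configuration.

```
# Per-worker SNMP ports (assumed base: worker 1 -> 3401, worker 2 -> 3402, ...)
# so each SMP worker can be queried individually.
snmp_port 340${process_number}

# Do not offer keep-alive on the initial NTLM challenge, to work around
# clients that hang after the type-1 message. This costs a TCP teardown
# and reconnect per handshake.
auth_param ntlm keep_alive off

# Default logging everywhere, with more detail for debug section 29
# (authentication).
debug_options ALL,1 29,4
```

With a layout like this, each worker would be polled on its own port,
and the section 29 entries in cache.log are the place to look for
stalled or queued handshakes.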