On Wed, 3 Feb 2010 20:14:36 -0800 (GMT-08:00), Shawn Wright <swright@xxxxxxxxxxxx> wrote: > On Monday, our usually very stable Squid 2.6stable20 server acted up, due > to what appears to be a problem with the ntlm_auth helpers. No changes have > been made to the system recently, with the exception of recompiling to > allow for logs >2Gb last month. Prior to that, the system had been stable > since Mar 2008. The logs below show a FATAL stop after too many queued > auths, which could be due to slow AD DCs combined with spyware on clients > hammering the proxy. But the odd thing is after the restart at 19:36 on Feb > 1, our MRTG snmp stats showed a gradual decline in active clients from a > normal 650 down to 0, over a period of 24 hours. All other snmp stats > appear normal. > > These two events appear to be related, but I'm not sure how or where to > look next. Do the logs over that 24 hr period confirm the loss of client connections? They should at least have clues as to what the last action each client did was. The final interaction between a client and the proxy might be significant if it was repeated many times across the client base. > > cache.log showing the abend: > =========== > > 2010/02/01 19:36:12| WARNING: All basicauthenticator processes are busy. > 2010/02/01 19:36:12| WARNING: up to 16 pending requests queued > 2010/02/01 19:36:12| storeDirWriteCleanLogs: Starting... > 2010/02/01 19:36:12| WARNING: Closing open FD 17 > 2010/02/01 19:36:12| commSetEvents: epoll_ctl(EPOLL_CTL_DEL): failed on > fd=17: (1) Operation not permitted > 2010/02/01 19:36:12| 65536 entries written so far. > 2010/02/01 19:36:12| 131072 entries written so far. > <snip > 2010/02/01 19:36:17| Finished. Wrote 2393562 entries. > 2010/02/01 19:36:17| Took 5.0 seconds (477102.8 entries/sec). > FATAL: Too many queued basicauthenticator requests (16 on 5) Bumping the number of helpers up a bit will raise the ceiling on that happening again. > Squid Cache (Version 2.6.STABLE20): Terminated abnormally. > CPU Usage: 51624.410 seconds = 30518.320 user + 21106.090 sys > Maximum Resident Size: 0 KB > Page faults with physical i/o: 2 > Memory usage for squid via mallinfo(): > total space in arena: 1727844 KB > Ordinary blocks: 1658823 KB 788411 blks > Small blocks: 0 KB 0 blks > Holding blocks: 25956 KB 13 blks > Free Small blocks: 0 KB > Free Ordinary blocks: 69020 KB > Total in use: 1684779 KB 96% > Total free: 69020 KB 4% > [2010/02/01 19:36:18, 1] utils/ntlm_auth.c:manage_squid_request(1594) <snip> > [2010/02/01 19:36:18, 1] utils/ntlm_auth.c:manage_squid_request(1594) > fgets() failed! dying..... errno=1 (Operation not permitted) This is just Squid axing away at the helpers. They were actually busy doing what they are supposed to do without any problem, so they complain when aborted. Not their fault squid did not have enough running to cope with the load. Amos