I'm trying to debug an intermittent slowness problem with Squid 3.4.4.
Unfortunately I haven't been able to figure out how to reproduce the
problem; it just occurs every so often on a production server. I've not
yet tried Squid 3.4.5, but there's nothing in the change log that would
lead me to believe that this problem has been addressed.
I've got an example web fetch from this morning. The network traffic
shows:
09:01:54.489515 client -> server TCP SYN
09:01:54.489541 client <- server TCP SYN/ACK
09:01:54.489555 client -> server TCP ACK
09:01:54.490059 client -> server HTTP GET request
09:01:54.490074 client <- server TCP ACK
09:02:09.492576 client -> server TCP FIN (client times out, tears down)
09:02:09.532222 client <- server TCP ACK
09:02:35.371911 client <- server TCP FIN (server tears down connection)
The client is on port 58469; the server (Squid) is on port 3128.
As you can see, Squid never replies to the GET request (and actually, in
this case the GET request didn't require Squid to contact another server
- the authentication credentials were invalid, so it should have
produced a 407).
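For reference, the sort of reply I'd expect Squid to send straight back
in this case, without touching any upstream server, is roughly the
following (illustrative only; the Proxy-Authenticate scheme obviously
depends on the configured auth scheme):

HTTP/1.1 407 Proxy Authentication Required
Proxy-Authenticate: Negotiate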
Examining the Squid logs
(http://persephone.nexusuk.org/~steve/cache.log.trimmed), I can see that
Squid didn't accept the connection until 09:02:35.370.
What seems to be happening is that helperStatefulHandleRead is being
called and taking several seconds to complete; if this happens
frequently enough, the incoming connections get queued up and
significantly delayed. See the log below:
2014/05/08 09:01:53.489 kid1| comm.cc(167) comm_read: comm_read,
queueing read for local=[::] remote=[::] FD 66 flags=1; asynCall
0x7f9c174c4c00*1
2014/05/08 09:01:53.489 kid1| ModEpoll.cc(139) SetSelect: FD 66, type=1,
handler=1, client_data=0x7f9bf4425328, timeout=0
2014/05/08 09:01:53.489 kid1| AsyncCallQueue.cc(53) fireNext: leaving
helperHandleRead(local=[::] remote=[::] FD 66 flags=1,
data=0x7f9c0a1dad78, size=120, buf=0x7f9c09f4d8a0)
2014/05/08 09:01:53.489 kid1| AsyncCallQueue.cc(51) fireNext: entering
helperStatefulHandleRead(local=[::] remote=[::] FD 53 flags=1,
data=0x7f9c08ae5928, size=281, buf=0x7f9bfa3e11d0)
2014/05/08 09:01:53.489 kid1| AsyncCall.cc(30) make: make call
helperStatefulHandleRead [call44614375]
2014/05/08 09:01:58.329 kid1| AsyncCall.cc(18) AsyncCall: The AsyncCall
helperDispatchWriteDone constructed, this=0x7f9c1a39bd40 [call44614625]
2014/05/08 09:01:58.329 kid1| Write.cc(29) Write: local=[::] remote=[::]
FD 68 flags=1: sz 188: asynCall 0x7f9c1a39bd40*1
FDs 66 and 68 are connections to external ACL helpers, but the network
traffic shows that the external ACLs are answering immediately, so as
far as I can tell this delay isn't caused by the helpers themselves.
(FD 66 received a query at 09:01:53.480992 and replied at
09:01:53.484808; FD 68 received a query at 09:01:58.334608 and replied
at 09:01:58.334661).
I *think* FD 53 might be a connection to the ICAP service.
I have noticed that Squid often seems to use a lot of CPU time when this
problem is occurring.
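To illustrate the queueing effect I'm describing, here is a toy model of
a single-threaded call queue (just a sketch for discussion, not Squid's
actual code): one slow handler delays everything dispatched behind it,
including the accept of a new client connection.

// toy_queue.cpp - toy model of a single-threaded async call queue.
// One deliberately slow handler (standing in for helperStatefulHandleRead)
// delays every call queued behind it, e.g. accepting a new connection.
#include <chrono>
#include <deque>
#include <functional>
#include <iostream>
#include <thread>

using Clock = std::chrono::steady_clock;

static void log_elapsed(const char *what, Clock::time_point start)
{
    auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
                  Clock::now() - start).count();
    std::cout << what << " ran at +" << ms << "ms\n";
}

int main()
{
    std::deque<std::function<void()>> queue;   // pending async calls
    const auto start = Clock::now();

    // A handler that takes a long time, standing in for the slow
    // helperStatefulHandleRead calls in the log above (simulated here
    // with a sleep).
    queue.push_back([start] {
        log_elapsed("slow helper read handler", start);
        std::this_thread::sleep_for(std::chrono::seconds(5));
    });

    // Work queued behind it - e.g. accepting the client connection and
    // answering the GET - cannot run until the slow call returns.
    queue.push_back([start] {
        log_elapsed("accept new client connection", start);
    });
    queue.push_back([start] {
        log_elapsed("reply to GET with 407", start);
    });

    // Single-threaded dispatch loop: one slow call stalls the rest.
    while (!queue.empty()) {
        auto call = std::move(queue.front());
        queue.pop_front();
        call();
    }
    return 0;
}

Compile with "g++ -std=c++11 -pthread toy_queue.cpp"; the accept and the
407 reply only run after the 5 second sleep, which is exactly the
pattern I'm seeing.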
Unfortunately I don't know where to go with the debugging now: the
current amount of debug logging produces a lot of data, but isn't really
detailed enough for me to work out what's going on. And as mentioned,
since I can't reproduce this problem in a test environment, I have no
choice but to leave debug logging enabled on a production server.
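One thing I'm considering is patching a simple scope timer into the
suspect handlers so that slow invocations show up directly; a rough
sketch (the class, threshold and output are mine, not anything that
exists in Squid):

// scope_timer.h - rough sketch of a timer that could be patched into
// suspect handlers (e.g. at the top of helperStatefulHandleRead) to
// report how long each invocation takes. Not part of Squid.
#include <chrono>
#include <iostream>

class ScopeTimer {
public:
    explicit ScopeTimer(const char *name)
        : name_(name), start_(std::chrono::steady_clock::now()) {}

    ~ScopeTimer() {
        auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
                      std::chrono::steady_clock::now() - start_).count();
        // Only shout about invocations that take suspiciously long.
        if (ms > 100)
            std::cerr << name_ << " took " << ms << "ms\n";
    }

private:
    const char *name_;
    std::chrono::steady_clock::time_point start_;
};

// Hypothetical usage: declare one at the top of the handler being timed.
//   void helperStatefulHandleRead(...)
//   {
//       ScopeTimer t("helperStatefulHandleRead");
//       ...
//   }

Dropping one of these into helperStatefulHandleRead (and the ICAP read
path) should at least confirm where the wall-clock time is going.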
Any suggestions from people more familiar with the Squid internals
would certainly be appreciated.
--
- Steve Hill
Technical Director
Opendium Limited http://www.opendium.com
Direct contacts:
Instant messenger: xmpp:steve@xxxxxxxxxxxx
Email: steve@xxxxxxxxxxxx
Phone: sip:steve@xxxxxxxxxxxx
Sales / enquiries contacts:
Email: sales@xxxxxxxxxxxx
Phone: +44-844-9791439 / sip:sales@xxxxxxxxxxxx
Support contacts:
Email: support@xxxxxxxxxxxx
Phone: +44-844-4844916 / sip:support@xxxxxxxxxxxx