I'm trying to debug an intermittent slowness problem with Squid 3.4.4.
Unfortunately I haven't been able to figure out how to reproduce the
problem; it just occurs every so often on a production server. I've not
yet tried Squid 3.4.5, but there's nothing in the change log that would
lead me to believe that this problem has been addressed.
I've got an example web fetch from this morning. The network traffic
shows:
09:01:54.489515 client -> server TCP SYN
09:01:54.489541 client <- server TCP SYN/ACK
09:01:54.489555 client -> server TCP ACK
09:01:54.490059 client -> server HTTP GET request
09:01:54.490074 client <- server TCP ACK
09:02:09.492576 client -> server TCP FIN (client times out, tears down)
09:02:09.532222 client <- server TCP ACK
09:02:35.371911 client <- server TCP FIN (server tears down connection)
The client is on port 58469; the server (Squid) is on port 3128.
As you can see, Squid never replies to the GET request (and actually, in
this case the GET request didn't require Squid to contact another server
- the authentication credentials were invalid, so it should have
produced a 407).
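For reference, the sort of reply I'd expect Squid to send straight back
in this case, without touching any upstream server, is roughly the
following (illustrative only; the Proxy-Authenticate scheme obviously
depends on the configured auth scheme):

HTTP/1.1 407 Proxy Authentication Required
Proxy-Authenticate: Negotiate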
Examining the Squid logs
(http://persephone.nexusuk.org/~steve/cache.log.trimmed), I can see that
Squid didn't accept the connection until 09:02:35.370.
What seems to be happening is that helperStatefulHandleRead is being
called and taking several seconds to complete; if this happens
frequently enough, the incoming connections get queued up and
significantly delayed. See the log below:
2014/05/08 09:01:53.489 kid1| comm.cc(167) comm_read: comm_read,
queueing read for local=[::] remote=[::] FD 66 flags=1; asynCall
0x7f9c174c4c00*1
2014/05/08 09:01:53.489 kid1| ModEpoll.cc(139) SetSelect: FD 66, type=1,
handler=1, client_data=0x7f9bf4425328, timeout=0
2014/05/08 09:01:53.489 kid1| AsyncCallQueue.cc(53) fireNext: leaving
helperHandleRead(local=[::] remote=[::] FD 66 flags=1,
data=0x7f9c0a1dad78, size=120, buf=0x7f9c09f4d8a0)
2014/05/08 09:01:53.489 kid1| AsyncCallQueue.cc(51) fireNext: entering
helperStatefulHandleRead(local=[::] remote=[::] FD 53 flags=1,
data=0x7f9c08ae5928, size=281, buf=0x7f9bfa3e11d0)
2014/05/08 09:01:53.489 kid1| AsyncCall.cc(30) make: make call
helperStatefulHandleRead [call44614375]
2014/05/08 09:01:58.329 kid1| AsyncCall.cc(18) AsyncCall: The AsyncCall
helperDispatchWriteDone constructed, this=0x7f9c1a39bd40 [call44614625]
2014/05/08 09:01:58.329 kid1| Write.cc(29) Write: local=[::] remote=[::]
FD 68 flags=1: sz 188: asynCall 0x7f9c1a39bd40*1
FDs 66 and 68 are connections to external ACL helpers, but the network
traffic shows that the external ACLs are answering immediately, so as
far as I can tell this delay isn't caused by the helpers themselves.
(FD 66 received a query at 09:01:53.480992 and replied at
09:01:53.484808; FD 68 received a query at 09:01:58.334608 and replied
at 09:01:58.334661).
I *think* FD 53 might be a connection to the ICAP service.
I have noticed that Squid often seems to use a lot of CPU time when this
problem is occurring.
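To illustrate the queueing effect I'm describing, here is a toy model of
a single-threaded call queue (just a sketch for discussion, not Squid's
actual code): one slow handler delays everything dispatched behind it,
including the accept of a new client connection.

// toy_queue.cpp - toy model of a single-threaded async call queue.
// One deliberately slow handler (standing in for helperStatefulHandleRead)
// delays every call queued behind it, e.g. accepting a new connection.
#include <chrono>
#include <deque>
#include <functional>
#include <iostream>
#include <thread>

using Clock = std::chrono::steady_clock;

static void log_elapsed(const char *what, Clock::time_point start)
{
    auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
                  Clock::now() - start).count();
    std::cout << what << " ran at +" << ms << "ms\n";
}

int main()
{
    std::deque<std::function<void()>> queue;   // pending async calls
    const auto start = Clock::now();

    // A handler that takes a long time, standing in for the slow
    // helperStatefulHandleRead calls in the log above (simulated here
    // with a sleep).
    queue.push_back([start] {
        log_elapsed("slow helper read handler", start);
        std::this_thread::sleep_for(std::chrono::seconds(5));
    });

    // Work queued behind it - e.g. accepting the client connection and
    // answering the GET - cannot run until the slow call returns.
    queue.push_back([start] {
        log_elapsed("accept new client connection", start);
    });
    queue.push_back([start] {
        log_elapsed("reply to GET with 407", start);
    });

    // Single-threaded dispatch loop: one slow call stalls the rest.
    while (!queue.empty()) {
        auto call = std::move(queue.front());
        queue.pop_front();
        call();
    }
    return 0;
}

Compile with "g++ -std=c++11 -pthread toy_queue.cpp"; the accept and the
407 reply only run after the 5 second sleep, which is exactly the
pattern I'm seeing.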
Unfortunately I don't know where to go with the debugging now: the
current amount of debug logging produces a lot of data, but isn't really
detailed enough for me to work out what's going on. And as mentioned,
since I can't reproduce this problem in a test environment, I have no
choice but to leave debug logging enabled on a production server.
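One thing I'm considering is patching a simple scope timer into the
suspect handlers so that slow invocations show up directly; a rough
sketch (the class, threshold and output are mine, not anything that
exists in Squid):

// scope_timer.h - rough sketch of a timer that could be patched into
// suspect handlers (e.g. at the top of helperStatefulHandleRead) to
// report how long each invocation takes. Not part of Squid.
#include <chrono>
#include <iostream>

class ScopeTimer {
public:
    explicit ScopeTimer(const char *name)
        : name_(name), start_(std::chrono::steady_clock::now()) {}

    ~ScopeTimer() {
        auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
                      std::chrono::steady_clock::now() - start_).count();
        // Only shout about invocations that take suspiciously long.
        if (ms > 100)
            std::cerr << name_ << " took " << ms << "ms\n";
    }

private:
    const char *name_;
    std::chrono::steady_clock::time_point start_;
};

// Hypothetical usage: declare one at the top of the handler being timed.
//   void helperStatefulHandleRead(...)
//   {
//       ScopeTimer t("helperStatefulHandleRead");
//       ...
//   }

Dropping one of these into helperStatefulHandleRead (and the ICAP read
path) should at least confirm where the wall-clock time is going.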
Any suggestions from people more familiar with the Squid internals
would certainly be appreciated.
--
- Steve Hill
Technical Director
Opendium Limited http://www.opendium.com
Direct contacts:
Instant messenger: xmpp:steve@xxxxxxxxxxxx
Email: steve@xxxxxxxxxxxx
Phone: sip:steve@xxxxxxxxxxxx
Sales / enquiries contacts:
Email: sales@xxxxxxxxxxxx
Phone: +44-844-9791439 / sip:sales@xxxxxxxxxxxx
Support contacts:
Email: support@xxxxxxxxxxxx
Phone: +44-844-4844916 / sip:support@xxxxxxxxxxxx