On 5/07/2013 3:56 a.m., Grant wrote:
Is your file descriptor limit correctly configured? Let Squid have at least 6
descriptors per client to be safe.
Is Squid's limit under the user limit?
I have:
# ulimit -n
1024
And even as latency skyrockets I only have:
# lsof -u squid | wc -l
80
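For the record, Squid's own view of its descriptor usage can be checked through
the cache manager, assuming squidclient is installed and can reach the proxy on
its default port:
# squidclient mgr:info | grep -i 'file desc'
That reports the maximum and currently-used descriptor counts as Squid sees
them, which is a more direct check than counting lsof output.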
Is it a problem that the client is behind a router which I have no
control over? Does squid need to establish an inbound connection to
the client?
No, and no.
Is your Squid using the kqueue I/O module by chance? There is an SSL hanging
bug in there that has been around forever and was only just fixed in 3.3.6.
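(Side note: which I/O loop the binary was built with can usually be read from
the configure options in the version banner, for example:
# squid -v | grep -i kqueue
assuming the packager did not strip the configure options out of the build.)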
I updated to 3.3.6 and the system doesn't become totally unresponsive
any more, although SSH latency is still pretty high when the client is
trying to load a page. The browser still hangs after loading a few
page elements on some websites (www.google.com/nexus/) but now if I
let the page load for long enough, it does eventually load, but it can
take 5 minutes or longer. Restarting squid sometimes makes it load a
lot faster. It's possible that it hangs more often when loading
elements from a different domain or subdomain (services.google.com,
doubleclick.net) but that could be a coincidence. The client's and
server's internet connections are strong.
If that were related, it might be DNS or TCP congestion (ECT, Window
Scaling, MTU) issues.
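If you want to rule those out, two quick checks from the Squid box, assuming
iputils ping and dig are available and using www.google.com only as an example
target:
# dig www.google.com | grep 'Query time'
# ping -M do -s 1472 -c 4 www.google.com
The first shows how long name resolution is taking; the second sends
don't-fragment packets sized for a 1500-byte MTU and will complain if the path
MTU is smaller.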
The usual cause of this type of issue is forwarding loops, although your
low socket usage indicates that is probably not the problem.
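They are easy to rule out anyway, since Squid logs a warning each time it
detects one (the log path here is an assumption, adjust for your install):
# grep -c 'Forwarding loop detected' /var/log/squid/cache.log
Anything greater than zero means requests are being bounced back into the
proxy.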
Yes, I'm the only user.
It might be related to the 10ms select-loop delays in Squid. If you load
the proxy with a bunch more requests (say 20 in parallel, constantly),
does it still happen?
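A rough way to generate that load from another shell, assuming the proxy
listens on 127.0.0.1:3128 and curl is available (adjust the address, port and
URL to suit):
# for i in $(seq 1 20); do curl -s -o /dev/null -x http://127.0.0.1:3128 http://example.com/ & done; wait
Keep that running in a loop while you reproduce the browser hang and watch
whether the latency changes.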
It may also be the machine swapping virtual memory. Each transaction through
Squid uses around 256KB of RAM, plus the cache_mem size, plus about
10-15MB of cache index data for each GB of the total cache size.
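As a purely illustrative example: with cache_mem set to 256 MB, a 10 GB
cache_dir and 50 concurrent transactions, that rule of thumb works out to
roughly 256 MB + (10 x 15 MB) + (50 x 256 KB), or about 420 MB of RAM for
Squid alone.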
I have this on the Squid system while the browser seems to hang, so I
think there is plenty of available physical RAM:
# free
             total       used       free     shared    buffers     cached
Mem:       1985944    1638368     347576          0     838340     219332
-/+ buffers/cache:     580696    1405248
Swap:      1048572          0    1048572
If you are able to replicate it easily, please try to capture an strace or a
cache.log at level 9 to see what Squid is doing during the unresponsive
period.
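Roughly, and with the paths and PID lookup being assumptions you may need to
adjust: set "debug_options ALL,9" in squid.conf and run "squid -k reconfigure"
("squid -k debug" toggles the same full debugging at runtime), and for the
strace something like:
# strace -f -tt -o /tmp/squid.strace -p "$(pidof squid)"
Leave them running across one of the hangs and then look at what the process
was doing in that window.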
cache.log outputs a massive amount of data at "ALL,9", but most of the
time it is this stuff:
2013/07/04 08:51:04.143 kid1| event.cc(250) checkEvents: checkEvents
2013/07/04 08:51:04.143 kid1| AsyncCall.cc(18) AsyncCall: The
AsyncCall MaintainSwapSpace constructed, this=0xbb3310 [call546]
2013/07/04 08:51:04.143 kid1| AsyncCall.cc(85) ScheduleCall:
event.cc(259) will call MaintainSwapSpace() [call546]
Are these happening at regular but widely separated intervals, or in large
numbers across the slowdown period?
These are the main cache garbage collection operations, so they should be
expected to check and purge some things every so often. If they happen
unusually frequently during the slow-down period, it means the cache is
overflowing and the CPU is busy purging contents until enough space is
available for the new traffic.
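One quick way to gauge that from the log you already have (the path is an
assumption):
# grep MaintainSwapSpace /var/log/squid/cache.log | awk '{print $2}' | cut -d. -f1 | uniq -c
That counts the events per second of log time; a steady trickle is normal,
hundreds per second during the hang would point at cache churn.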
The rest are normal operational messages. So much for the hope of finding some
specific AsyncCall stuck in a fast repeat.
Amos