On 5/07/2013 3:56 a.m., Grant wrote:
Is your file descriptor limit correctly configured? Let Squid have at least 6
descriptors per client to be safe.
Is Squid's limit under the user limit?
I have:
# ulimit -n
1024
And even as latency skyrockets I only have:
# lsof -u squid | wc -l
80
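For the record, Squid's own view of its descriptor usage can be checked through
the cache manager, assuming squidclient is installed and can reach the proxy on
its default port:
# squidclient mgr:info | grep -i 'file desc'
That reports the maximum and currently-used descriptor counts as Squid sees
them, which is a more direct check than counting lsof output.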
Is it a problem that the client is behind a router which I have no
control over? Does squid need to establish an inbound connection to
the client?
No, and no.
Is your Squid using the kqueue I/O module by chance? There is an SSL hanging
bug in there that has been around forever and was only just fixed in 3.3.6.
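(Side note: which I/O loop the binary was built with can usually be read from
the configure options in the version banner, for example:
# squid -v | grep -i kqueue
assuming the packager did not strip the configure options out of the build.)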
I updated to 3.3.6 and the system doesn't become totally unresponsive
any more, although SSH latency is still pretty high when the client is
trying to load a page. The browser still hangs after loading a few
page elements on some websites (www.google.com/nexus/) but now if I
let the page load for long enough, it does eventually load, but it can
take 5 minutes or longer. Restarting squid sometimes makes it load a
lot faster. It's possible that it hangs more often when loading
elements from a different domain or subdomain (services.google.com,
doubleclick.net) but that could be a coincidence. The client's and
server's internet connections are strong.
If that were related, it might be DNS or TCP congestion (ECT, Window
Scaling, MTU) issues.
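If you want to rule those out, two quick checks from the Squid box, assuming
iputils ping and dig are available and using www.google.com only as an example
target:
# dig www.google.com | grep 'Query time'
# ping -M do -s 1472 -c 4 www.google.com
The first shows how long name resolution is taking; the second sends
don't-fragment packets sized for a 1500-byte MTU and will complain if the path
MTU is smaller.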
The usual cause of this type of issue is forwarding loops, although your
low socket usage indicates that is probably not the problem.
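They are easy to rule out anyway, since Squid logs a warning each time it
detects one (the log path here is an assumption, adjust for your install):
# grep -c 'Forwarding loop detected' /var/log/squid/cache.log
Anything greater than zero means requests are being bounced back into the
proxy.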
Yes, I'm the only user.
It might be related to the 10ms select-loop delays in Squid. If you load
the proxy with a bunch more requests (say 20 in parallel, constantly),
does it still happen?
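A rough way to generate that load from another shell, assuming the proxy
listens on 127.0.0.1:3128 and curl is available (adjust the address, port and
URL to suit):
# for i in $(seq 1 20); do curl -s -o /dev/null -x http://127.0.0.1:3128 http://example.com/ & done; wait
Keep that running in a loop while you reproduce the browser hang and watch
whether the latency changes.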
It may also be the machine swapping virtual memory. Each transaction through
Squid uses around 256KB of RAM, plus the cache_mem size, plus about
10-15MB of cache index data for each GB of the total cache size.
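As a purely illustrative example: with cache_mem set to 256 MB, a 10 GB
cache_dir and 50 concurrent transactions, that rule of thumb works out to
roughly 256 MB + (10 x 15 MB) + (50 x 256 KB), or about 420 MB of RAM for
Squid alone.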
I have this on the Squid system while the browser seems to hang, so I
think there is plenty of available physical RAM:
# free
             total       used       free     shared    buffers     cached
Mem:       1985944    1638368     347576          0     838340     219332
-/+ buffers/cache:     580696    1405248
Swap:      1048572          0    1048572
If you are able to replicate it easily, please try to capture an strace or a
cache.log at level 9 to see what Squid is doing during the unresponsive
period.
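Roughly, and with the paths and PID lookup being assumptions you may need to
adjust: set "debug_options ALL,9" in squid.conf and run "squid -k reconfigure"
("squid -k debug" toggles the same full debugging at runtime), and for the
strace something like:
# strace -f -tt -o /tmp/squid.strace -p "$(pidof squid)"
Leave them running across one of the hangs and then look at what the process
was doing in that window.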
cache.log outputs a massive amount of data at "ALL,9", but most of the
time it is this stuff:
2013/07/04 08:51:04.143 kid1| event.cc(250) checkEvents: checkEvents
2013/07/04 08:51:04.143 kid1| AsyncCall.cc(18) AsyncCall: The
AsyncCall MaintainSwapSpace constructed, this=0xbb3310 [call546]
2013/07/04 08:51:04.143 kid1| AsyncCall.cc(85) ScheduleCall:
event.cc(259) will call MaintainSwapSpace() [call546]
Are these happening at regular but widely separated intervals, or in large
numbers across the slowdown period?
These are the main cache garbage collection operations, so they should be
expected to check and purge some things every so often. If they happen
unusually frequently during the slow-down period, it means the cache is
overflowing and the CPU is busy purging contents until enough space is
available for the new traffic.
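One quick way to gauge that from the log you already have (the path is an
assumption):
# grep MaintainSwapSpace /var/log/squid/cache.log | awk '{print $2}' | cut -d. -f1 | uniq -c
That counts the events per second of log time; a steady trickle is normal,
hundreds per second during the hang would point at cache churn.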
The rest are normal operational messages. So much for the hope of finding some
specific AsyncCall stuck in a fast repeat.
Amos