Re: squid-3.3.5 hangs the entire system

Grant <emailgrant@xxxxxxxxx> · Fri, 5 Jul 2013 00:33:05 -0700

>> I updated to 3.3.6 and the system doesn't become totally unresponsive
>> any more, although SSH latency is still pretty high when the client is
>> trying to load a page.  The browser still hangs after loading a few
>> page elements on some websites (www.google.com/nexus/) but now if I
>> let the page load for long enough, it does eventually load, but it can
>> take 5 minutes or longer.  Restarting squid sometimes makes it load a
>> lot faster.  It's possible that it hangs more often when loading
>> elements from a different domain or subdomain (services.google.com,
>> doubleclick.net) but that could be a coincidence.  The client's and
>> server's internet connections are strong.
>
> If that were related it might be DNS or TCP congestion (ECT, Window Scaling,
> MTU) issues.

I set the following on the squid server and client with no noticeable change:

echo 0 > /proc/sys/net/ipv4/tcp_ecn
echo 1 > /proc/sys/net/ipv4/ip_no_pmtu_disc
echo 0 > /proc/sys/net/ipv4/tcp_window_scaling
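
To rule out the writes simply not having taken effect, the values can be read back from /proc. A minimal sketch, assuming a Linux host; the function name show_tcp_settings and its optional directory argument are made up for illustration:

```shell
# Print the current value of the three TCP settings changed above.
# show_tcp_settings is a hypothetical helper; the optional first argument
# overrides the directory so the function can be tried against a copy.
show_tcp_settings() {
    base="${1:-/proc/sys/net/ipv4}"
    for name in tcp_ecn ip_no_pmtu_disc tcp_window_scaling; do
        if [ -r "$base/$name" ]; then
            printf '%s = %s\n' "$name" "$(cat "$base/$name")"
        else
            printf '%s = (not readable)\n' "$name"
        fi
    done
}
```

Running show_tcp_settings with no argument on both the server and the client shows whether each still holds the value written by the echo commands. Values written this way do not survive a reboot unless they are also added to /etc/sysctl.conf.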

>>> The usual cause of these types of issues is forwarding loops, although
>>> your low socket usage indicates that is probably not the problem.
>>
>> Yes, I'm the only user.
>
> It might be related to the 10ms select-loop delays in Squid. If you load the
> proxy with a bunch more requests (say 20 in parallel constantly) does it
> still happen?

I opened 20 tabs in firefox and the 3 tabs which started loading first
loaded slightly more content than usual.

>> I have this on the squid system while the browser seems to hang so I
>> think there is plenty of available physical RAM:
>>
>> # free
>>               total       used       free     shared    buffers     cached
>> Mem:       1985944    1638368     347576          0     838340     219332
>> -/+ buffers/cache:     580696    1405248
>> Swap:      1048572          0    1048572
>>
>> 2013/07/04 08:51:04.143 kid1| event.cc(250) checkEvents: checkEvents
>> 2013/07/04 08:51:04.143 kid1| AsyncCall.cc(18) AsyncCall: The AsyncCall MaintainSwapSpace constructed, this=0xbb3310 [call546]
>> 2013/07/04 08:51:04.143 kid1| AsyncCall.cc(85) ScheduleCall: event.cc(259) will call MaintainSwapSpace() [call546]
>
> Are these happening at regular but widely separated intervals, or are
> there lots of them across the slowdown period?

They happen about once per second during the slowdown.
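
The "about once per second" figure can be checked by counting the entries per second in cache.log. This is only a sketch: it assumes each entry sits on a single line in the actual log file (the wrapping in the quote above comes from the mail client) and that debug logging is verbose enough to emit these lines. The function name count_maintain_calls is hypothetical, and the log path has to be supplied by the caller.

```shell
# Count "will call MaintainSwapSpace" log entries per second.
# count_maintain_calls is a hypothetical helper; pass it the path to cache.log.
count_maintain_calls() {
    awk '/will call MaintainSwapSpace/ {
             split($2, t, ".")            # drop the milliseconds from the time field
             n[$1 " " t[1]]++
         }
         END { for (k in n) print k, n[k] }' "$1" | sort
}
```

Each output line is a date, a time truncated to the second, and the number of MaintainSwapSpace calls scheduled in that second, so a burst during the slowdown would show up as counts well above 1.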

> This is the main cache garbage collection operation, so it should check
> and purge some things every so often. If these happen unusually frequently
> during the slow-down period it means the cache is overflowing and the CPU is
> busy purging contents until enough space is available for the new traffic.

Squid's CPU usage is very low during the slowdown, at 0.5% - 2.5%, with
most of the CPU idle.  I get an 18M cache size every time I check:

# du -sh /var/cache/squid
18M	/var/cache/squid

I've tried each of these with no noticeable change:

cache_dir ufs /var/cache/squid 100 16 256
cache_dir aufs /var/cache/squid 100 16 256
cache_dir diskd /var/cache/squid 100 16 256
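
To put the 18M in context against the 100 MB limit in the cache_dir lines above, disk usage can be expressed as a percentage of the limit. A rough sketch only; cache_usage_pct is a hypothetical helper, and the directory and limit are passed in by the caller:

```shell
# Print how full a cache directory is, as a whole-number percentage of a limit in MB.
# cache_usage_pct is a hypothetical helper: cache_usage_pct DIR LIMIT_MB
cache_usage_pct() {
    used_kb=$(du -sk "$1" | awk '{ print $1 }')
    echo $(( used_kb * 100 / ($2 * 1024) ))
}
```

For example, cache_usage_pct /var/cache/squid 100 should report something close to 18 here. du also counts the 16 x 256 directory skeleton and swap.state, so the figure overstates the amount of cached object data.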

- Grant