Re: squid-3.3.5 hangs the entire system

Grant <emailgrant@xxxxxxxxx> · Thu, 11 Jul 2013 23:24:46 -0700

Sorry to top-post.  Any more ideas on this?

- Grant

On Fri, Jul 5, 2013 at 12:33 AM, Grant <emailgrant@xxxxxxxxx> wrote:
>>> I updated to 3.3.6 and the system doesn't become totally unresponsive
>>> any more, although SSH latency is still pretty high when the client is
>>> trying to load a page.  The browser still hangs after loading a few
>>> page elements on some websites (www.google.com/nexus/) but now if I
>>> let the page load for long enough, it does eventually load, but it can
>>> take 5 minutes or longer.  Restarting squid sometimes makes it load a
>>> lot faster.  It's possible that it hangs more often when loading
>>> elements from a different domain or subdomain (services.google.com,
>>> doubleclick.net) but that could be a coincidence.  The client's and
>>> server's internet connections are strong.
>>
>> If that is related, it might be DNS or TCP congestion issues (ECN, window
>> scaling, MTU).
>
> I set the following on the squid server and client with no noticeable change:
>
> echo 0 > /proc/sys/net/ipv4/tcp_ecn
> echo 1 > /proc/sys/net/ipv4/ip_no_pmtu_disc
> echo 0 > /proc/sys/net/ipv4/tcp_window_scaling
>
>>>> The usual cause of this type of issue is forwarding loops, although
>>>> your low socket usage indicates that is probably not the problem.
>>>
>>> Yes, I'm the only user.
>>
>> It might be related to the 10ms select-loop delays in Squid. If you load
>> the proxy with a lot more requests (say, 20 constantly in parallel), does
>> it still happen?
>
> I opened 20 tabs in firefox and the 3 tabs which started loading first
> loaded slightly more content than usual.
>
>>> I see this on the squid system while the browser seems to hang, so I
>>> think there is plenty of available physical RAM:
>>>
>>> # free
>>>               total       used       free     shared    buffers     cached
>>> Mem:       1985944    1638368     347576          0     838340     219332
>>> -/+ buffers/cache:     580696    1405248
>>> Swap:      1048572          0    1048572
>>>
>>> 2013/07/04 08:51:04.143 kid1| event.cc(250) checkEvents: checkEvents
>>> 2013/07/04 08:51:04.143 kid1| AsyncCall.cc(18) AsyncCall: The AsyncCall MaintainSwapSpace constructed, this=0xbb3310 [call546]
>>> 2013/07/04 08:51:04.143 kid1| AsyncCall.cc(85) ScheduleCall: event.cc(259) will call MaintainSwapSpace() [call546]
>>
>> Are these happening at regular but widely separated intervals, or are
>> there lots of them across the slowdown period?
>
> They happen about once per second during the slowdown.
>
>> These are the main cache garbage-collection operations, so they should
>> run and purge some things every so often. If they happen unusually
>> frequently during the slowdown period, it means the cache is overflowing
>> and the CPU is busy purging contents until enough space is available for
>> the new traffic.
>
> Squid's CPU usage is very low during the slowdown, at 0.5% to 2.5%, with
> most of the CPU idle.  The cache size is 18M every time I check:
>
> # du -sh /var/cache/squid
> 18M     /var/cache/squid
>
> I've tried each of these with no noticeable change:
>
> cache_dir ufs /var/cache/squid 100 16 256
> cache_dir aufs /var/cache/squid 100 16 256
> cache_dir diskd /var/cache/squid 100 16 256
>
> - Grant
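
To answer the question of whether the MaintainSwapSpace events arrive at regular, widely separated intervals or in bursts, they can be tallied per second straight from the log instead of by eye. This is a minimal sketch, assuming the debug output goes to /var/log/squid/cache.log (a guess; use whatever cache_log points to) and follows the "YYYY/MM/DD HH:MM:SS.mmm kidN| ..." layout quoted above. count_gc_per_second is a hypothetical helper name, not anything shipped with Squid.

```shell
#!/bin/sh
# Tally MaintainSwapSpace garbage-collection calls per second in Squid's cache.log.
# Assumption: log lines look like "2013/07/04 08:51:04.143 kid1| ..." as quoted above.
# count_gc_per_second is a hypothetical helper, not part of Squid.
count_gc_per_second() {
    # Keep one line per GC call (the line where the AsyncCall is constructed),
    # drop the milliseconds so events group by whole second, then count runs
    # of identical seconds (the log is chronological, so uniq -c is enough).
    grep 'AsyncCall MaintainSwapSpace constructed' "$1" |
        awk '{ split($2, t, "."); print $1, t[1] }' |
        uniq -c
}

# Default path is an assumption; pass the real cache_log path as the first argument.
log="${1:-/var/log/squid/cache.log}"
if [ -r "$log" ]; then
    count_gc_per_second "$log"
else
    echo "cannot read $log" >&2
fi
```

Run it with the same debug level that produced the lines above. A steady column of 1s would match "about once per second", while much larger counts per second would point toward the cache-overflow scenario described above.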