On 2014-02-21 06:10, Simon Beale wrote:
> I've got a problem at the moment with our general Squid proxies where
> requests that should take milliseconds occasionally take far longer
> (i.e. 5+ seconds, or a timeout).
>
> This is most common on our proxies doing 100 reqs/sec, but it happens
> overnight too when they're running at 10 reqs/sec. I'm seeing this with
> both v3.4.2 and with a box I've downgraded back to v3.1.10. For v3.4.2,
> it happens in both multiple-worker and single-worker modes.
What sort of CPU loading do you have at ~100 req/sec?
Is that at or near your local installation's req/sec capacity?
NP:
* Slow-down at peak capacity is normal, as the proxy is busy servicing
other traffic.
* Slow-down at only a few req/sec is normal, as Squid spends a lot of
its time in artificial I/O wait delays to avoid reading/writing
individual bytes off the network. Nothing is worse for the network than
~71 bytes of packet overhead for every 2 bytes of data transferred.
* Slow-down randomly all the time could be network congestion, window
scaling, ECN or MTU related. Even ICMP related (ICMP is *not* optional,
though many admins block it).
* Then there are bugs.
 - 3.1 had a few IPv6 bugs (some major) which caused TCP retry delays in
certain circumstances. Since you are seeing it only randomly, I would
suspect remote network(s) somewhere with those issues being a transit
hop occasionally. Though this is unlikely given 3.4 still shows it.
 - There is a fix in the 3.4.3 release regarding connection IP failover
that may help if that is part of the issue (or it may not).
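When hunting for which requests are actually affected, Squid's own
access.log can be mined. A minimal sketch, assuming the default native
log format (where the second field is the request duration in
milliseconds and the seventh is the URL); the log path in the usage
comment is an assumption:

```python
# Sketch: scan a Squid access.log in the default native format, where
# field 2 is the request duration in milliseconds and field 7 is the
# URL, and report requests at or above a threshold. The 5000 ms default
# matches the "5+ seconds" symptom described above.

def slow_requests(lines, threshold_ms=5000):
    """Yield (duration_ms, url) for access.log lines at/above threshold_ms."""
    for line in lines:
        fields = line.split()
        if len(fields) < 7:
            continue  # skip blank or malformed lines
        try:
            duration_ms = int(fields[1])
        except ValueError:
            continue  # second field wasn't a duration; not native format
        if duration_ms >= threshold_ms:
            yield duration_ms, fields[6]

# Usage (the path is an assumption):
#   with open("/var/log/squid/access.log") as f:
#       for ms, url in slow_requests(f):
#           print(ms, "ms", url)
```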
> The test is not reproducible, sadly, but I've got a cronjob running on
> localhost on these boxes testing access times to various URLs covering:
> HTTPS, non-HTTPS static content, using IP not hostname over both HTTP
> and HTTPS, and a URL on the same VLAN as the proxies. All of these test
> cases have it happen occasionally, but not repeatedly/reliably.
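A probe of the sort described could be sketched like this (a minimal
sketch only; the proxy address, test URL and 5-second threshold are
assumptions, not details from Simon's setup):

```python
# Sketch of a cronjob-style probe: fetch a URL through the proxy and
# flag responses slower than a threshold. PROXY and TEST_URL below are
# placeholders, not real infrastructure.
import time
import urllib.request

PROXY = "http://127.0.0.1:3128"   # assumed local Squid listener
TEST_URL = "http://example.com/"  # placeholder test URL

def classify(elapsed_s, threshold_s=5.0):
    """Label one timing result, matching the '5+ seconds' symptom."""
    return "SLOW" if elapsed_s >= threshold_s else "ok"

def probe(url, proxy, timeout=30.0):
    """Return wall-clock seconds for one request made via the proxy."""
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    )
    start = time.monotonic()
    with opener.open(url, timeout=timeout) as resp:
        resp.read()
    return time.monotonic() - start

# Usage:
#   elapsed = probe(TEST_URL, PROXY)
#   print(classify(elapsed), f"{elapsed:.3f}s", TEST_URL)
```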
Some ideas:
* DNS lookup delays?
* Random TCP connection setup delays?
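To tell those two ideas apart, the lookup and the connect can be timed
separately. A minimal sketch using only the standard socket module; the
target host in the usage comment is a placeholder:

```python
# Sketch: time the DNS lookup and the TCP connect separately, so a slow
# probe can be blamed on one or the other.
import socket
import time

def dns_and_connect_times(host, port=80, timeout=10.0):
    """Return (dns_seconds, connect_seconds) for one probe of host:port."""
    t0 = time.monotonic()
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    dns_s = time.monotonic() - t0

    # Connect to the first resolved address only, to keep the two
    # measurements cleanly separated.
    family, socktype, proto, _, addr = infos[0]
    t1 = time.monotonic()
    with socket.socket(family, socktype, proto) as s:
        s.settimeout(timeout)
        s.connect(addr)
    connect_s = time.monotonic() - t1
    return dns_s, connect_s

# Usage (the host is an assumption):
#   dns_s, conn_s = dns_and_connect_times("www.example.com")
```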
> Different boxes are either running Trend's IWSVA as a cache_peer for
> its antivirus, or C-ICAP/clamd as an ICAP service. Both setups have it
> happen (as does the case where I disabled the antivirus).
* Object size related? i.e. scanning time in the AV.
> The servers are all running CentOS 6.4 on HP Gen8 blades with 48G RAM.
>
> Has anyone seen anything like this, or got any suggestions as to what
> might be causing it that I can investigate further?
>
> Simon
Lots of people see it for all sorts of reasons.
Amos