On 2014-02-25 02:12, Simon Beale wrote:
On 2014-02-21 06:10, Simon Beale wrote:
I've got a problem at the moment with our general squid proxies where
occasionally requests take far longer than they should (i.e. 5+ seconds,
or a timeout, instead of milliseconds).
This is most common on our proxies doing 100 reqs/sec, but it happens
overnight too, when they're running at 10 reqs/sec. I'm seeing this with
both v3.4.2 and with a box I've downgraded back to v3.1.10. For v3.4.2,
it happens in both multiple-worker and single-worker modes.
As a follow-up, we've narrowed this down to the internal DNS resolver.
When I deploy a 3.4.2 (which is what we're running elsewhere) that's been
recompiled with "--disable-internal-dns", the problem completely goes
away.
What sort of CPU loading do you have at ~100req/sec?
Is that at or near your local installation's req/sec capacity?
For the box running with a single worker, it consumes 50% of one core at
100 req/sec.
For the boxes running with 9 workers, each worker consumes 5% of a core
at the same rate.
The test is not reproducible, sadly, but I've got a cronjob running on
localhost on these boxes testing access times to various URLs covering:
HTTPS, non-HTTPS static content, using an IP rather than a hostname over
both HTTP and HTTPS, and a URL on the same VLAN as the proxies. All of
these test cases have it happen occasionally, but not repeatedly/reliably.
Some ideas:
* DNS lookup delays ?
Yeah, when I enabled the DNS resolution time logging in squid, that
became apparent.
Quite why the internal DNS resolver shows this, but the external one
doesn't, I don't know. The DNS server query logs show both DNS servers in
/etc/resolv.conf getting the request in turn and answering it (though 5
seconds apart). It's happening for us in multiple datacentres, so it is
unlikely to be port errors or internal packet loss.
The dnsserver helper used when internal DNS is disabled uses
gethostbyname()/getaddrinfo() and thus the local machine's resolver. It
has a limit of ~250 req/sec on most systems and in most cases does not
support IPv6 DNS resolvers configured through squid.conf (it does support
them if configured through resolv.conf).
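For reference, the system-resolver path that helper relies on looks
roughly like this (a minimal Python sketch of the getaddrinfo() route,
not Squid's actual C helper code):

```python
import socket

def system_resolve(hostname):
    """Resolve via the platform resolver (honouring /etc/resolv.conf and
    /etc/hosts), the same path a gethostbyname()/getaddrinfo()-based
    helper takes -- Squid never sees the DNS packets itself."""
    results = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string for both IPv4 and IPv6.
    return sorted({sockaddr[0] for (_, _, _, _, sockaddr) in results})

print(system_resolve("localhost"))
```

Because the blocking lookup happens inside the helper process, the
per-helper throughput cap mentioned above applies.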
The internal DNS client uses a form of happy-eyeballs scheduling: it
sends the A and AAAA queries in parallel, but waits for *both* responses
before continuing (unlike full-blown happy eyeballs, which goes with the
first response regardless of missing IPs). It should only be contacting
one of the resolvers at a time.
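To illustrate that distinction, here's a minimal Python sketch of the
wait-for-both scheduling described above, using stubbed-out lookups
rather than real DNS traffic (the function names and addresses are
illustrative, not Squid internals):

```python
import concurrent.futures
import time

# Stub lookups standing in for the A and AAAA queries; in the failure
# scenario described above, one answer is delayed by a slow resolver.
def lookup_a(host):
    return ["192.0.2.1"]

def lookup_aaaa(host):
    time.sleep(0.2)  # simulated slow/failing AAAA response
    return ["2001:db8::1"]

def resolve_wait_for_both(host):
    """Send A and AAAA in parallel, but only continue once BOTH have
    answered -- so one slow response stalls the whole lookup."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
        f_a = pool.submit(lookup_a, host)
        f_aaaa = pool.submit(lookup_aaaa, host)
        # .result() blocks; total latency is that of the slower query.
        return f_a.result() + f_aaaa.result()

start = time.monotonic()
addrs = resolve_wait_for_both("example.net")
elapsed = time.monotonic() - start
print(addrs, round(elapsed, 1))
```

A full happy-eyeballs client would instead start connecting with
whichever answer arrived first, so the delayed AAAA reply would not add
to the user-visible latency.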
From your above description it sounds like the first resolver configured
is occasionally "failing" after 5 sec and Squid is moving on to the
second, which works.
Do you have "dns_timeout 5 seconds" configured ?
With internal DNS enabled your cachemgr "idns" report has a lot of
detail on the particular errors and actions happening.
NP: with Squid-3.4 the DNS lookup timeout has been detached from the TCP
connect_timeout and only happens once per connection destination. So you
can set each as short as you wish without affecting the other connection
setup steps.
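For instance, on 3.4 the two timeouts can be tuned independently in
squid.conf (the values here are purely illustrative, not
recommendations):

```
# DNS lookup timeout: how long Squid waits for a resolver answer.
dns_timeout 2 seconds

# TCP connect timeout: on Squid-3.4 this covers only the connection
# attempt itself, no longer the DNS lookup time.
connect_timeout 10 seconds
```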
Amos
It's only (or mostly?) apparent on our squid servers that do desktop
proxying, and so make lots of DNS requests to everywhere; the squid
servers that handle just our datacentre servers don't show this problem,
but they only really go to about 40 hosts in total.
Thanks
Simon