On 2014-02-25 02:12, Simon Beale wrote:
On 2014-02-21 06:10, Simon Beale wrote:
I've got a problem at the moment with our general squid proxies where
occasionally requests take far longer than they should (i.e. 5+ seconds,
or a timeout, instead of milliseconds).
This is most common on our proxies doing 100 reqs/sec, but it happens
overnight too, when they're running at 10 reqs/sec. I'm seeing this with
both v3.4.2 and with a box I've downgraded back to v3.1.10. For v3.4.2,
it happens in both multiple-worker and single-worker modes.
As a follow-up, we've narrowed this down to the internal DNS resolver.
When I deploy a 3.4.2 (which is what we're running elsewhere) that's been
recompiled with "--disable-internal-dns", the problem completely goes
away.
What sort of CPU loading do you have at ~100req/sec?
Is that at or near your local installation's req/sec capacity?
For the box running with a single worker, it consumes 50% of one core at
100 req/sec.
For the boxes running with 9 workers, each worker consumes 5% of a core
at the same rate.
The test is not reproducible, sadly, but I've got a cronjob running on
localhost on these boxes testing access times to various URLs covering:
HTTPS, non-HTTPS static content, using an IP rather than a hostname over
both HTTP and HTTPS, and a URL on the same VLAN as the proxies. All of
these test cases have it happen occasionally, but not repeatedly/reliably.
Some ideas:
* DNS lookup delays ?
Yeah, when I enabled the DNS resolution time logging in squid, that
became apparent.
Quite why the internal DNS resolver shows this, but the external one
doesn't, I don't know. The DNS server query logs show both DNS servers in
/etc/resolv.conf getting the request in turn and answering it (though 5
seconds apart). It's happening for us in multiple datacentres, so it is
unlikely to be port errors or internal packet loss.
The dnsserver helper used when internal DNS is disabled uses
gethostbyname()/getaddrinfo() and thus the local machine's resolver. It
has a limit of ~250 req/sec on most systems and in most cases does not
support IPv6 DNS resolvers configured through squid.conf (it does support
them if configured through resolv.conf).
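For reference, the system-resolver path that helper relies on looks
roughly like this (a minimal Python sketch of the getaddrinfo() route,
not Squid's actual C helper code):

```python
import socket

def system_resolve(hostname):
    """Resolve via the platform resolver (honouring /etc/resolv.conf and
    /etc/hosts), the same path a gethostbyname()/getaddrinfo()-based
    helper takes -- Squid never sees the DNS packets itself."""
    results = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string for both IPv4 and IPv6.
    return sorted({sockaddr[0] for (_, _, _, _, sockaddr) in results})

print(system_resolve("localhost"))
```

Because the blocking lookup happens inside the helper process, the
per-helper throughput cap mentioned above applies.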
The internal DNS client uses a form of happy-eyeballs scheduling: it
sends the A and AAAA queries in parallel, but waits for *both* responses
before continuing (unlike full-blown happy eyeballs, which goes with the
first response regardless of missing IPs). It should only be contacting
one of the resolvers at a time.
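To illustrate that distinction, here's a minimal Python sketch of the
wait-for-both scheduling described above, using stubbed-out lookups
rather than real DNS traffic (the function names and addresses are
illustrative, not Squid internals):

```python
import concurrent.futures
import time

# Stub lookups standing in for the A and AAAA queries; in the failure
# scenario described above, one answer is delayed by a slow resolver.
def lookup_a(host):
    return ["192.0.2.1"]

def lookup_aaaa(host):
    time.sleep(0.2)  # simulated slow/failing AAAA response
    return ["2001:db8::1"]

def resolve_wait_for_both(host):
    """Send A and AAAA in parallel, but only continue once BOTH have
    answered -- so one slow response stalls the whole lookup."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
        f_a = pool.submit(lookup_a, host)
        f_aaaa = pool.submit(lookup_aaaa, host)
        # .result() blocks; total latency is that of the slower query.
        return f_a.result() + f_aaaa.result()

start = time.monotonic()
addrs = resolve_wait_for_both("example.net")
elapsed = time.monotonic() - start
print(addrs, round(elapsed, 1))
```

A full happy-eyeballs client would instead start connecting with
whichever answer arrived first, so the delayed AAAA reply would not add
to the user-visible latency.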
From your above description it sounds like the first resolver configured
is occasionally "failing" after 5 sec and Squid is moving on to the
second, which works.
Do you have "dns_timeout 5 seconds" configured ?
With internal DNS enabled your cachemgr "idns" report has a lot of
detail on the particular errors and actions happening.
NP: with Squid-3.4 the DNS lookup timeout has been detached from the TCP
connect_timeout and only happens once per connection destination. So you
can set each as short as you wish without affecting the other connection
setup steps.
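For instance, on 3.4 the two timeouts can be tuned independently in
squid.conf (the values here are purely illustrative, not
recommendations):

```
# DNS lookup timeout: how long Squid waits for a resolver answer.
dns_timeout 2 seconds

# TCP connect timeout: on Squid-3.4 this covers only the connection
# attempt itself, no longer the DNS lookup time.
connect_timeout 10 seconds
```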
Amos
It's only (or mostly?) apparent on our squid servers that do desktop
proxying, and so make lots of DNS requests to everywhere; the squid
servers that handle just our datacentre servers don't show this problem,
but they only really go to about 40 hosts in total.
Thanks
Simon