On 28/07/2014 10:35 a.m., Jason Haar wrote:
> Hi there
>
> I'm seeing a reliability issue with squid-3.1.10 through 3.4.6 accessing
> ipv6 sites.
>
> The root cause is that the ipv6 "Internet" is still a lot less reliable
> than the ipv4 "Internet". Lots of sites seem to have a "flappy"
> relationship with ipv6 which is not reflected in their ipv4 realm. This
> of course has nothing to do with squid directly - but impacts it
>
> So the issue I'm seeing is going to some websites that have both ipv6
> and ipv4 addresses, ipv6 "working" (ie no immediate "no route" type
> errors), but when squid tries to connect to the ipv6 address first, it
> hangs so long on "down" sites that it times out and never gets around to
> trying the working ipv4 address. It also doesn't appear to remember the
> issue, so that it continues to be down (ie the ipv6 address that is down
> for a website isn't cached to stop squid going there again [for a
> timeframe])
>
> Shouldn't squid just treat all ipv6 and ipv4 addresses assigned to a DNS
> name in a "round robin" fashion, keeping track of which ones are
> working? (I think it already does that with ipv4, I guess it isn't with
> ipv6?). As per Subject line, I suspect squid needs a ipv6 timeout that
> is shorter than the overall timeout, so that it will fallback on ipv4?

No. Round-robin IP connections from a proxy cause more problems than
they solve. HTTP multiplexing / persistent connections, DNS behaviours,
and the browser "happy eyeballs" algorithm are all involved in or
affected by the IP selection. A lot of applications use stateful
sessions on the assumption that a browser, once it has found an IP,
will stick with it, so the best thing for Squid to do is the same.

An IP is just an IP, regardless of version. Connectivity issues happen
just as often in IPv4 as in IPv6 (more so when "carrier grade" NAT gets
involved). The only special treatment IPv6 gets is being sorted first
by default ("dns_v4_first on" can change that), since 79% of networks
today apparently have IPv6 connectivity operating at least 1ms faster
than IPv4. It also avoids a bunch of potential issues with NAT and
other IPv4-only middleware.

Squid already does cache IP connectivity results. The problems are,
firstly, that whenever DNS supplies new or updated IP information the
connect tests have to be retried; connection issues are quite common
even in IPv4 and usually temporary. Secondly, the Squid timeouts
(below) are not set by default to values that make the sites you
noticed work very well.

There are several limits which you can set in Squid to speed up or slow
down the whole process (a combined sketch follows the list):

dns_timeout - how long Squid will wait for DNS results. The default
here is 30 seconds. If your DNS servers are highly reliable you can set
that lower.
 ** If the problem sites are taking a long time to respond to AAAA
queries this will greatly affect the connection time. Setting this down
closer to 10 sec can help for specific sites with fully broken DNS
servers, but harms others which merely have slow DNS servers. YMMV, but
I recommend checking the AAAA lookup speed for your specific problem
sites before changing this.

connect_timeout - how long Squid waits for the TCP SYN/SYN-ACK
handshake to occur. The default here is a full minute. What you set
this to depends on the Squid series:

 * In 3.1 and older this covered the DNS lookup plus a TCP handshake
for each IP address found by DNS. In these versions you *increase* the
timeout to get better IPv6 failover behaviour.

 * In 3.2 and later this covers only one TCP handshake. In these
versions you *decrease* it to improve performance. You can safely set
it to a few seconds, but be aware of your Squid machine's networking
stack behaviour regarding TCP protocol retries and timeouts to
determine which values will help or hurt [1].

forward_max_retries - how many times Squid will attempt a full connect
cycle (one connect_timeout each). The default in stable releases is 10;
the squid-3.5 release is bumping this up to 25. What you set this to
depends on the Squid series again, but only as a side effect of the
connect_timeout changes. In all versions you can get better
connectivity by increasing the value. For several of the top-ten
websites 25 is practically required just to get past the many IPv6
addresses they advertise and attempt any IPv4.

forward_timeout - how long in total Squid will attempt to connect to
the servers (via all methods). The default here is 4 minutes. You can
set it longer to give automated systems a better chance of connecting,
but most people do not have that type of patience, so 4 minutes before
getting the "cannot connect" error page is probably a bit long already.
You should not have to change this.
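Putting those together, here is a rough squid.conf sketch for a 3.2 or
later installation. The values other than the two recommended further
below are only illustrative assumptions; tune them to your own network:

  # DNS lookups: default is 30 seconds. Lower only if your resolvers
  # are reliable; sites with slow (but working) AAAA responses suffer.
  dns_timeout 15 seconds

  # 3.2+: time allowed for ONE TCP handshake (default 1 minute).
  # A few seconds lets Squid move off a dead IPv6 address to IPv4
  # quickly.
  connect_timeout 5 seconds

  # Full connect cycles to attempt (default 10; 25 in squid-3.5).
  # Needs to be large enough to get past long AAAA lists.
  forward_max_retries 25

  # Total connect budget across all attempts (default 4 minutes).
  # Best left at the default.
  #forward_timeout 4 minutes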
> i.e. right now I can't get to http://cs.co/ as their ipv6 address is
> down, but their ipv4 address is up and working - but squid won't try it
> because it hangs so long trying the ipv6 address (and on the flip-side,
> www.google.com is working fine over ipv6). To put it another way,
> squid-3.1.10 and newer work fine if the ipv6 address allocated to a site
> is up and responding, but cause issues if it is not

cs.co seems to have fast DNS.

In general, on current (Squid-3.2 or later) releases, I recommend:

  connect_timeout 5 seconds
  forward_max_retries 25

[1] Geoff Huston has a useful column on how TCP retries affect "happy
eyeballs" software and IPv6 failover at
<http://www.potaroo.net/ispcol/2012-05/notquite.html>

Amos