Re: No failover when default parent proxy fails (Squid 3.5.12)

Amos Jeffries <squid3@xxxxxxxxxxxxx> · Thu, 16 Mar 2017 23:51:48 +1300

On 16/03/2017 10:39 p.m., Jens Offenbach wrote:
> This is the scenario:
> 
> Squid 3.5.12 is installed on "squid-proxy.mycompany.com". The two parent proxies are:
> - Primary: proxy.mycompany.de:8080 (139.2.1.3)
> - Fallback: roxy.mycompany.de:8080 (139.2.1.4)
> 
> I had misunderstood the "default" option in "cache_peer". If I understand it correctly now, it marks a fallback, so I moved it to "roxy.mycompany.de". "proxy.mycompany.de" should always be used, and "roxy.mycompany.de" only when "proxy.mycompany.de" fails.
> 

Well, kind of. The 'default' peer can still be selected by one of the
other algorithms (for that it has to be 'alive'). If it is not, it gets
appended as the last-resort peer, to be used regardless of DEAD/alive
status.

> squid.conf:
> 
...
> 
> # OPTIONS WHICH AFFECT THE NEIGHBOR SELECTION ALGORITHM
> # -----------------------------------------------------------------------------
>   cache_peer proxy.materna.de parent 8080 0 no-digest no-query connect-timeout=5 connect-fail-limit=2
>   cache_peer  roxy.materna.de parent 8080 0 no-digest no-query connect-timeout=5 connect-fail-limit=2 default
> 
...
> # OPTIONS INFLUENCING REQUEST FORWARDING 
> # -----------------------------------------------------------------------------
>   always_direct allow to_matnet
>   never_direct  allow all
> 
> # DNS OPTIONS
> # -----------------------------------------------------------------------------
>   dns_nameservers 139.2.34.171
>   dns_nameservers 139.2.34.37
> 
...
> 
> Now, I block traffic on "squid-proxy.mycompany.com" to the primary proxy "proxy.mycompany.de" (139.2.1.3) using IPTables:
> $ iptables -A OUTPUT -p icmp -d 139.2.1.3 -j DROP
> $ iptables -A OUTPUT -p tcp -d 139.2.1.3 -j DROP
> $ iptables -A OUTPUT -p udp -d 139.2.1.3 -j DROP
> 

Are you trying to test connection timeout issues or a host going offline?
These iptables rules will force a timeout but will not emulate a host
disconnection, particularly since ICMP is also dropped.

When a host disconnects, Squid will receive active signals (maybe via
ICMP) that the TCP SYN packet cannot get through. That speeds up failure
recovery enormously. If the peer software simply crashes or exits,
different signals happen but with the same super-fast effect.

REJECT rules would be a better emulation of a machine disconnecting, and
a TCP-only REJECT rule would emulate a peer software crash, etc. That
way the ICMP signalling still happens much as it would in those types of
failure.
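
As a rough sketch of the REJECT approach described above: the address is
the primary parent from your report. The --reject-with types are an
assumption about which signal best matches each failure, and the RUN
variable and function names are made up for illustration. By default the
commands are only printed, not applied.

```shell
#!/bin/sh
# Sketch only, not tested against the reporter's network.
# PEER is the primary parent's address from the report.
# RUN defaults to "echo" so the rules are printed instead of applied;
# set RUN= (empty) and run as root to really install them.
PEER=${PEER:-139.2.1.3}
RUN=${RUN-echo}

# Emulate the peer software crashing: the host is still up, but the
# port answers with a TCP RST (assumed to be the closest match).
emulate_peer_crash() {
    $RUN iptables -A OUTPUT -p tcp -d "$PEER" -j REJECT --reject-with tcp-reset
}

# Emulate the machine going offline: traffic of any protocol gets an
# ICMP host-unreachable error back instead of silence.
emulate_host_down() {
    $RUN iptables -A OUTPUT -d "$PEER" -j REJECT --reject-with icmp-host-unreachable
}

emulate_peer_crash
emulate_host_down
```

Use only one of the two at a time when testing for real. The same rule
with -D in place of -A removes it again.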

> On the test machine, I use:
> $ export http_proxy=http://squid-proxy.mycompany.com:3128/
> $ export https_proxy=http://squid-proxy.mycompany.com:3128/
> $ export HTTP_PROXY=http://squid-proxy.mycompany.com:3128/
> $ export HTTPS_PROXY=http://squid-proxy.mycompany.com:3128/
> 
> Trying to download a resource:
> $ wget https://repository.apache.org/content/groups/snapshots/org/apache/karaf/apache-karaf/4.1.1-SNAPSHOT/apache-karaf-4.1.1-20170315.084054-35.tar.gz
> 
> The download hangs for 2 minutes until it gets started. A retry shows the same results, the download starts after 2 minutes showing:
> --2017-03-16 09:31:26--  https://repository.apache.org/content/groups/snapshots/org/apache/karaf/apache-karaf/4.1.1-SNAPSHOT/apache-karaf-4.1.1-20170314.154157-34.tar.gz
> Resolving squid-proxy.mycompany.com (squid-proxy.mycompany.com)... 10.152.132.41
> Connecting to squid-proxy.mycompany.com (squid-proxy.mycompany.com)|10.152.132.41|:3128... connected.
> 
> cache.log:
> 
...
> 2017/03/16 10:17:48 kid1| Starting Squid Cache version 3.5.12 for x86_64-pc-linux-gnu...
> 2017/03/16 10:17:48 kid1| Service Name: squid
> 2017/03/16 10:17:48| pinger: Initialising ICMP pinger ...
> 2017/03/16 10:18:09.579 kid1| 44,2| peer_select.cc(258) peerSelectDnsPaths: Find IP destination for: http://proxy.materna.de:8080/squid-internal-dynamic/netdb' via proxy.materna.de
> 2017/03/16 10:18:09.579 kid1| 44,2| peer_select.cc(280) peerSelectDnsPaths: Found sources for 'http://proxy.materna.de:8080/squid-internal-dynamic/netdb'

These can be avoided by adding the no-netdb-exchange option to the
cache_peer config lines. But it is probably a good idea to keep them for
production use, as they will be the way of detecting a peer's recovery
to live status.

...
> 2017/03/16 10:18:37.951 kid1| 44,2| peer_select.cc(258) peerSelectDnsPaths: Find IP destination for: repository.apache.org:443' via proxy.materna.de
> 2017/03/16 10:18:37.951 kid1| 44,2| peer_select.cc(258) peerSelectDnsPaths: Find IP destination for: repository.apache.org:443' via proxy.materna.de
> 2017/03/16 10:18:37.951 kid1| 44,2| peer_select.cc(258) peerSelectDnsPaths: Find IP destination for: repository.apache.org:443' via roxy.materna.de
> 2017/03/16 10:18:37.951 kid1| 44,2| peer_select.cc(258) peerSelectDnsPaths: Find IP destination for: repository.apache.org:443' via roxy.materna.de
> 2017/03/16 10:18:37.951 kid1| 44,2| peer_select.cc(280) peerSelectDnsPaths: Found sources for 'repository.apache.org:443'
> 2017/03/16 10:18:37.951 kid1| 44,2| peer_select.cc(281) peerSelectDnsPaths:   always_direct = DENIED
> 2017/03/16 10:18:37.951 kid1| 44,2| peer_select.cc(282) peerSelectDnsPaths:    never_direct = ALLOWED
> 2017/03/16 10:18:37.951 kid1| 44,2| peer_select.cc(292) peerSelectDnsPaths:      cache_peer = local=0.0.0.0 remote=139.2.1.3:8080 flags=1
> 2017/03/16 10:18:37.951 kid1| 44,2| peer_select.cc(292) peerSelectDnsPaths:      cache_peer = local=0.0.0.0 remote=139.2.1.3:8080 flags=1
> 2017/03/16 10:18:37.951 kid1| 44,2| peer_select.cc(292) peerSelectDnsPaths:      cache_peer = local=0.0.0.0 remote=139.2.1.4:8080 flags=1
> 2017/03/16 10:18:37.951 kid1| 44,2| peer_select.cc(292) peerSelectDnsPaths:      cache_peer = local=0.0.0.0 remote=139.2.1.4:8080 flags=1
> 2017/03/16 10:18:37.951 kid1| 44,2| peer_select.cc(295) peerSelectDnsPaths:        timedout = 0
> 

Hmm. Something is going wrong with our logic to ensure unique IP:port
entries in the list of selected paths. It should not be affecting your
issue much though.

> access.log
> 
> 1489656077.628 159679 10.30.216.160 TCP_TUNNEL/200 26328966 CONNECT repository.apache.org:443 - ANY_OLD_PARENT/139.2.1.4 -
> 

Uhm. One thing to be very wary of is that transactions are not logged
until they are completed, so that things like their full duration and
byte counts can be recorded.
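
Since the timestamp marks the end of the transaction, the start time has
to be worked backwards from the elapsed field. A minimal sketch, assuming
the native access.log format (field 1 is the logging time in epoch
seconds, field 2 the elapsed time in milliseconds); the function name is
made up for illustration:

```shell
#!/bin/sh
# The access.log entry quoted above.
line='1489656077.628 159679 10.30.216.160 TCP_TUNNEL/200 26328966 CONNECT repository.apache.org:443 - ANY_OLD_PARENT/139.2.1.4 -'

# Print the approximate start time of a transaction:
# logging timestamp minus elapsed milliseconds.
request_start() {
    printf '%s\n' "$1" | LC_ALL=C awk '{ printf "%.3f\n", $1 - $2 / 1000 }'
}

request_start "$line"   # prints 1489655917.949
```

That is 09:18:37.949 UTC. If the proxy's clock is on UTC+1 (an
assumption), it lines up with the peer selection entries at 10:18:37.951
in the cache.log above.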

When CONNECT is involved, people who are not fully aware of what that
request method means can be surprised by the lack of log entries. It is
a tunnel, and whole *weeks* worth of various traffic can happen inside
before it reaches that complete state for logging.
 You might see nothing actually happening except CONNECT lines being
logged with zero sizes, or huge amounts of https:// URLs being fetched
without a single access.log line occurring ... or any mix of behaviour
in between.

This connection had 26MB transferred over it. The 'connect' stage (TCP
SYN / SYN-ACK exchange) may have been successful within the first 11
seconds (5sec timeout on first two cache_peer in that cache.log list,
then immediate success on the third) and just nothing visibly happening
on it at the HTTP level for a bit while the TLS crypto did things.
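
To put rough numbers on that reasoning, here is a small sketch using only
values from the config and logs above. The variable names are made up,
and it assumes each dead path costs one full connect-timeout:

```shell
#!/bin/sh
connect_timeout=5   # seconds, from connect-timeout=5 on the cache_peer lines
dead_paths=2        # the two 139.2.1.3:8080 entries tried first in cache.log
elapsed_ms=159679   # elapsed field of the access.log entry

# Worst-case time spent on the blocked primary before the fallback is tried.
expected_s=$((connect_timeout * dead_paths))

# Whatever is left was spent inside the tunnel (TLS handshake plus the
# transfer itself), which is opaque to this Squid.
remaining_s=$((elapsed_ms / 1000 - expected_s))

echo "expected connect delay: ${expected_s}s"
echo "time inside the tunnel: ${remaining_s}s"
```

So the cache_peer timeouts can account for about 10 of the roughly 160
seconds at most.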

If things are breaking or going slowly at the TLS layer or higher, then
there is nothing you can do in this Squid. As far as this Squid is
concerned the TCP tunnel was set up fine and is working. What is inside
it is opaque.

I have just done a test of those two peers from here to see how the
setup goes, and there is a delay of over 2min 10-12sec before my ISP's
NAT system cuts the connection. Something is very broken with those
particular peers or the network they reside in. That whole process
should have taken under 350ms and been terminated by their end.

Amos
