Well, this was a wild ride: I actually tracked the problem back to... DNS64/NAT64!
What I discovered is that the affected webserver didn't actually have IPv6 at all - it only had 2 IPv4 addresses. But something in my DNS tree (I suspect the local systemd-resolved, but can't find any direct evidence) had synthesized fake DNS64/NAT64 AAAA records for each of them. I'd never seen those before, so I didn't realise "64:ff9b:XXXXXXXX" was a "special" IPv6 range. I queried our upstream recursive DNS server directly and it didn't have those IPv6 records - but the local systemd-resolved kept returning them. So I down/up-ed the interface (which resets systemd-resolved) and the problem disappeared.
This new information doesn't really change the nature of the question, but I'm afraid the problem is now resolved (for the moment), so debugging won't catch it. If it happens again (I had never seen this before) I'll be sure to capture the debugging output.
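For anyone else who runs into this, here is a minimal sketch of how to spot such an address, assuming the synthesized records use the RFC 6052 well-known prefix 64:ff9b::/96 (which is what the addresses I saw looked like). The helper name embedded_ipv4 and the sample address are made up for illustration; it only uses Python's standard ipaddress module.

```python
import ipaddress

# RFC 6052 "well-known prefix" used by DNS64/NAT64 (assumption: no
# network-specific NAT64 prefix is in play).
NAT64_WKP = ipaddress.ip_network("64:ff9b::/96")

def embedded_ipv4(addr):
    """Return the IPv4 address embedded in a 64:ff9b::/96 address, else None.

    Hypothetical helper for illustration only.
    """
    ip = ipaddress.ip_address(addr)
    if ip.version != 6 or ip not in NAT64_WKP:
        return None
    # With a /96 prefix, the low 32 bits carry the original IPv4 address.
    return ipaddress.IPv4Address(int(ip) & 0xFFFFFFFF)

# Documentation-range example address, not one from my network.
print(embedded_ipv4("64:ff9b::c000:221"))  # 192.0.2.33
```

If an AAAA answer maps back to one of the host's real A records like this, it is almost certainly a DNS64-synthesized record rather than a genuine IPv6 address of the server.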
On Tue, Feb 22, 2022 at 3:16 AM Alex Rousskov <rousskov@xxxxxxxxxxxxxxxxxxxxxxx> wrote:
On 2/20/22 20:43, Jason Haar wrote:
> I've noticed that the Internet ipv6 is not quite as reliable as ipv4, in
> that squid reports it cannot connect to web servers with an ipv6 error
> when the web server is still available over ipv4.
>
> eg right now one of our Internet-based web apps (which has 2 ipv6 and 2
> ipv4 IP addresses mapped to its DNS name) is not responding over ipv6
> for some reason (I dunno - not involved myself) - but is working fine
> over ipv4. Squid-5.4 is erroring out - saying that it cannot connect to
> the first ipv6 address with a "no route to host" error. But if I use
> good-ol' telnet to the DNS name, telnet shows it trying-and-failing
> against both ipv6 addresses and then succeeds against the ipv4. ie it
> works and squid doesn't. BTW the same squid server is currently fine
> with ipv6 clients talking to it and it talking over ipv6 to Internet
> hosts like google.com - ie this is an ipv6 outage on
> one Internet host where its ipv4 is still working.
>
> This doesn't seem like a negative_dns_ttl setting issue, it seems like
> squid just tries one address on a multiple-IP DNS record and stops
> trying? I even got tcpdump up and can see that when I do a
> "shift-reload" on the webpage, squid only sends a few SYN packets to the
> same non-working IPv6 address - it doesn't even try the other 3 IPs?
>
> I also checked squidcachemgr.cgi and the DNS record isn't even cached in
> "FQDN Cache Stats and Contents", which I guess is consistent with its
> opinion that it's not working.
>
> Any ideas what's going on there? thanks!
Squid is supposed to send both A and AAAA DNS queries for the uncached
domain and then try the first IP it can DNS-resolve and TCP-connect to.
If that winning destination does not work at HTTP level, then Squid may,
in some cases, try other destinations. There are lots of variables and
nuances related to the associated Happy Eyeballs and reforwarding
algorithms. It is impossible to say for sure what is going on in your
specific case without more information.
Your best bet may be to share an ALL,9 cache.log that reproduces the
problem using a single isolated test transaction:
https://wiki.squid-cache.org/SquidFaq/BugReporting#Debugging_a_single_transaction
HTH,
Alex.
Cheers
Jason Haar
Information Security Manager, Trimble Navigation Ltd.
Phone: +1 408 481 8171
PGP Fingerprint: 7A2E 0407 C9A6 CAF6 2B9F 8422 C063 5EBB FE1D 66D1
_______________________________________________ squid-users mailing list squid-users@xxxxxxxxxxxxxxxxxxxxx http://lists.squid-cache.org/listinfo/squid-users