On 09/01/2024 09:51, Stephen Borrill wrote:
On 09/01/2024 03:41, Alex Rousskov wrote:
On 2024-01-08 08:31, Stephen Borrill wrote:
I'm trying to determine why squid 6.x (seen with 6.5) connected via
IPv4-only periodically fails to connect to the destination and then
requires a restart to fix it (reload is not sufficient).
The problem appears to be that a host that has one address each of
IPv4 and IPv6 occasionally has its IPv4 address go missing as a
destination. On closer inspection, this appears to happen when the
IPv6 address (not the IPv4) address is marked as bad. A log fragment
is as follows:
2024/01/08 13:18:39.974 kid1| 44,2| peer_select.cc(460)
resolveSelected: Find IP destination for:
clientservices.googleapis.com:443' via clientservices.googleapis.com
2024/01/08 13:18:39.974 kid1| 44,2| peer_select.cc(1174) handlePath:
PeerSelector82284 found conn696198 local=0.0.0.0
remote=142.250.187.227:443 HIER_DIRECT flags=1, destination #1 for
clientservices.googleapis.com:443
2024/01/08 13:18:39.974 kid1| 44,2| peer_select.cc(1174) handlePath:
PeerSelector82284 found conn696199 local=[::]
remote=[2a00:1450:4009:820::2003]:443 HIER_DIRECT flags=1,
destination #2 for clientservices.googleapis.com:443
2024/01/08 13:18:39.974 kid1| 44,2| peer_select.cc(479)
resolveSelected: PeerSelector82284 found all 2 destinations for
clientservices.googleapis.com:443
2024/01/08 13:18:40.245 kid1| 14,2| ipcache.cc(1031) markAsBad:
[2a00:1450:4009:820::2003]:443 of clientservices.googleapis.com
2024/01/08 13:18:40.245 kid1| 14,3| ipcache.cc(946) seekNewGood:
succeeded for clientservices.googleapis.com:
[2a00:1450:4009:820::2003] #2/2-1
2024/01/08 13:18:40.245 kid1| 14,3| ipcache.cc(978) restoreGoodness:
cleared all IPs for clientservices.googleapis.com; now back to
[2a00:1450:4009:820::2003] #2/2-1
2024/01/08 13:18:42.065 kid1| 14,3| Address.cc(389) lookupHostIP:
Given Non-IP 'clientservices.googleapis.com': hostname or servname
not provided or not known
2024/01/08 13:18:42.065 kid1| 44,2| peer_select.cc(460)
resolveSelected: Find IP destination for:
clientservices.googleapis.com:443' via clientservices.googleapis.com
2024/01/08 13:18:42.065 kid1| 14,3| Address.cc(389) lookupHostIP:
Given Non-IP 'clientservices.googleapis.com': hostname or servname
not provided or not known
2024/01/08 13:18:42.065 kid1| 44,2| peer_select.cc(1174) handlePath:
PeerSelector82372 found conn697148 local=[::]
remote=[2a00:1450:4009:820::2003]:443 HIER_DIRECT flags=1,
destination #1 for clientservices.googleapis.com:443
2024/01/08 13:18:42.065 kid1| 44,2| peer_select.cc(479)
resolveSelected: PeerSelector82372 found all 1 destinations for
clientservices.googleapis.com:443
This shows two subsequent connection attempts to
clientservices.googleapis.com. The first one has both IPv4 and IPv6
destinations. The IPv6 address is passed to markAsBad.
Yes.
After that the IPv4 address is not listed as a destination.
I do not see that. I see IPv6 address being selected as the first
destination (instead of the IPv4 address).
My reading of the the log files is that all possible destinations are
listed, e.g. "found all 18 destinations for www.googleapis.com:443".
Prior to markAsBad being called both IP addresses for
clientservices.googleapis.com are shown, afterwards the IPv4 is not
listed. It isn't just that IPv6 is being selected as the first
destination, it is the ONLY destination as the IPv4 address is no longer
a candidate.
I cannot explain why that happens though. Moreover, a combination of
certain lines in your debug output near "seekNewGood" do not make
sense to me -- I do not see how it is possible for Squid to display
those exact debugging details, but I am probably missing something.
Can you retest and repost similar lines with 14,9 (or at least 14,7)
added to your debug_options (or share those lines privately; the more
lines you can share, the better)?
Thanks, I have started a new run with 14,7 and will pass it on privately
unless I can distil out the relevant lines to be small enough for the list.
A distilled version is here. Let me know if you want the full log.
This looks suspicious to me:
have: [2001:4860:4802:32::78]:443 at 0 in 216.239.38.120 #1/2-0
2024/01/09 09:34:34.102 kid1| 14,4| ipcache.cc(617) nbgethostbyname:
forcesafesearch.google.com
2024/01/09 09:34:34.102 kid1| 14,4| ipcache.cc(657)
ipcache_nbgethostbyname_: ipcache_nbgethostbyname: HIT for
'forcesafesearch.google.com'
2024/01/09 09:34:34.102 kid1| 14,7| ipcache.cc(253) forwardIp:
216.239.38.120
2024/01/09 09:34:34.102 kid1| 44,2| peer_select.cc(1174) handlePath:
PeerSelector128957 found conn1010793 local=0.0.0.0
remote=216.239.38.120:443 HIER_DIRECT flags=1, destination #1 for
forcesafesearch.google.com:443
2024/01/09 09:34:34.102 kid1| 14,7| ipcache.cc(253) forwardIp:
[2001:4860:4802:32::78]
2024/01/09 09:34:34.102 kid1| 44,2| peer_select.cc(1174) handlePath:
PeerSelector128957 found conn1010794 local=[::]
remote=[2001:4860:4802:32::78]:443 HIER_DIRECT flags=1, destination #2
for forcesafesearch.google.com:443
2024/01/09 09:34:34.102 kid1| 14,7| ipcache.cc(236) finalCallback:
0x18f816d38
2024/01/09 09:34:34.102 kid1| 44,2| peer_select.cc(479) resolveSelected:
PeerSelector128957 found all 2 destinations for
forcesafesearch.google.com:443
2024/01/09 09:34:34.352 kid1| 14,7| ipcache.cc(990) have:
[2001:4860:4802:32::78]:443 at 0 in 216.239.38.120 #1/2-0
2024/01/09 09:34:34.352 kid1| 14,2| ipcache.cc(1031) markAsBad:
[2001:4860:4802:32::78]:443 of forcesafesearch.google.com
2024/01/09 09:34:34.352 kid1| 14,3| ipcache.cc(946) seekNewGood:
succeeded for forcesafesearch.google.com: [2001:4860:4802:32::78] #2/2-1
2024/01/09 09:34:34.352 kid1| 14,3| ipcache.cc(978) restoreGoodness:
cleared all IPs for forcesafesearch.google.com; now back to
[2001:4860:4802:32::78] #2/2-1
2024/01/09 09:34:35.683 kid1| 14,4| ipcache.cc(617) nbgethostbyname:
forcesafesearch.google.com
2024/01/09 09:34:35.683 kid1| 14,3| Address.cc(389) lookupHostIP: Given
Non-IP 'forcesafesearch.google.com': hostname or servname not provided
or not known
2024/01/09 09:34:35.683 kid1| 14,4| ipcache.cc(657)
ipcache_nbgethostbyname_: ipcache_nbgethostbyname: HIT for
'forcesafesearch.google.com'
2024/01/09 09:34:35.683 kid1| 14,7| ipcache.cc(253) forwardIp:
[2001:4860:4802:32::78]
2024/01/09 09:34:35.683 kid1| 44,2| peer_select.cc(1174) handlePath:
PeerSelector128974 found conn1010956 local=[::]
remote=[2001:4860:4802:32::78]:443 HIER_DIRECT flags=1, destination #1
for forcesafesearch.google.com:443
2024/01/09 09:34:35.683 kid1| 14,7| ipcache.cc(236) finalCallback:
0x18f816d38
2024/01/09 09:34:35.683 kid1| 44,2| peer_select.cc(479) resolveSelected:
PeerSelector128974 found all 1 destinations for
forcesafesearch.google.com:443
2024/01/09 09:34:35.683 kid1| 14,7| ipcache.cc(990) have:
[2001:4860:4802:32::78]:443 at 0 in [2001:4860:4802:32::78] #2/2-1
Note that there have been many connections to
clientservices.googleapis.com prior to this where markAsBad was not
called, even though IPv6 connectivity was never available.
No markAsBad() is probably normal if Squid did not try to establish an
IPv6 connection or did not wait long enough to know the result of that
attempt. However, that does not explain why Squid selected an IPv6
address as the next "good" address right after marking that IPv6
address as bad (at "restoreGoodness" line) when there was another good
IP address available. It is as if Squid stored two identical IPv6
addresses (and not IPv4 ones), but that should not happen either.
This is tangentially related to this thread too:
https://lists.squid-cache.org/pipermail/squid-users/2023-November/026266.html
Once only the IPv6 address is being used, then it returns 503 for that
host and thus can quickly get marked as dead by a downstream squid
meaning it does not get used at all (and if it's the only peer all
access stops).
_______________________________________________
squid-users mailing list
squid-users@xxxxxxxxxxxxxxxxxxxxx
https://lists.squid-cache.org/listinfo/squid-users