On 2023-11-21 08:38, Stephen Borrill wrote:
On 15/11/2023 21:55, Alex Rousskov wrote:
On 2023-11-10 05:46, Stephen Borrill wrote:
With 6.x (currently 6.5) there are very frequent (every 10 seconds or
so) messages like:
2023/11/10 10:25:43 kid1| ERROR: Connection to 127.0.0.1:8123 failed
why is this logged as a connection failure
The current error wording is too assuming and, in your case, evidently
misleading. The phrase "Connection to X failed" should be changed to
something more general like "Cannot contact cache_peer X" or "Cannot
communicate with cache_peer X".
CachePeer::countFailure() patches welcome.
But the point is that it _can_ communicate with the peer, but the peer
itself can't service the request. The peer returning 503 shouldn't be
logged as a connection failure
My bad. I missed the fact that the described DNS error happens at a
_peer_ Squid. Sorry.
Currently, Squid v6 treats most CONNECT-to-peer errors as a sign of a
broken peer. In 2022, 4xx errors were excluded from that set[1]. At that
time, we also proposed to make that decision configurable using a new
cache_peer_fault directive[2], but the new directive was blocked as an
"overkill"[3], so we hard-coded 4xx exclusion instead.
Going forward, you have several options, including these two:
1. Convince others that Squid should treat all 503 CONNECT errors from
peers as it already treats all 4xx errors. Hard-code that new logic.
2. Convince others that cache_peer_fault or a similar directive is a
good idea rather than an overkill. Resurrect its implementation[2].
[1]
https://github.com/squid-cache/squid/commit/022dbabd89249f839d1861aa87c1ab9e1a008a47
[2]
https://github.com/squid-cache/squid/commit/25431f18f2f5e796b8704c85fc51f93b6cc2a73d
[3] https://github.com/squid-cache/squid/pull/1166#issuecomment-1295806530
HTH,
Alex.
> do I need to worry about it beyond it filing up the logs needlessly?
In short, "yes".
I cannot accurately assess your specific needs, but, in most
environments, one should indeed worry that their cache_peer server
names cannot be reliably resolved because failed resolution attempts
waste Squid resources and increase transaction response time.
Moreover, if these failures are frequent enough (relative to peer
usage attempts), the affected cache_peer will be marked as DEAD (as
you have mentioned):
> 2023/11/09 08:55:22 kid1| Detected DEAD Parent: 127.0.0.1:8123
Problem seems to be easily reproducible:
1# env https_proxy=http://127.0.0.1:8084 curl https://www.invalid.domain/
curl: (56) CONNECT tunnel failed, response 503
2# grep invalid /usr/local/squid/logs/access.log|tail -1
1700573429.015 4 127.0.0.1:8084 TCP_TUNNEL/503 0 CONNECT
www.invalid.domain:443 - FIRSTUP_PARENT/127.0.0.1:8123 -
3# date -r 1700573429 '+%Y/%m/%d %H:%M:%S'
2023/11/21 13:30:29
4# grep '2023/11/21 13:30:29' /usr/local/squid/logs/cache.log
2023/11/21 13:30:29 kid1| ERROR: Connection to 127.0.0.1:8123 failed
With 4.x there were no such messages.
By comparing to the peer squid logs, these seems to tally with DNS
failures:
peer_select.cc(479) resolveSelected: PeerSelector1688 found all 0
destinations for bugzilla.tucasi.com:443
Full ALL,2 log at the time of the reported connection failure:
2023/11/10 10:25:43.162 kid1| 5,2| TcpAcceptor.cc(214) doAccept: New
connection on FD 17
2023/11/10 10:25:43.162 kid1| 5,2| TcpAcceptor.cc(316) acceptNext:
connection on conn3 local=127.0.0.1:8123 remote=[::] FD 17 flags=9
2023/11/10 10:25:43.162 kid1| 11,2| client_side.cc(1332)
parseHttpRequest: HTTP Client conn13206 local=127.0.0.1:8123
remote=127.0.0.1:57843 FD 147 flags=1
2023/11/10 10:25:43.162 kid1| 11,2| client_side.cc(1336)
parseHttpRequest: HTTP Client REQUEST:
2023/11/10 10:25:43.162 kid1| 85,2| client_side_request.cc(707)
clientAccessCheckDone: The request CONNECT bugzilla.tucasi.com:443 is
ALLOWED; last ACL checked: localhost
2023/11/10 10:25:43.162 kid1| 85,2| client_side_request.cc(683)
clientAccessCheck2: No adapted_http_access configuration. default: ALLOW
2023/11/10 10:25:43.162 kid1| 85,2| client_side_request.cc(707)
clientAccessCheckDone: The request CONNECT bugzilla.tucasi.com:443 is
ALLOWED; last ACL checked: localhost
2023/11/10 10:25:43.162 kid1| 44,2| peer_select.cc(460)
resolveSelected: Find IP destination for: bugzilla.tucasi.com:443'
via bugzilla.tucasi.com
2023/11/10 10:25:43.163 kid1| 44,2| peer_select.cc(479)
resolveSelected: PeerSelector1526 found all 0 destinations for
bugzilla.tucasi.com:443
2023/11/10 10:25:43.163 kid1| 44,2| peer_select.cc(480)
resolveSelected: always_direct = ALLOWED
2023/11/10 10:25:43.163 kid1| 44,2| peer_select.cc(481)
resolveSelected: never_direct = DENIED
2023/11/10 10:25:43.163 kid1| 44,2| peer_select.cc(482)
resolveSelected: timedout = 0
2023/11/10 10:25:43.163 kid1| 4,2| errorpage.cc(1397) buildBody: No
existing error page language negotiated for ERR_DNS_FAIL. Using
default error file.
2023/11/10 10:25:43.163 kid1| 33,2| client_side.cc(617) swanSong:
conn13206 local=127.0.0.1:8123 remote=127.0.0.1:57843 flags=1
If my analysis is correct why is this logged as a connection failure
and do I need to worry about it beyond it filing up the logs needlessly?
My concern is that this could lead to the parent being incorrectly
declared DEAD thus impacting other traffic:
2023/11/09 08:55:22 kid1| Detected DEAD Parent: 127.0.0.1:8123
current master transaction: master4581234
_______________________________________________
squid-users mailing list
squid-users@xxxxxxxxxxxxxxxxxxxxx
https://lists.squid-cache.org/listinfo/squid-users