Search squid archive

Re: 6.x gives frequent connection to peer failed - spurious?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2023-11-21 08:38, Stephen Borrill wrote:
On 15/11/2023 21:55, Alex Rousskov wrote:
On 2023-11-10 05:46, Stephen Borrill wrote:

With 6.x (currently 6.5) there are very frequent (every 10 seconds or so) messages like:
2023/11/10 10:25:43 kid1| ERROR: Connection to 127.0.0.1:8123 failed


why is this logged as a connection failure

The current error wording is too assuming and, in your case, evidently misleading. The phrase "Connection to X failed" should be changed to something more general like "Cannot contact cache_peer X" or "Cannot communicate with cache_peer X".

CachePeer::countFailure() patches welcome.

But the point is that it _can_ communicate with the peer, but the peer itself can't service the request. The peer returning 503 shouldn't be logged as a connection failure


My bad. I missed the fact that the described DNS error happens at a _peer_ Squid. Sorry.


Currently, Squid v6 treats most CONNECT-to-peer errors as a sign of a broken peer. In 2022, 4xx errors were excluded from that set[1]. At that time, we also proposed to make that decision configurable using a new cache_peer_fault directive[2], but the new directive was blocked as an "overkill"[3], so we hard-coded 4xx exclusion instead.

Going forward, you have several options, including these two:

1. Convince others that Squid should treat all 503 CONNECT errors from peers as it already treats all 4xx errors. Hard-code that new logic.

2. Convince others that cache_peer_fault or a similar directive is a good idea rather than an overkill. Resurrect its implementation[2].


[1] https://github.com/squid-cache/squid/commit/022dbabd89249f839d1861aa87c1ab9e1a008a47

[2] https://github.com/squid-cache/squid/commit/25431f18f2f5e796b8704c85fc51f93b6cc2a73d

[3] https://github.com/squid-cache/squid/pull/1166#issuecomment-1295806530


HTH,

Alex.


 > do I need to worry about it beyond it filing up the logs needlessly?

In short, "yes".

I cannot accurately assess your specific needs, but, in most environments, one should indeed worry that their cache_peer server names cannot be reliably resolved because failed resolution attempts waste Squid resources and increase transaction response time. Moreover, if these failures are frequent enough (relative to peer usage attempts), the affected cache_peer will be marked as DEAD (as you have mentioned):

 > 2023/11/09 08:55:22 kid1| Detected DEAD Parent: 127.0.0.1:8123

Problem seems to be easily reproducible:

1# env https_proxy=http://127.0.0.1:8084 curl https://www.invalid.domain/
curl: (56) CONNECT tunnel failed, response 503
2# grep invalid /usr/local/squid/logs/access.log|tail -1
1700573429.015      4 127.0.0.1:8084 TCP_TUNNEL/503 0 CONNECT www.invalid.domain:443 - FIRSTUP_PARENT/127.0.0.1:8123 -
3# date -r 1700573429 '+%Y/%m/%d %H:%M:%S'
2023/11/21 13:30:29
4# grep '2023/11/21 13:30:29' /usr/local/squid/logs/cache.log
2023/11/21 13:30:29 kid1| ERROR: Connection to 127.0.0.1:8123 failed

With 4.x there were no such messages.

By comparing to the peer squid logs, these seems to tally with DNS failures: peer_select.cc(479) resolveSelected: PeerSelector1688 found all 0 destinations for bugzilla.tucasi.com:443

Full ALL,2 log at the time of the reported connection failure:

2023/11/10 10:25:43.162 kid1| 5,2| TcpAcceptor.cc(214) doAccept: New connection on FD 17 2023/11/10 10:25:43.162 kid1| 5,2| TcpAcceptor.cc(316) acceptNext: connection on conn3 local=127.0.0.1:8123 remote=[::] FD 17 flags=9 2023/11/10 10:25:43.162 kid1| 11,2| client_side.cc(1332) parseHttpRequest: HTTP Client conn13206 local=127.0.0.1:8123 remote=127.0.0.1:57843 FD 147 flags=1 2023/11/10 10:25:43.162 kid1| 11,2| client_side.cc(1336) parseHttpRequest: HTTP Client REQUEST: 2023/11/10 10:25:43.162 kid1| 85,2| client_side_request.cc(707) clientAccessCheckDone: The request CONNECT bugzilla.tucasi.com:443 is ALLOWED; last ACL checked: localhost 2023/11/10 10:25:43.162 kid1| 85,2| client_side_request.cc(683) clientAccessCheck2: No adapted_http_access configuration. default: ALLOW 2023/11/10 10:25:43.162 kid1| 85,2| client_side_request.cc(707) clientAccessCheckDone: The request CONNECT bugzilla.tucasi.com:443 is ALLOWED; last ACL checked: localhost 2023/11/10 10:25:43.162 kid1| 44,2| peer_select.cc(460) resolveSelected: Find IP destination for: bugzilla.tucasi.com:443' via bugzilla.tucasi.com 2023/11/10 10:25:43.163 kid1| 44,2| peer_select.cc(479) resolveSelected: PeerSelector1526 found all 0 destinations for bugzilla.tucasi.com:443 2023/11/10 10:25:43.163 kid1| 44,2| peer_select.cc(480) resolveSelected:    always_direct = ALLOWED 2023/11/10 10:25:43.163 kid1| 44,2| peer_select.cc(481) resolveSelected:     never_direct = DENIED 2023/11/10 10:25:43.163 kid1| 44,2| peer_select.cc(482) resolveSelected:         timedout = 0 2023/11/10 10:25:43.163 kid1| 4,2| errorpage.cc(1397) buildBody: No existing error page language negotiated for ERR_DNS_FAIL. Using default error file. 2023/11/10 10:25:43.163 kid1| 33,2| client_side.cc(617) swanSong: conn13206 local=127.0.0.1:8123 remote=127.0.0.1:57843 flags=1

If my analysis is correct why is this logged as a connection failure and do I need to worry about it beyond it filing up the logs needlessly?

My concern is that this could lead to the parent being incorrectly declared DEAD thus impacting other traffic:

2023/11/09 08:55:22 kid1| Detected DEAD Parent: 127.0.0.1:8123
     current master transaction: master4581234



_______________________________________________
squid-users mailing list
squid-users@xxxxxxxxxxxxxxxxxxxxx
https://lists.squid-cache.org/listinfo/squid-users




[Index of Archives]     [Linux Audio Users]     [Samba]     [Big List of Linux Books]     [Linux USB]     [Yosemite News]

  Powered by Linux