I think I found part of the config which triggers the retry.
======
# Check for newproxy request header
acl newproxy_acl req_header x-use-newproxy -i true
# proxy
cache_peer 127.0.0.1 parent 18070 0 no-query no-digest no-netdb-exchange name=proxy
cache_peer_access proxy deny newproxy_acl
# newproxy
cache_peer 127.0.0.1 parent 18079 0 no-query no-digest no-netdb-exchange name=newproxy
cache_peer_access newproxy allow newproxy_acl
cache_peer_access newproxy deny all
never_direct allow all
======
I see retries only when the squid config has 2 parents. If I comment out everything related to "newproxy", I can't reproduce this behavior anymore.
BUT
My test request can only be forwarded to "proxy", since it doesn't have the "x-use-newproxy" header.
Which results in the following:
squid has 2 parents
the request can only be forwarded to the 1st parent due to the ACL
the parent doesn't respond and closes the TCP connection
squid retries with the same parent, ignoring "forward_max_tries 1"
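For reference, the directive itself is not in the excerpt above; the config also contains:
======
forward_max_tries 1
======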
I also want to clarify some facts which make me think this could be a bug.
1. There are no issues with the TCP connection. Squid successfully connects to the parent and sends an HTTP request.
2. The parent ACKs the HTTP request and then correctly closes the connection with FIN,ACK after 40 seconds. There are no TCP timeouts/reconnects involved. The only issue is that the parent never sends any HTTP response (see the stand-in sketch after this list).
3. forward_max_tries is set to 1 to make sure squid won't retry. The parent handles retries itself, so we don't want squid to make any additional retries.
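To make the parent behaviour concrete, here is a rough stand-in for it (just a sketch, not our real parent; the port matches the "proxy" peer above). It accepts the connection, reads the request so TCP ACKs it, never sends a response, and closes gracefully after 40 seconds:
======
#!/usr/bin/env python3
# Rough stand-in for the misbehaving parent: accept, read the
# request (so TCP ACKs it), stay silent, then close gracefully
# (FIN,ACK) after 40 seconds. Port 18070 matches the "proxy" peer.
import socket
import time

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
srv.bind(("127.0.0.1", 18070))
srv.listen(5)

while True:
    conn, _ = srv.accept()
    conn.recv(65536)   # read the HTTP request, send nothing back
    time.sleep(40)     # the 40-second silence we see in the logs
    conn.close()       # graceful close -> FIN,ACK, no HTTP response
======
Running something like this in place of the real parent should reproduce the same pattern.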
Also see https://wiki.squid-cache.org/SquidFaq/InnerWorkings#When_does_Squid_re-forward_a_client_request.3F
> Squid does not try to re-forward a request if at least one of the following conditions is true:
> …
> The number of forwarding attempts exceeded forward_max_tries. For example, if you set forward_max_tries to 1 (one), then no requests will be re-forwarded.
> …
> Squid has no alternative destinations to try. Please note that alternative destinations may include multiple next hop IP addresses and multiple peers.
> …
Another part of the config which may or may not be related (this is to increase the number of local ports available):
# 33% of traffic per local IP
acl third random 1/3
acl half random 1/2
tcp_outgoing_address 127.0.0.2 third
tcp_outgoing_address 127.0.0.3 half
tcp_outgoing_address 127.0.0.4
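(If I read the random ACL behavior right, the thirds work out like this: "random 1/3" matches about 33% of requests; of the remaining ~67%, "random 1/2" matches about half, i.e. another ~33%; the last line has no ACL and catches the rest. tcp_outgoing_address uses the first matching line.)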
Logs:
ALL,2 (includes 44,2):
2017/11/27 15:53:40.542| 5,2| TcpAcceptor.cc(220) doAccept: New connection on FD 15
2017/11/27 15:53:40.542| 5,2| TcpAcceptor.cc(295) acceptNext: connection on local=0.0.0.0:3128 remote=[::] FD 15 flags=9
2017/11/27 15:53:40.543| 11,2| client_side.cc(2372) parseHttpRequest: HTTP Client local=127.0.0.1:3128 remote=127.0.0.1:53798 FD 45 flags=1
2017/11/27 15:53:40.543| 11,2| client_side.cc(2373) parseHttpRequest: HTTP Client REQUEST:
---------
GET http://HOST:12345/ HTTP/1.1
Host: HOST:12345
User-Agent: curl/7.51.0
Accept: */*
Proxy-Connection: Keep-Alive
----------
2017/11/27 15:53:40.543| 85,2| client_side_request.cc(745) clientAccessCheckDone: The request GET http://HOST:12345/ is ALLOWED; last ACL checked: localhost
2017/11/27 15:53:40.543| 85,2| client_side_request.cc(721) clientAccessCheck2: No adapted_http_access configuration. default: ALLOW
2017/11/27 15:53:40.543| 85,2| client_side_request.cc(745) clientAccessCheckDone: The request GET http://HOST:12345/ is ALLOWED; last ACL checked: localhost
2017/11/27 15:53:40.543| 17,2| FwdState.cc(133) FwdState: Forwarding client request local=127.0.0.1:3128 remote=127.0.0.1:53798 FD 45 flags=1, url=http://HOST:12345/
2017/11/27 15:53:40.543| 44,2| peer_select.cc(258) peerSelectDnsPaths: Find IP destination for: http://HOST:12345/' via 127.0.0.1
2017/11/27 15:53:40.543| 44,2| peer_select.cc(280) peerSelectDnsPaths: Found sources for 'http://HOST:12345/'
2017/11/27 15:53:40.543| 44,2| peer_select.cc(281) peerSelectDnsPaths: always_direct = DENIED
2017/11/27 15:53:40.543| 44,2| peer_select.cc(282) peerSelectDnsPaths: never_direct = ALLOWED
2017/11/27 15:53:40.543| 44,2| peer_select.cc(292) peerSelectDnsPaths: cache_peer = local=127.0.0.3 remote=127.0.0.1:18070 flags=1
2017/11/27 15:53:40.543| 44,2| peer_select.cc(295) peerSelectDnsPaths: timedout = 0
2017/11/27 15:53:40.543| 11,2| http.cc(2229) sendRequest: HTTP Server local=127.0.0.3:57091 remote=127.0.0.1:18070 FD 40 flags=1
2017/11/27 15:53:40.543| 11,2| http.cc(2230) sendRequest: HTTP Server REQUEST:
---------
GET http://HOST:12345/ HTTP/1.1
User-Agent: curl/7.51.0
Accept: */*
Host: HOST:12345
Cache-Control: max-age=259200
Connection: keep-alive
----------
[SKIPPED 40 seconds until parent closes TCP connection with FIN,ACK]
2017/11/27 15:54:20.627| 11,2| http.cc(1299) continueAfterParsingHeader: WARNING: HTTP: Invalid Response: No object data received for http://HOST:12345/ AKA HOST/
2017/11/27 15:54:20.627| 17,2| FwdState.cc(655) handleUnregisteredServerEnd: self=0x3e31838*2 err=0x409b338 http://HOST:12345/
2017/11/27 15:54:20.627| 11,2| http.cc(2229) sendRequest: HTTP Server local=127.0.0.3:41355 remote=127.0.0.1:18070 FD 40 flags=1
2017/11/27 15:54:20.627| 11,2| http.cc(2230) sendRequest: HTTP Server REQUEST:
---------
GET http://HOST:12345/ HTTP/1.1
User-Agent: curl/7.51.0
Accept: */*
Host: HOST:12345
Cache-Control: max-age=259200
Connection: keep-alive
----------
[SKIPPED 40 seconds again until parent closes TCP connection with FIN,ACK]
2017/11/27 15:55:00.728| ctx: enter level 0: 'http://HOST:12345/'
2017/11/27 15:55:00.728| 11,2| http.cc(719) processReplyHeader: HTTP Server local=127.0.0.3:41355 remote=127.0.0.1:18070 FD 40 flags=1
2017/11/27 15:55:00.728| 11,2| http.cc(720) processReplyHeader: HTTP Server REPLY:
---------
HTTP/1.0 502 Bad Gateway
Cache-Control: no-cache
Connection: close
Content-Type: text/html
<html><body><h1>502 Bad Gateway</h1>
The server returned an invalid or incomplete response.
</body></html>
----------
2017/11/27 15:55:00.728| ctx: exit level 0
2017/11/27 15:55:00.728| 20,2| store.cc(996) checkCachable: StoreEntry::checkCachable: NO: not cachable
2017/11/27 15:55:00.728| 20,2| store.cc(996) checkCachable: StoreEntry::checkCachable: NO: not cachable
2017/11/27 15:55:00.728| 88,2| client_side_reply.cc(2073) processReplyAccessResult: The reply for GET http://HOST:12345/ is ALLOWED, because it matched (access_log stdio:/var/log/squid/access.log line)
2017/11/27 15:55:00.728| 11,2| client_side.cc(1409) sendStartOfMessage: HTTP Client local=127.0.0.1:3128 remote=127.0.0.1:53798 FD 45 flags=1
2017/11/27 15:55:00.728| 11,2| client_side.cc(1410) sendStartOfMessage: HTTP Client REPLY:
---------
HTTP/1.1 502 Bad Gateway
Date: Mon, 27 Nov 2017 23:54:20 GMT
Cache-Control: no-cache
Content-Type: text/html
X-Cache: MISS from ip-172-23-18-130
X-Cache-Lookup: MISS from ip-172-23-18-130:3128
Transfer-Encoding: chunked
Connection: keep-alive
----------
On Thu, Nov 23, 2017 at 11:43 PM, Amos Jeffries <squid3@xxxxxxxxxxxxx> wrote:
On 24/11/17 10:03, Ivan Larionov wrote:
On Nov 23, 2017, at 12:32 AM, Amos Jeffries <squid3@xxxxxxxxxxxxx> wrote:
On 23/11/17 14:20, Ivan Larionov wrote:
Hello....
We have an issue with squid where it tries to re-forward / retry a failed request even when forward_max_tries is set to 1. It happens when there's no response and the parent just closes the connection.
It doesn't happen 100% of the time. Sometimes squid returns 502 after the 1st try, sometimes it retries once. Also, I haven't seen more than 1 retry.
Please enable debug_options 44,2 to see what destinations your Squid is actually finding.
I'll check this on Monday.
forward_max_tries is just a rough cap on the number of server names which can be found when generating that list. The actual destination count can exceed it if one or more of the servers happens to have multiple IPs to try.
The overall transaction can involve retries if one of the other layers (TCP or HTTP) contains retry semantics to a single server.
Could it be a bug? We'd really like to disable these retries.
Why are you trying to break HTTP?
What is the actual problem you are trying to resolve here?
Why do you think I'm trying to break HTTP?
squid forwards the request to the parent, but the parent misbehaves and just closes the connection after 40 seconds. I'm trying to prevent a retry of the request in this situation. Why does squid retry when I never asked it to and specifically said "forward_max_tries 1"?
And this is not a connection failure: squid successfully establishes the connection and sends the request, and the parent ACKs it - it just never responds and proactively closes the connection.
This is not misbehaviour on the part of either Squid nor the parent.
<https://tools.ietf.org/html/rfc7230#section-6.3.1>
"Connections can be closed at any time, with or without intention."
As has been discussed in other threads recently, there are servers out there starting to greylist TCP connections: closing the first one some time *after* SYN+ACK regardless of what the proxy sends, and accepting any follow-up connection attempts.
NP: That can result in exactly the behaviour you describe from the peer, as Squid does not wait for a FIN to arrive before sending its upstream HTTP request - Squid will "randomly" get a FIN or a RST depending on whether the FIN or the DATA packet wins the race into the Squid machine's TCP stack. FIN and RST have different retry properties, which might explain your "sometimes retries" behaviour.
Also, TCP connections fail quite often for many other reasons anyway - anything from power fluctuations at a router to BGP switching the packet route and dropping packets. These are most often short-term situations which are resolved by the time the repeat attempt is made.
What you are trying to do will leave Squid unable to cope with any of these transitory restrictions from the TCP environment and will force the client to receive a terminal error page.
That will greatly slow down detection of and recovery from the slightly longer-lived TCP issues in Squid itself, and may result in N other clients also unnecessarily receiving the same error response as bad connection attempts get spread between many clients (all getting errors) instead of being isolated to the one/few who hit it when the issue initially occurs.
Expanding the retries to a large number (e.g. the recent default change to 25), or keeping them low (e.g. the old default of 5), are reasonable things to do depending on the network stability to your upstreams. But going all the way to 0 retries is guaranteed to lead to more client-visible problems than necessary.
All that aside, I phrased it as a question because you might have had a good reason for increasing the visible failure rates.
We're already fixing the parent behavior, but we still want to disable retries on the squid side.
Since you describe this as peer misbehaviour, subjecting it to Squid's normal TCP failure recovery is the best behaviour. Retry is the intended, correct behaviour for a proxy to perform on any idempotent request. In your case, up to 1 retry before declaring a non-temporary route failure.
NP: idempotent vs non-idempotent may be another reason behind the observed behaviour of retry happening only sometimes.
If you are doing this due to overall latency/delay on the affected client traffic, you would be better off reducing the timeouts involved (the cache_peer connect-timeout= parameter AFAICS) than aiming at a retry count of 0. Perhaps also requiring a few standby=N persistent connections to be maintained, if the peer is HTTP/1.1 capable.
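For example, something along these lines (values purely illustrative, reusing the "proxy" peer from the config earlier in the thread):
======
cache_peer 127.0.0.1 parent 18070 0 no-query no-digest no-netdb-exchange name=proxy connect-timeout=5 standby=2
======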
Amos
With best regards, Ivan Larionov.