On 05/09/17 21:31, Vieri wrote:
> Hi, I'm sometimes getting hit by ERR_GATEWAY_FAILURE. I'd like to know what could be causing this issue. When this happens on a production server, I don't have much time to investigate. I usually only have enough time to ssh into the squid server, test internet access via command line, and before I know it, the issue's gone.
Squid generates GATEWAY_FAILURE when a URL redirector/rewriter is not responding or when TLS handshakes fail.
If it is a crypto issue, that is exactly the kind of thing your SSH connection will not be able to get through either, so of course the problem is already gone by the time that TCP + encryption succeeds.
> Nothing much in cache.log. I have debug_options rotate=1 ALL,1. I'd rather not set ALL,9 on a production system for something that happens maybe only once every 2 or 3 days. I'm not sure however which sections and levels to set so I can get an idea as to why I'm getting ERR_GATEWAY_FAILURE. https://wiki.squid-cache.org/KnowledgeBase/DebugSections Any suggestions?
In the absence of ALL,9 (or ALL,6) you will have to work your way through the list of components involved with upstream server connections, and then any components you are using that can slow Squid down in general or periodically.
That means the DNS, Comm, and TLS sections, and also things like Digest creation, store rebuild, and cache replacement policy actions. Unfortunately most of those are major components used all the time, so the result is not much better than ALL,6 in terms of log output.
Main focus obviously is on the domain/server(s) whose URL hit the issue, but anything else could be impacting the transaction latency so it is by no means certain to be that server.
I would start with DNS to see if the results are coming back fast enough. With the latest Squid you will also have to check all the permutations of DNS response ordering and timing, since the "Happy Eyeballs" algorithms can mean Squid is only working with partial DNS results and failing when the IPs in that incomplete set all belong to broken servers.
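As a rough way to see how fast each address family answers, something like the Python sketch below could be run from the Squid box. It is only an illustration: it goes through the system resolver via `socket.getaddrinfo`, not Squid's internal DNS client, so the timings are indicative at best. The names `timed_lookup` and `compare_families` are made up for this example.

```python
import socket
import time


def timed_lookup(host, family, resolver=socket.getaddrinfo):
    """Resolve host for one address family.

    Returns (seconds_taken, sorted_addresses, error_text_or_None).
    The resolver argument exists only so the lookup can be swapped out.
    """
    start = time.monotonic()
    try:
        infos = resolver(host, None, family, socket.SOCK_STREAM)
        # Each entry is (family, type, proto, canonname, sockaddr);
        # sockaddr[0] is the IP address for both IPv4 and IPv6.
        addrs = sorted({info[4][0] for info in infos})
        error = None
    except socket.gaierror as exc:
        addrs, error = [], str(exc)
    return time.monotonic() - start, addrs, error


def compare_families(host, resolver=socket.getaddrinfo):
    """Time the IPv4 (A) and IPv6 (AAAA) lookups separately."""
    return {
        "A": timed_lookup(host, socket.AF_INET, resolver),
        "AAAA": timed_lookup(host, socket.AF_INET6, resolver),
    }
```

Calling `compare_families("www.example.com")` a few times around the moment of a failure shows whether one family is slow or empty while the other answers, which is the partial-result situation described above.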
Then check for ICMPv4/v6 issues on the route(s) between Squid and all the server IPs. A lot of networks still have ICMP disabled, fully or partially, in ways which can break route recovery. Lack of ICMP is how temporary router power spikes etc. halfway across the Internet can kill traffic on your network for brief times. On the one hand these ICMP issues are not temporary (though Squid may only hit them if trying certain IPs); on the other, it is not something that can be tested or logged from inside Squid. You will need to set up some sort of monitor to watch the servers Squid connects to - maybe a trigger to automatically check anything that results in the gateway error being logged, before you can manually log in.
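A trigger of that kind might look roughly like the Python sketch below. It rests on assumptions that will need adjusting: that the log lines being watched contain the string ERR_GATEWAY_FAILURE somewhere (the default access.log format may not, so a custom logformat could be needed), and that the URL is the seventh whitespace-separated field as in the native access.log layout. `failed_hosts` and `run_checks` are hypothetical names, and the actual check (ping, traceroute, a TCP connect) is left as a callable you supply.

```python
from urllib.parse import urlsplit

MARKER = "ERR_GATEWAY_FAILURE"  # assumption: this string appears in the log line
URL_FIELD = 6                   # assumption: native access.log field position


def host_from_url(url):
    """Extract the hostname from 'http://host/path' or a CONNECT 'host:port'."""
    target = url if "://" in url else "//" + url
    return urlsplit(target).hostname


def failed_hosts(lines, marker=MARKER, url_field=URL_FIELD):
    """Return the unique hosts, in order seen, from lines containing marker."""
    hosts = []
    for line in lines:
        if marker not in line:
            continue
        fields = line.split()
        if len(fields) <= url_field:
            continue  # line too short to hold a URL field
        host = host_from_url(fields[url_field])
        if host and host not in hosts:
            hosts.append(host)
    return hosts


def run_checks(lines, check):
    """Call check(host) for every host that hit the error; return the results."""
    return {host: check(host) for host in failed_hosts(lines)}
```

Fed with new lines from the log as they arrive, `run_checks(lines, check)` gives you a record of what the route looked like at the time of the failure rather than a few minutes later.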
Next on the line would be TLS handshake behaviours with all IPs of the problem server(s). That is easier to test after the fact, but don't take success as a guarantee. It could still be a temporary failure in the handshake.
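For the after-the-fact test, a small Python sketch along these lines could try a handshake against every IP the name resolves to, rather than only whichever one a browser or curl happens to pick. It assumes port 443 and the system's default CA store, and it does not reproduce Squid's own TLS configuration, so treat a pass as a hint only. The function names are invented for the example.

```python
import socket
import ssl
import time


def resolve_all(host, port=443):
    """Return the unique IP addresses getaddrinfo reports for host."""
    ips = []
    for info in socket.getaddrinfo(host, port, type=socket.SOCK_STREAM):
        ip = info[4][0]
        if ip not in ips:
            ips.append(ip)
    return ips


def try_handshake(ip, host, port=443, timeout=5.0):
    """Connect to one IP and complete a TLS handshake, using host for SNI."""
    context = ssl.create_default_context()
    with socket.create_connection((ip, port), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            return tls.version()


def check_all(host, port=443, resolve=resolve_all, handshake=try_handshake):
    """Try every IP of host; return a list of (ip, ok, seconds, detail)."""
    results = []
    for ip in resolve(host, port):
        start = time.monotonic()
        try:
            detail = handshake(ip, host, port)
            ok = True
        except OSError as exc:  # covers timeouts, refusals, and ssl.SSLError
            detail = repr(exc)
            ok = False
        results.append((ip, ok, time.monotonic() - start, detail))
    return results
```

Running `check_all("www.example.com")` lists each address with whether the handshake completed and how long it took, so one bad IP in an otherwise healthy set stands out.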
From there it is pot-luck and hope there are some clues lying about to hint at good directions.
Amos