On 10/15/20 4:07 AM, Stephen Borrill wrote: > At a few installations of squid 4.12 (patched for GREASE) on NetBSD > 9, I'm seeing that occasionally one of the listening ports no longer > accepts connections (it doesn't reject them, but a connection does > not get established).> The port appears random; it's not the same > every time and isn't related to ports with SSL interception. A > restart of squid fixes it. > Looking through the logs, this appears to coincide with lines such as: > > 2020/10/14 22:32:16 kid1| ERROR: getsockname() failed to locate local-IP > on local=[::] remote=10.0.106.147:61996 FD 25 flags=1: (22) Invalid argument > 2020/10/14 22:32:16 kid1| BUG 3556: FD 25 is not an open socket. > > This looks similar to Alex Rousskov's recent observations: > https://bugs.squid-cache.org/show_bug.cgi?id=3556#c15 Please keep in mind that those "BUG 3556" messages warn us about Squid bugs elsewhere/somewhere in Squid code. For each particular message instance, the exact bug is unknown a priori, and several different bugs have triggered these messages in the past. While the original bug 3556 report was for a specific bug, the log messages were not (and are not). > However, we have also seen with at sites where there is no SSL > interception (the above lines are from such an installation). > 1) Am I right that triggering BUG 3556 could lead to the described symptoms? I would rephrase this as "Failure to obtain the (intended) IP address of an (intercepted) connection leads to BUG 3556 messages." > 2) Is this a squid bug or triggered by a problem in the underlying OS? Those "BUG 3556" messages indicate a Squid bug. There is no question about that. However, the ERROR messages may indicate a Squid bug/deficiency and/or an environment (OS configuration, etc.) problem. In summary, you are dealing with multiple problems here. You should focus on the ERROR messages, not "BUG 3556" messages. > If the latter, where to start looking? Check system log for errors. Perhaps you are exhausting some system resource? I would also try to map ERROR messages to client transactions in hope to spot some common pattern behind those failed transactions. Unfortunately, I do not know whether Squid (especially Squid v4) would log these failed transactions. > 3) What workarounds are there? a) Monitor logs and automatically restart the Squid instance if needed. b) Patch Squid to kill the affected process. Adding "assert(false);" after the ERROR message is printed in Comm::TcpAcceptor::oldAccept() will kill the process. Killing a single worker may or may not be enough in SMP mode; it would be interesting and potentially useful to know whether that is enough. You may be able to easily test your workaround using the trick I outlined in https://bugs.squid-cache.org/show_bug.cgi?id=3556#c15 > Given that a restart fixes it, in some > respects it would be better for squid to quit so it can be restarted > rather than continue to run in a half-working state. Yes, earlier Squids were written using the "Do whatever you can to stay up" or "Damn the torpedoes!" principle. FWIW, I am pushing for reversing the relevant code logic to follow the "Squid instance that cannot provide an essential service explicitly requested by the admin should quit with an error" principle, but it will take time to achieve that ideal. > 4) Related to 3), are there any other ways to detect the problem when it > is happening besides parsing logs or testing if all ports are accepting > connections? This could be used to trigger an automated restart as a > temporary workaround. Yes, see suggestion 3b above. HTH, Alex. _______________________________________________ squid-users mailing list squid-users@xxxxxxxxxxxxxxxxxxxxx http://lists.squid-cache.org/listinfo/squid-users