On 8/30/19 8:16 AM, Ilari Laitinen wrote:

> I also noticed that the platform (in general) tries to resolve IPv6
> first, but the TCP dumps have no IPv6 packets at all. This is
> baffling, because there were indeed some unrelated open IPv6
> connections on the Squid server (reported by netstat).

You may be able to validate your packet collection rules by adjusting
them to include those known IPv6 connections/ports. Perhaps you are
just not collecting IPv6 traffic (see the tcpdump sketch near the end
of this message). It is also possible that Squid gets AAAA records but
never uses them because Squid thinks that IPv6 is disabled on your
server.

> I unfortunately cannot share the debug log because it contains some
> sensitive information. We nevertheless recorded what ended up being a
> huge sample.

If you hire a Squid developer to help you, they should be willing to
sign a reasonable NDA and/or view data on your servers, without
copying. IMHO, it does not make much sense to sit on likely valuable
direct information while, at the same time, spending a lot of time
looking for distant echoes of that same information elsewhere!

> I suspect Squid might be waiting for local TCP ports from the kernel
> (or something related).

IIRC, the ephemeral source port allocator is instantaneous -- Squid
either gets a port or a port allocation error, without waiting. When
we overload the server with high-performance tests (without an
explicit port manager), we see port allocation errors rather than
stalled tests. However, perhaps that is not true in your
OS/environment.

> Right now, there are four different IP addresses returned for the
> target cloud service. For practical purposes, they are returned in a
> random order. The traffic would ideally be spread over all of them.
> Unfortunately, it is evident both from the debug log and from the TCP
> dump that Squid is using only one of the addresses at a time. The
> number of connections in the TIME_WAIT state for that single IP
> address gets very close to the maximum defined by the
> net.ipv4.ip_local_port_range sysctl. After a while (a minute or so in
> the recording) this address changes, presumably in response to a new
> DNS query result.

In theory, Squid should round-robin across all destination IP
addresses for a single host name. If your Squid v3 does not, it is
probably a Squid bug that can be fixed [by upgrading]. That said,
IIRC, the notion of "round robin" is rather vague in Squid because
there are several places where an IP may be requested for the same
host name inside the same transaction. I would not be surprised if
that low-level round-robin behavior results in the same IP being used
for most transactions in some environments (until an error or a new
DNS query reshuffles the IPs). Debugging logs may expose this problem.

> Could this be the bottleneck?

I would expect the lack of ports to lead to errors, not stalled
transactions. However, there may be some hidden dependency that I am
missing. For example, the lack of ports leads to errors; the errors
are not logged where you can see them, but trigger excessive DNS
retries and/or Squid bugs that cause delays. The commands near the end
of this message sketch one way to measure how close you are getting to
the port limit.

> One possible workaround that I can think of is setting a short
> positive_dns_ttl, but this doesn't fully guarantee an even
> distribution, now does it?

No, it does not. Moreover, Squid v3 had some TTL handling bugs that
were fixed (in v4 and later code) by the Happy Eyeballs project.
Taking all the known problems into account, it is difficult for me to
predict the effect of changing TTLs. That said, it does not hurt to
try!
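First, the capture check promised above. To test whether your capture
rules simply exclude IPv6, you could point tcpdump at one of those
known-open IPv6 connections. A minimal sketch, assuming Squid listens
on port 3128 on eth0 and using a placeholder address -- substitute
your own values:

  # Capture only IPv6 traffic to/from the Squid listening port:
  tcpdump -ni eth0 'ip6 and tcp port 3128'

  # Sanity check: watch one of the unrelated open IPv6 connections
  # that netstat reported (2001:db8::1 is a placeholder):
  tcpdump -ni eth0 'ip6 and host 2001:db8::1'

If the second command shows packets while the first stays silent,
Squid is probably not using IPv6 at all; if neither shows anything,
the capture rules themselves are the likely culprit.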
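Next, to estimate how close you are getting to the port limit, you
could compare the kernel's ephemeral port range against the
per-destination TIME_WAIT counts. A rough sketch; the ss column
layout varies across versions, so this takes the last field, which is
the peer address:port:

  # Show the ephemeral port range the kernel allocates from:
  sysctl net.ipv4.ip_local_port_range

  # Count TIME_WAIT sockets per peer IP (strip the :port suffix,
  # then tally):
  ss -tan state time-wait | \
    awk 'NR>1 {sub(/:[0-9]+$/, "", $NF); print $NF}' | \
    sort | uniq -c | sort -rn

If the count for a single destination IP approaches the size of that
port range, you are indeed running out of ephemeral ports for that
destination.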
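And if you do experiment with TTLs, a minimal squid.conf sketch (the
30 seconds value below is an arbitrary illustration, not a
recommendation):

  # Cap how long Squid trusts a positive DNS answer
  # (default: 6 hours):
  positive_dns_ttl 30 seconds

  # positive_dns_ttl should not be below negative_dns_ttl
  # (default: 1 minute), so lower that bound as well:
  negative_dns_ttl 30 seconds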
Maybe you will be lucky, and a simple configuration change will remove
the cause of increasing transaction delays.


HTH,

Alex.