On 2024-07-12 06:58, paolo.prinx@xxxxxxxxx wrote:
We are having some stability issues with our squid farms after a recent
upgrade from CentOS/Squid 3.5.x to Ubuntu/Squid 5.7/6.9.
In short, after running for a certain period the servers run out of file
descriptors. We see a slowly growing number of TCP or TCPv6 socket
handles.
Assuming that your Squids are not under an ever-increasing load, what
you describe sounds like a Squid bug. I do not see any obviously related
fixes in the latest official code, so it is possible that this bug is
unknown to Squid developers and is still present in v6+. I recommend the
following steps:
1. Forget about Squid v5. Aim to upgrade to Squid v6.
2. Collect a few mgr:filedescriptors cache manager snapshots from a
problematic Squid, in the hope of discovering a common theme among the
leaked descriptors' metadata. Share your findings (and/or a pointer to
compressed snapshots). A rough collection sketch follows this list.
3. Check cache.log for frequent (or at least persistent) ERROR and
WARNING messages and report your findings.
4. Does your Squid grow its resident memory usage as well? Descriptor
leaks are often (but not always!) accompanied by memory leaks. The
latter are sometimes easier to pinpoint. If (and only if) your Squid is
leaking a lot of memory, then collect a few dozen mgr:mem snapshots
(e.g., one every busy hour) and share a pointer to a compressed snapshot
archive for analysis by Squid developers. There is at least one v6
memory leak fixed in master/v7 (Bug 5322), but, hopefully, you are not
suffering from that memory leak (otherwise the noise from that leak may
obscure what we are looking for).
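If it helps, collecting the snapshots can be scripted; a minimal sketch,
assuming squidclient is installed and your cache manager answers on
localhost:3128 without a password (adjust host, port, and any
cachemgr_passwd to your setup):

# run from cron, e.g. once per busy hour
ts=$(date +%Y%m%d-%H%M%S)
squidclient -h 127.0.0.1 -p 3128 mgr:filedescriptors > fd-$ts.txt
squidclient -h 127.0.0.1 -p 3128 mgr:mem > mem-$ts.txt
# when done: tar czf squid-snapshots.tar.gz fd-*.txt mem-*.txt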
You may continue this triage on this mailing list or file a bug report
at https://bugs.squid-cache.org/enter_bug.cgi?product=Squid
Thank you,
Alex.
It is somewhat similar to what is reported at
https://access.redhat.com/solutions/3362211 . They state that:
* If an application fails to close() its socket descriptors and
continues to allocate new sockets, then it can use up all the system
memory on TCP(v6) slab objects.
* Note that some of these sockets will not show up in
/proc/net/sockstat(6). Sockets that still have a file descriptor but
are in the TCP_CLOSE state will consume a slab object, but will not be
accounted for in /proc/net/sockstat(6) or "ss" or "netstat".
* Whether this is an application socket leak can be determined by
stopping the application processes that are consuming sockets. If the
slab objects in /proc/slabinfo are then freed, the application is
responsible, as that means destructor routines have found open file
descriptors to sockets in the process. (A rough way to script this
check is sketched after the quote below.)
"This is most likely to be a case of the application not handling
error conditions correctly and not calling close() to free the FD and
socket."
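Following that suggestion, the check can be scripted; a rough sketch,
assuming the service is called squid and nothing else on the box is
churning TCP slabs:

# snapshot TCP(v6) slab usage, stop squid, and compare
grep -E '^(TCP|TCPv6) ' /proc/slabinfo > slab-before.txt
systemctl stop squid
sleep 10                                # give the kernel time to reap the sockets
grep -E '^(TCP|TCPv6) ' /proc/slabinfo > slab-after.txt
diff slab-before.txt slab-after.txt     # a large drop in active objects points at squid
systemctl start squid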
For example, on a server with squid 5.7, unmodified package:
list of open files:
lsof |wc -l
56963
of which 35K in TCPv6:
lsof |grep proxy |grep TCPv6 |wc -l
35301
under /proc I see fewer objects
cat /proc/net/tcp6 |wc -l
3095
but the number of objects in the slabs is high
cat /proc/slabinfo |grep TCPv6
MPTCPv6                  0      0   2048   16    8 : tunables 0 0 0 : slabdata    0    0    0
tw_sock_TCPv6         1155   1155    248   33    2 : tunables 0 0 0 : slabdata   35   35    0
request_sock_TCPv6       0      0    304   26    2 : tunables 0 0 0 : slabdata    0    0    0
TCPv6                38519  38519   2432   13    8 : tunables 0 0 0 : slabdata 2963 2963    0
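Assuming the standard slabinfo 2.1 column order (name, active_objs,
num_objs, objsize, ...), those TCPv6 objects alone pin roughly
38519 x 2432 bytes, about 90 MB of kernel memory:

awk '$1 == "TCPv6" {printf "%.0f MB\n", $2 * $4 / 1024 / 1024}' /proc/slabinfo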
I have 35K of lines like this
lsof |grep proxy |grep TCPv6 |more
squid     1049   proxy   13u   sock   0,8   0t0    5428173   protocol: TCPv6
squid     1049   proxy   14u   sock   0,8   0t0   27941608   protocol: TCPv6
squid     1049   proxy   24u   sock   0,8   0t0   45124047   protocol: TCPv6
squid     1049   proxy   25u   sock   0,8   0t0   50689821   protocol: TCPv6
...
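These entries have no local or remote address at all, just "protocol:
TCPv6", which matches the closed-but-still-open sockets described in the
Red Hat article. A quick way to count them per process (assuming the
default lsof column layout, with TYPE in the fifth field and PID in the
second):

lsof -nP -u proxy | awk '$5 == "sock"' | wc -l
lsof -nP -u proxy | awk '$5 == "sock" {n[$2]++} END {for (p in n) print p, n[p]}'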
We thought maybe this was a weird IPv6 thing, as we only route IPv4, so
we compiled a more recent version of squid with no v6 support. The leak
just moved to TCP4, as the numbers below show.
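(For reference, v6 was disabled at configure time; a sketch of the kind
of build we mean, assuming squid's standard --disable-ipv6 switch and
leaving the other configure options as usual:

./configure --disable-ipv6    # plus the usual prefix and feature options
make && sudo make install
)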
lsof |wc -l
120313
cat /proc/slabinfo |grep TCP
MPTCPv6                  0      0   2048   16    8 : tunables 0 0 0 : slabdata    0    0    0
tw_sock_TCPv6            0      0    248   33    2 : tunables 0 0 0 : slabdata    0    0    0
request_sock_TCPv6       0      0    304   26    2 : tunables 0 0 0 : slabdata    0    0    0
TCPv6                  208    208   2432   13    8 : tunables 0 0 0 : slabdata   16   16    0
MPTCP                    0      0   1856   17    8 : tunables 0 0 0 : slabdata    0    0    0
tw_sock_TCP           5577   5577    248   33    2 : tunables 0 0 0 : slabdata  169  169    0
request_sock_TCP      1898   2002    304   26    2 : tunables 0 0 0 : slabdata   77   77    0
TCP                 102452 113274   2240   14    8 : tunables 0 0 0 : slabdata 8091 8091    0
cat /proc/net/tcp |wc -l
255
After restarting squid, the slab objects are released and the number of
open file descriptors drops to a reasonable value. This further suggests
that it is squid hanging on to these FDs.
lsof |grep proxy |wc -l
1221
Any suggestions? I guess it's something blatantly obvious, but we've
been looking at this for a couple of days and we're not getting
anywhere...
Thanks again
_______________________________________________
squid-users mailing list
squid-users@xxxxxxxxxxxxxxxxxxxxx
https://lists.squid-cache.org/listinfo/squid-users