On 7/28/23 10:51, magri@xxxxxx wrote:
I did a complete rewrite of my patch to remove all recursive
calls in SplayNode. See GitHub Pull Request #1431.
Thank you! Already reviewed :-).
For our large number of IP addresses and clients (request/s) a different
storage type should indeed be better suited to avoid the overhead of
permanent tree rebalancing.
There seems to be strong consensus regarding replacing Squid splay
trees. I hope somebody will do it in the foreseeable future, but I do
not know of anybody working on that replacement right now.
FWIW, my _primary_ replacement motivation is not persistent rebalancing
costs but terrible worst-case performance combined with a real
possibility of triggering that worst-case performance, completely
outside admin knowledge or control. Your use case with an idle Squid is
a good illustration of that problem, but things can get much worse.
Cheers,
Alex.
Am 26.07.23 um 18:22 schrieb Alex Rousskov:
On 7/26/23 11:25, magri@xxxxxx wrote:
I've attached a patch that modifies the destroy method of SplayNode
to avoid recursive calls of destroy in single child nodes by moving
the subtrees of the child to the parent node before destroying the
child non recursively.
Hi Martin,
Thanks a lot for sharing your well-documented fix! Would you be
willing to post it as a GitHub Pull Request? If you do, then the
following questions can be discussed on GitHub instead:
* The patch reduces recursion during tree destruction. Squid splay tree
implementation contains other recursive operations (e.g., visit()).
Would not those other operations suffer from the same problem if left
untouched? (And, if they are immune, then why not mimic their wonderful
code to destroy the tree as well?!)
* The proposed patch still destroys one child recursively (when both
children are present). That feels like an incomplete solution that will
still hit stack limits on some trees. I do understand that you are
addressing a specific case of an idle Squid with a freshly configured
degenerate tree, but there ought to be other cases that lead to similar
results as well (accidentally or with a malicious actor help). Have you
considered removing recursion completely instead (by using more heap
memory as needed)?
* I am curious whether your specific use case (going beyond splay tree
destruction) be better addressed by a different storage type than splay
trees. For example, have you considered whether using a IP
address-friendly hash would be faster for, say, one million IP addresses?
Cheers,
Alex.
On 7/26/23 11:25, magri@xxxxxx wrote:
Hi squid community,
we experience segfaults with squid 6.1 (and also older versions) during
"squid -k reconfigure" on several linux systems, caused by a stack
overflow.
The circumstances are rather special:
- we have a huge dst ip blacklist (> 300.000 enties)
- this original list is sorted (by raw ip value)
- the proxy is a hot standby system so there are no client requests
between start of squid and reconfigure
We could trace the cause of the segfault to the destroy-method of
SplayNode (in include/splay.h):
- destroy of the top node is run on every reconfigure
- destroy is recursive, so the function calls stack on the stack :-)
up to the depth of the splay tree
- the insertion of an ordered ip list into the splay tree creates a
degenerate splay tree (each node has only a left child) with a depth
of the number of ip entries in the list
- due to the inactivity of the proxy there is no rebalancing of the
tree through find calls
On amd64 every stack frame needs 32 byte. So the regular linux stack of
8 MB can hold about ~262.000 stack frames. On other architectures
the limit is reached even faster (s390x: 160 bytes per stack frame ->
52.000 frames).
We tried to increase the stack size via ulimits and systemd, but though
the squid master process got the increased limits, the kid-processes are
always spawned with a soft limit of 8 MB and crash on reaching it.
Only forcing a new soft limit via prlimit on an already running kid
could avoid the crash.
I've attached a patch that modifies the destroy method of SplayNode to
avoid recursive calls of destroy in single child nodes by moving the
subtrees of the child to the parent node before destroying the child non
recursively.
With this patch we could no longer reproduce the segfaults, even with a
list of 10.000.000 ordered ips.
Any ideas, comments, better solutions?
Thanks in advance for any advice.
best regards
Martin
_______________________________________________
squid-users mailing list
squid-users@xxxxxxxxxxxxxxxxxxxxx
http://lists.squid-cache.org/listinfo/squid-users
_______________________________________________
squid-users mailing list
squid-users@xxxxxxxxxxxxxxxxxxxxx
http://lists.squid-cache.org/listinfo/squid-users
_______________________________________________
squid-users mailing list
squid-users@xxxxxxxxxxxxxxxxxxxxx
http://lists.squid-cache.org/listinfo/squid-users
_______________________________________________
squid-users mailing list
squid-users@xxxxxxxxxxxxxxxxxxxxx
http://lists.squid-cache.org/listinfo/squid-users