On Wed, Jan 18, 2023 at 02:09:44PM +0100, Stefano Brivio wrote:
> On Wed, 18 Jan 2023 12:14:14 +0100
> Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx> wrote:
>
> > ...instead of a tree descent, which became overly complicated in an
> > attempt to cover cases where expired or inactive elements would affect
> > comparisons with the new element being inserted.
> >
> > Further, it turned out that it's probably impossible to cover all those
> > cases, as inactive nodes might entirely hide subtrees consisting of a
> > complete interval plus a node that makes the current insertion not
> > overlap.
> >
> > To speed up the overlap check, descent the tree to find a greater
> > element that is closer to the key value to insert. Then walk down the
> > node list for overlap detection. Starting the overlap check from
> > rb_first() unconditionally is slow, it takes 10 times longer due to the
> > full linear traversal of the list.
> >
> > Moreover, perform garbage collection of expired elements when walking
> > down the node list to avoid bogus overlap reports.
>
> ...I'm quite convinced it's fine to perform the garbage collection
> *after* we found the first element by descending the tree -- in the
> worst case we include "too many" elements in the tree walk, but never
> too little.

Thanks for reviewing.

I'm testing an extra chunk on top of this patch (see attachment) to
reduce the number of visited nodes when walking the list. It re-adds
nft_rbtree_update_first() with the "right logic" this time, basically:

+static bool nft_rbtree_update_first(const struct nft_set *set,
+				    struct nft_rbtree_elem *rbe,
+				    struct rb_node *first)
+{
+	struct nft_rbtree_elem *first_elem;
+
+	first_elem = rb_entry(first, struct nft_rbtree_elem, node);
+	/* this element is closest to where the new element is to be inserted:
+	 * update the first element for the node list path
+	 */
+	if (nft_rbtree_cmp(set, rbe, first_elem) < 0)
+		return true;
+
+	return false;
+}

 static int __nft_rbtree_insert(const struct net *net, const struct nft_set *set,
			       struct nft_rbtree_elem *new,
			       struct nft_set_ext **ext)
@@ -272,7 +288,10 @@ static int __nft_rbtree_insert(const struct net *net, const struct nft_set *set,
 		if (d < 0) {
 			p = &parent->rb_left;
 		} else if (d > 0) {
-			first = &rbe->node;
+			if (!first ||
+			    nft_rbtree_update_first(set, rbe, first))
+				first = &rbe->node;
+
 			p = &parent->rb_right;
 		} else {
 			if (nft_rbtree_interval_end(rbe))

When updating the first node for the list walk, 'first' is only updated
if:

	node < first

This means 'node' is closer to where the new element will be added,
which reduces the number of visited nodes in the list walk.

In summary:

1) Descend the tree to check if node > new.
2) If there is no first yet: first = node.
   Else, if node < first: first = node.

Because the tree stores values in reverse order, a node lesser than
'first' is closer to where we want to start the list walk.

I observe fewer visited nodes in the list walk, but the time to reload
a large set is similar. I suspect the comparison in
nft_rbtree_update_first() is the reason: with this chunk, fewer
comparisons are performed in the list walk, but nodes now have to be
compared while descending the tree to find the first node.
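For illustration only, here is a standalone userspace sketch of the same
first-node selection. This is hypothetical code, not the kernel patch: a
plain unbalanced BST with int keys stands in for the rbtree, and the
names (struct node, insert, find_first_greater) are made up.

/* Standalone sketch of the "closest greater element" selection during
 * tree descent. The tree is built in reversed order, highest to lowest,
 * mirroring nft_set_rbtree.
 */
#include <stdio.h>
#include <stdlib.h>

struct node {
	int key;
	struct node *left, *right;
};

/* Reversed order, as in nft_set_rbtree: greater keys go to the left. */
static struct node *insert(struct node *root, int key)
{
	if (!root) {
		struct node *n = calloc(1, sizeof(*n));

		n->key = key;
		return n;
	}
	if (key > root->key)
		root->left = insert(root->left, key);
	else
		root->right = insert(root->right, key);
	return root;
}

/* Descend as an insertion of new_key would, keeping the greater node
 * that is closest to new_key: the starting point for the list walk.
 */
static struct node *find_first_greater(struct node *root, int new_key)
{
	struct node *first = NULL;

	while (root) {
		if (root->key < new_key) {
			root = root->left;	/* greater keys are left */
		} else if (root->key > new_key) {
			/* 1) no first yet: take this node
			 * 2) node < first: this node is closer to new_key
			 */
			if (!first || root->key < first->key)
				first = root;
			root = root->right;
		} else {
			return root;		/* exact key match */
		}
	}
	return first;
}

int main(void)
{
	int keys[] = { 40, 10, 30, 20, 50 };
	struct node *root = NULL, *first;
	size_t i;

	for (i = 0; i < sizeof(keys) / sizeof(keys[0]); i++)
		root = insert(root, keys[i]);

	/* the closest element greater than 25 is 30: the list walk for
	 * an insertion of 25 starts there, not at the leftmost node (50)
	 */
	first = find_first_greater(root, 25);
	printf("list walk starts at key %d\n", first ? first->key : -1);
	return 0;
}

Expected output: "list walk starts at key 30".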
[...]

> > -	 *
> > -	 * However, if the existing start matches a pre-existing start, but the
> > -	 * end element doesn't match the corresponding pre-existing end element,
> > -	 * we need to report a partial overlap. This is a local condition that
> > -	 * can be noticed without need for a tracking flag, by checking for a
> > -	 * local duplicated end for a corresponding start, from left and right,
> > -	 * separately.
> > +	/* Descent the tree to search for an existing element greater than the
>
> s/Descent/Descend/

I'll fix this nit, thanks.
diff --git a/net/netfilter/nft_set_rbtree.c b/net/netfilter/nft_set_rbtree.c
index 852e8302ad58..297fedc6dec3 100644
--- a/net/netfilter/nft_set_rbtree.c
+++ b/net/netfilter/nft_set_rbtree.c
@@ -247,6 +247,22 @@ static int nft_rbtree_gc_elem(const struct nft_set *__set,
 	return 0;
 }
 
+static bool nft_rbtree_update_first(const struct nft_set *set,
+				    struct nft_rbtree_elem *rbe,
+				    struct rb_node *first)
+{
+	struct nft_rbtree_elem *first_elem;
+
+	first_elem = rb_entry(first, struct nft_rbtree_elem, node);
+	/* this element is closest to where the new element is to be inserted:
+	 * update the first element for the node list path
+	 */
+	if (nft_rbtree_cmp(set, rbe, first_elem) < 0)
+		return true;
+
+	return false;
+}
+
 static int __nft_rbtree_insert(const struct net *net, const struct nft_set *set,
 			       struct nft_rbtree_elem *new,
 			       struct nft_set_ext **ext)
@@ -272,7 +288,10 @@ static int __nft_rbtree_insert(const struct net *net, const struct nft_set *set,
 		if (d < 0) {
 			p = &parent->rb_left;
 		} else if (d > 0) {
-			first = &rbe->node;
+			if (!first ||
+			    nft_rbtree_update_first(set, rbe, first))
+				first = &rbe->node;
+
 			p = &parent->rb_right;
 		} else {
 			if (nft_rbtree_interval_end(rbe))