Hi Stefano, On Thu, May 12, 2022 at 08:34:21PM +0200, Stefano Brivio wrote: > In the overlap detection performed as part of the insertion operation, > we have to skip nodes that are not active in the current generation or > expired. This was done by adding several conditions in overlap decision > clauses, which made it hard to check for correctness, and debug the > actual issue this patch is fixing. > > Consolidate these conditions into a single clause, so that we can > proceed with debugging and fixing the following issue. > > Case b3. described in comments covers the insertion of a start > element after an existing end, as a leaf. If all the descendants of > a given element are inactive, functionally, for the purposes of > overlap detection, we still have to treat this as a leaf, but we don't. > > If, in the same transaction, we remove a start element to the right, > remove an end element to the left, and add a start element to the right > of an existing, active end element, we don't hit case b3. For example: > > - existing intervals: 1-2, 3-5, 6-7 > - transaction: delete 3-5, insert 4-5 > > node traversal might happen as follows: > - 2 (active end) > - 5 (inactive end) > - 3 (inactive start) > > when we add 4 as start element, we should simply consider 2 as > preceding end, and consider it as a leaf, because interval 3-5 has been > deleted. If we don't, we'll report an overlap because we forgot about > this. > > Add an additional flag which is set as we find an active end, and reset > it if we find an active start (to the left). If we finish the traversal > with this flag set, it means we need to functionally consider the > previous active end as a leaf, and allow insertion instead of reporting > overlap. I can still trigger EEXIST with deletion of existing interval. It became harder to reproduce after this patch. After hitting EEXIST, if I do: echo "flush ruleset" > test.nft nft list ruleset >> test.nft to dump the existing ruleset, then I run the delete element command again to remove the interval and it works. Before this patch I could reproduce it by reloading an existing ruleset dump. I'm running the script that I'm attaching manually, just one manual invocation after another. I ocassionally hit ENOBUFS, but that sounds like a different issue that I'm looking into. Thanks. > Reported-by: Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx> > Fixes: 7c84d41416d8 ("netfilter: nft_set_rbtree: Detect partial overlaps on insertion") > Signed-off-by: Stefano Brivio <sbrivio@xxxxxxxxxx> > --- > net/netfilter/nft_set_rbtree.c | 92 ++++++++++++++++++++-------------- > 1 file changed, 54 insertions(+), 38 deletions(-) > > diff --git a/net/netfilter/nft_set_rbtree.c b/net/netfilter/nft_set_rbtree.c > index 7325bee7d144..dc2184bbe722 100644 > --- a/net/netfilter/nft_set_rbtree.c > +++ b/net/netfilter/nft_set_rbtree.c > @@ -222,6 +222,7 @@ static int __nft_rbtree_insert(const struct net *net, const struct nft_set *set, > bool overlap = false, dup_end_left = false, dup_end_right = false; > struct nft_rbtree *priv = nft_set_priv(set); > u8 genmask = nft_genmask_next(net); > + bool last_left_node_is_end = false; > struct nft_rbtree_elem *rbe; > struct rb_node *parent, **p; > int d; > @@ -287,80 +288,95 @@ static int __nft_rbtree_insert(const struct net *net, const struct nft_set *set, > d = memcmp(nft_set_ext_key(&rbe->ext), > nft_set_ext_key(&new->ext), > set->klen); > - if (d < 0) { > + > + if (d < 0) > p = &parent->rb_left; > + else if (d > 0) > + p = &parent->rb_right; > + else if (nft_rbtree_interval_end(rbe)) > + p = &parent->rb_left; > + else > + p = &parent->rb_right; > > + /* There might be inactive elements in the tree: ignore them by > + * traversing them without affecting flags. > + * > + * We need to reset the dup_end_left and dup_end_right flags, > + * though, because those only apply to adjacent nodes. > + */ > + if (!nft_set_elem_active(&rbe->ext, genmask) || > + nft_set_elem_expired(&rbe->ext)) { > + dup_end_left = dup_end_right = false; > + continue; > + } > + > + if (d < 0) { > if (nft_rbtree_interval_start(new)) { > - if (nft_rbtree_interval_end(rbe) && > - nft_set_elem_active(&rbe->ext, genmask) && > - !nft_set_elem_expired(&rbe->ext) && !*p) > - overlap = false; > + /* See case b3. described above. > + * > + * If this is not a leaf, but all nodes below > + * this one are inactive, except for a leaf, we > + * still have to consider it a leaf for the > + * purposes of overlap detection. > + * > + * Set last_left_node_is_end if this is not a > + * leaf and an active end element, and reset it > + * if we find an active start element to the > + * left. > + * > + * If we end the traversal with this flag set, > + * this node is a leaf for the purposes of case > + * b3., and no overlap will be reported. > + */ > + if (nft_rbtree_interval_end(rbe)) { > + if (!*p) > + overlap = false; > + else > + last_left_node_is_end = true; > + } else { > + last_left_node_is_end = false; > + } > } else { > + > if (dup_end_left && !*p) > return -ENOTEMPTY; > > - overlap = nft_rbtree_interval_end(rbe) && > - nft_set_elem_active(&rbe->ext, > - genmask) && > - !nft_set_elem_expired(&rbe->ext); > - > + overlap = nft_rbtree_interval_end(rbe); > if (overlap) { > dup_end_right = true; > continue; > } > } > } else if (d > 0) { > - p = &parent->rb_right; > - > if (nft_rbtree_interval_end(new)) { > if (dup_end_right && !*p) > return -ENOTEMPTY; > > - overlap = nft_rbtree_interval_end(rbe) && > - nft_set_elem_active(&rbe->ext, > - genmask) && > - !nft_set_elem_expired(&rbe->ext); > - > + overlap = nft_rbtree_interval_end(rbe); > if (overlap) { > dup_end_left = true; > continue; > } > - } else if (nft_set_elem_active(&rbe->ext, genmask) && > - !nft_set_elem_expired(&rbe->ext)) { > + } else { > overlap = nft_rbtree_interval_end(rbe); > } > } else { > if (nft_rbtree_interval_end(rbe) && > nft_rbtree_interval_start(new)) { > - p = &parent->rb_left; > - > - if (nft_set_elem_active(&rbe->ext, genmask) && > - !nft_set_elem_expired(&rbe->ext)) > - overlap = false; > + overlap = false; > } else if (nft_rbtree_interval_start(rbe) && > nft_rbtree_interval_end(new)) { > - p = &parent->rb_right; > - > - if (nft_set_elem_active(&rbe->ext, genmask) && > - !nft_set_elem_expired(&rbe->ext)) > - overlap = false; > - } else if (nft_set_elem_active(&rbe->ext, genmask) && > - !nft_set_elem_expired(&rbe->ext)) { > + overlap = false; > + } else { > *ext = &rbe->ext; > return -EEXIST; > - } else { > - overlap = false; > - if (nft_rbtree_interval_end(rbe)) > - p = &parent->rb_left; > - else > - p = &parent->rb_right; > } > } > > dup_end_left = dup_end_right = false; > } > > - if (overlap) > + if (overlap && !last_left_node_is_end) > return -ENOTEMPTY; > > rb_link_node_rcu(&new->node, parent, p); > -- > 2.35.1 >
Attachment:
rbtree-bug.sh
Description: Bourne shell script