Re: [PATCH nf] nft_set_rbtree: Switch to node list walk for overlap detection

Stefano Brivio <sbrivio@xxxxxxxxxx> · Wed, 6 Jul 2022 23:12:42 +0200

On Tue, 5 Jul 2022 13:53:47 +0200
Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx> wrote:

> Hi Stefano,
> 
> On Sat, Jul 02, 2022 at 01:55:10AM +0200, Stefano Brivio wrote:
> > On Mon, 27 Jun 2022 18:59:06 +0200
> > Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx> wrote:
> >   
> > > Hi Stefano,
> > > 
> > > On Tue, Jun 14, 2022 at 03:07:04AM +0200, Stefano Brivio wrote:  
> > > > ...instead of a tree descent, which became overly complicated in an
> > > > attempt to cover cases where expired or inactive elements would
> > > > affect comparisons with the new element being inserted.
> > > >
> > > > Further, it turned out that it's probably impossible to cover all
> > > > those cases, as inactive nodes might entirely hide subtrees
> > > > consisting of a complete interval plus a node that makes the current
> > > > insertion not overlap.
> > > >
> > > > For the insertion operation itself, this essentially reverts back to
> > > > the implementation before commit 7c84d41416d8
> > > > ("netfilter: nft_set_rbtree: Detect partial overlaps on insertion"),
> > > > except that cases of complete overlap are already handled in the
> > > > overlap detection phase itself, which slightly simplifies the loop to
> > > > find the insertion point.
> > > >
> > > > Reported-by: Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx>
> > > > Fixes: 7c84d41416d8 ("netfilter: nft_set_rbtree: Detect partial overlaps on insertion")
> > > > Signed-off-by: Stefano Brivio <sbrivio@xxxxxxxxxx>
> > > > ---
> > > >  net/netfilter/nft_set_rbtree.c | 194 ++++++++++-----------------------
> > > >  1 file changed, 58 insertions(+), 136 deletions(-)    
> > > 
> > > When running tests this is increasing the time to detect overlaps in
> > > my testbed, because of the linear list walk for each element.  
> > 
> > ...by the way, I observed it as well, and I was wondering: how bad is
> > too bad? My guess was that as long as we insert a few thousand elements
> > (with more, I expect hash or pipapo to be used) in a few seconds, it
> > should be good enough.  
> 
> From a few seconds to less than 30 seconds in one testbed here.

I'm not sure I understand: is that really bad? Sure, I see that
running the test scripts you shared now takes much longer, but I wonder
how far that is from actual use cases.

> > > So I have been looking at an alternative approach (see attached patch) to
> > > address your comments. The idea is to move out the overlapping nodes
> > > from the element in the tree, instead keep them in a list.
> > > 
> > >                         root
> > >                         /  \
> > >                      elem   elem -> update -> update
> > >                             /  \
> > >                          elem  elem
> > > 
> > > Each rbtree element in the tree has a .pending_list which stores the
> > > elements that supersede the existing (inactive) element. There is also a
> > > .list which is used to add the element to the .pending_list. Elements
> > > in the tree might have a .pending_list with one or more elements.  
> > 
> > I see a problem with this, that perhaps you already solved, but I don't
> > understand how.
> > 
> > The original issue here was that we have inactive elements in the tree
> > affecting the way we descend it to look for overlaps. Those inactive
> > elements are not necessarily overlapping with anything.
> > 
> > If they overlap, the issue is solved with your patch. But if they
> > don't...?
> >
> > Sure, we'll grant insertion of overlapping elements in case the overlap
> > is with an inactive one, but this solves the particular case of
> > matching elements, not overlapping intervals.
> > 
> > At a first reading, I thought you found some magic way to push out all
> > inactive elements to some parallel, linked structure, which we can
> > ignore as we look for overlapping _intervals_. But that doesn't seem to
> > be the case, right?  
> 
> With my patch, when descending the tree, the right or left branch is
> selected uniquely based on the key value (regardless of the element
> state)

Hmm, but wait, that was exactly the problem I introduced (or at least
my understanding of it): if subtrees of entirely inactive nodes hide
(by affecting how we descend the tree) active nodes (that affect the
overlap decision), we can't actually reach any conclusion.

It's fine as long as it's inactive leaves, or single nodes, but not
more.

Let's say we need to insert element with key 6, as interval start,
given this tree, where:

- (s) and (e) mark starts and ends
- (i) and (a) mark inactive and active elements

                 4 (s, i)
                 /      \
                /        \
        2 (s, i)          7 (e, i)
        /      \          /      \
       /        \        /        \
  1 (s, a)  3 (e, a)  5 (s, i)  8 (s, i)

we visit elements with keys 4, 7, 5: we have an inactive start on the
left (5), an inactive end to the right (7), and we don't know if the new
element overlaps, because we never get to see 1 and 3. Depending on how
hard we tried to fix bugs we hit in the past, we'll consider that an
overlap or not.

Without inactive elements, we would have this tree:

1 (s, a)
     \
      \
    3 (e, a)

where we visit elements with keys 1, 3: we find an active start on
the left, and then the corresponding end, also to the left, so we
conclude that start element with key 6 doesn't overlap.
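
To make the two descents above concrete, here is a minimal userspace
sketch. It's not kernel code: struct node, descend_by_key(),
active_seen() and the two example_*() helpers are names made up for
illustration, and the descent only compares keys, which is my reading
of the approach under discussion rather than the actual
nft_set_rbtree.c logic.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical node layout, for illustration only */
struct node {
	int key;
	bool is_end;	/* true: interval end, false: interval start */
	bool active;
	struct node *left, *right;
};

/* First tree from the example: only 1 and 3 are active */
static struct node n1 = { 1, false, true,  NULL, NULL };
static struct node n3 = { 3, true,  true,  NULL, NULL };
static struct node n5 = { 5, false, false, NULL, NULL };
static struct node n8 = { 8, false, false, NULL, NULL };
static struct node n2 = { 2, false, false, &n1, &n3 };
static struct node n7 = { 7, true,  false, &n5, &n8 };
static struct node n4 = { 4, false, false, &n2, &n7 };

/* Second tree: the same set without inactive elements */
static struct node m3 = { 3, true,  true, NULL, NULL };
static struct node m1 = { 1, false, true, NULL, &m3 };

struct node *example_with_inactive(void) { return &n4; }
struct node *example_active_only(void) { return &m1; }

/*
 * Descend comparing keys only, ignoring element state. Stores the keys
 * of visited nodes in path[] (at most max entries) and returns how many
 * nodes were visited.
 */
int descend_by_key(const struct node *n, int key, int *path, int max)
{
	int len = 0;

	while (n && len < max) {
		path[len++] = n->key;
		if (key < n->key)
			n = n->left;
		else if (key > n->key)
			n = n->right;
		else
			break;
	}

	return len;
}

/* Count active nodes met along the same key-only descent */
int active_seen(const struct node *n, int key)
{
	int count = 0;

	while (n) {
		if (n->active)
			count++;
		if (key < n->key)
			n = n->left;
		else if (key > n->key)
			n = n->right;
		else
			break;
	}

	return count;
}
```

Descending for key 6 from example_with_inactive() visits 4, 7, 5 and
active_seen() returns 0, while from example_active_only() it visits
1, 3 and active_seen() returns 2: the active interval is only
reachable in the second tree.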

> I removed the "turn left" when node is inactive case. There
> are also no more duplicated elements with the same value.

This simplifies the handling of those cases, and we wouldn't need all
those clauses anymore, but I really think that the existing problem comes
from the fact that we can *not* descend the tree just by selecting
branches based on key values.

-- 
Stefano