Hi Florian,

On Sat, Sep 30, 2023 at 10:10:38AM +0200, Florian Westphal wrote:
> Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx> wrote:
> > - read spin lock is required for the sync GC to make sure this does
> >   not zap entries that are being used from the datapath.
>
> It needs to grab the write spinlock for each rb_erase, plus
> the seqcount increase to make sure that parallel lookup doesn't
> miss an element in the tree.

Right, the read lock is not enough for sync GC, and the write lock
would also be required by this approach.

> > - the full GC batch could be used to amortize the memory allocation
> >   (not only two slots as it happens now, I am recycling an existing
> >   function).
>
> Yes.
>
> > - ENOMEM on the sync GC commit path could be an issue. It is too
> >   late to fail there. The tree would start collecting expired
> >   elements that might duplicate existing ones, triggering bogus
> >   mismatches. In this path the commit_mutex is held, and this set
> >   backend does not support lockless read,
>
> It does. If lockless doesn't return a match it falls back to readlock.

And it will fall back to the read lock in the case of the update that
removes the expired element, right?

> > it might be possible to simply grab the spinlock
> > in write mode and release entries immediately, without requiring the
> > sync GC batch infrastructure that pipapo is using.
>
> Is there evidence that the on-demand GC is a problem?

After your last fix, not really, other than we have to be careful not
to zap elements that are part of any pending transaction in this
batch.

> It only searches in the relevant subtree, it should rarely, if ever,
> encounter any expired element.

Not a problem now, but the path that we follow blocks a requested
feature: I currently do not see how to support set element timeout
updates with this on-demand GC on the rbtree. That feature will
require a new update state in the transaction to refresh the timer of
an element.
We will have to be careful because on-demand GC might zap an expired
element that has just been refreshed in this transaction (unless we
reintroduce some sort of 'busy' bit for updates again, which I would
prefer we do not). With 2ee52ae94baa ("netfilter: nft_set_rbtree:
skip sync GC for new elements in this transaction") I could fix this
by looking at the generation mask to infer that the element is 'busy'
in some pending transaction, but I do not see a way to do that once an
element update command is in place.

Maybe take your patch, and then follow up on nf-next with this
approach once the set element timer update is introduced? Otherwise,
rbtree will block the introduction of this new feature for other sets
too.