Hi Florian,

On Sat, Sep 30, 2023 at 10:10:38AM +0200, Florian Westphal wrote:
> Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx> wrote:
> > - read spin lock is required for the sync GC to make sure this does
> >   not zap entries that are being used from the datapath.
>
> It needs to grab the write spinlock for each rb_erase, plus
> the seqcount increase to make sure that parallel lookup doesn't
> miss an element in the tree.

Right, the read lock is not enough for sync GC, and the write lock
would also be required by this approach.

> > - the full GC batch could be used to amortize the memory allocation
> >   (not only two slots as it happens now, I am recycling an existing
> >   function).
>
> Yes.
>
> > - ENOMEM on the sync GC commit path could be an issue. It is too
> >   late to fail there. The tree would start collecting expired
> >   elements that might duplicate existing ones, triggering bogus
> >   mismatches. In this path the commit_mutex is held, and this set
> >   backend does not support lockless read,
>
> It does. If lockless doesn't return a match it falls back to readlock.

And it will fall back to the read lock in the case of the update that
removes the expired element, right?

> > it might be possible to simply grab the spinlock
> > in write mode and release entries immediately, without requiring the
> > sync GC batch infrastructure that pipapo is using.
>
> Is there evidence that the on-demand GC is a problem?

After your last fix, not really, other than we have to be careful not
to zap elements that are part of any pending transaction in this
batch.

> It only searches in the relevant subtree, it should rarely, if ever,
> encounter any expired element.

Not a problem now, but the path that we follow blocks a requested
feature: I currently do not see how to support set element timeout
updates with this on-demand GC on the rbtree. That feature will
require a new update state in the transaction to refresh the timer of
an element.
We will have to be careful because on-demand GC might zap an expired
element that has just been refreshed in this transaction (unless we
reintroduce some sort of 'busy' bit for updates again, which I would
prefer we do not). With 2ee52ae94baa ("netfilter: nft_set_rbtree:
skip sync GC for new elements in this transaction") I could fix this
by looking at the generation mask to infer that the element is 'busy'
in some pending transaction, but I do not see a way to do that once an
element update command is in place.

Maybe take your patch, and then follow up on nf-next with this
approach once the set element timer update is introduced? Otherwise,
rbtree will block the introduction of this new feature for other sets
too.