Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx> wrote: > On Thu, Sep 28, 2023 at 03:12:44PM +0200, Florian Westphal wrote: > > nft_rbtree_gc_elem() walks back and removes the end interval element that > > comes before the expired element. > > > > There is a small chance that we've cached this element as 'rbe_ge'. > > If this happens, we hold and test a pointer that has been queued for > > freeing. > > > > It also causes spurious insertion failures: > > > > $ cat nft-test.20230921-143934.826.dMBwHt/test-testcases-sets-0044interval_overlap_0.1/testout.log > > Error: Could not process rule: File exists > > add element t s { 0 - 2 } > > ^^^^^^ > > Failed to insert 0 - 2 given: > > table ip t { > > set s { > > type inet_service > > flags interval,timeout > > timeout 2s > > gc-interval 2s > > } > > } > > > > The set (rbtree) is empty. The 'failure' doesn't happen on next attempt. > > > > Reason is that when we try to insert, the tree may hold an expired > > element that collides with the range we're adding. > > While we do evict/erase this element, we can trip over this check: > > > > if (rbe_ge && nft_rbtree_interval_end(rbe_ge) && nft_rbtree_interval_end(new)) > > return -ENOTEMPTY; > > > > rbe_ge was erased by the synchronous gc, we should not have done this > > check. Next attempt won't find it, so retry results in successful > > insertion. > > > > Restart in-kernel to avoid such spurious errors. > > > > Such restart are rare, unless userspace intentionally adds very large > > numbers of elements with very short timeouts while setting a huge > > gc interval. > > > > Even in this case, this cannot loop forever, on each retry an existing > > element has been removed. > > > > As the caller is holding the transaction mutex, its impossible > > for a second entity to add more expiring elements to the tree. > > > > After this it also becomes feasible to remove the async gc worker > > and perform all garbage collection from the commit path. > > > > Signed-off-by: Florian Westphal <fw@xxxxxxxxx> > > --- > > Changes since v1: > > - restart entire insertion process in case we remove > > element that we held as lesser/greater overlap detection > Thanks, I am still looking at finding a way to move this to .commit, > if no better solution, then let's get this in for the next round. I don't think its that bad. In most cases, no restart is required as no expired elements in the interesting range will ever be found. I think its fine really, no need to go for two trees or anything like pipapo is doing. I have a few patches that build on top of this, first one gets rid of async worker and places it in commit.