Re: [PATCH nf] netfilter: nf_tables: do not defer rule destruction via call_rcu

Florian Westphal <fw@xxxxxxxxx> · Mon, 9 Dec 2024 23:04:01 +0100

Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx> wrote:
> > One way would be to allocate nft_trans_rule objects + one nft_trans_chain
> > object, deactivate the rules + the chain and then defer the freeing to the
> > nft destroy workqueue.  We'd still need to keep the synchronize_rcu path as
> > a fallback to handle -ENOMEM corner cases though.
> 
> I think it can be done _without_ nft_trans objects.
> 
> Since the commit mutex is held in this netdev event path: Remove this
> basechain, deactivate rules and add basechain to global list protected
> with spinlock, it invokes worker. Then, worker zaps this list
> basechains, it calls synchronize_rcu() and it destroys rules and then
> basechain. No memory allocations needed in this case?

I don't think its possible due to netlink dumps.

Its safe for normal commit path because list_del_rcu() gets called on
these objects (making them unreachable for new dumps), then,
synchronize_rcu is called (so all concurrent dumpers are done) and then
we free.

It would work if you add a new list_head to the basechain, then you
could just link the basechain object to some other list and then
call nf_tables_rule_release() AFTER a synchronize_rcu because rules
can't be picked up if the owning chain is no longer reachable.

However, I do wonder how complex it would be.

This is tricky to get right and I'm not sure this patch adds a
noticeable issue, see nft_rcv_nl_event() which has similar pattern,
i.e. synchronize_rcu() is called.

I'm not saying we should not consider async route, but maybe better
to followup in -next?