Hi Florian On Mon, Dec 09, 2024 at 11:04:01PM +0100, Florian Westphal wrote: > Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx> wrote: > > > One way would be to allocate nft_trans_rule objects + one nft_trans_chain > > > object, deactivate the rules + the chain and then defer the freeing to the > > > nft destroy workqueue. We'd still need to keep the synchronize_rcu path as > > > a fallback to handle -ENOMEM corner cases though. > > > > I think it can be done _without_ nft_trans objects. > > > > Since the commit mutex is held in this netdev event path: Remove this > > basechain, deactivate rules and add basechain to global list protected > > with spinlock, it invokes worker. Then, worker zaps this list > > basechains, it calls synchronize_rcu() and it destroys rules and then > > basechain. No memory allocations needed in this case? > > I don't think its possible due to netlink dumps. > > Its safe for normal commit path because list_del_rcu() gets called on > these objects (making them unreachable for new dumps), then, > synchronize_rcu is called (so all concurrent dumpers are done) and then > we free. > > It would work if you add a new list_head to the basechain, then you > could just link the basechain object to some other list and then > call nf_tables_rule_release() AFTER a synchronize_rcu because rules > can't be picked up if the owning chain is no longer reachable. > > However, I do wonder how complex it would be. I started a patch, spent several hours in a row, but I need more time to get this right. There is also Phil making the line to remove this in nf-next. > This is tricky to get right and I'm not sure this patch adds a > noticeable issue, see nft_rcv_nl_event() which has similar pattern, > i.e. synchronize_rcu() is called. I can see veth hits synchronize_rcu(), this can be called from default_device_exit_batch() which calls unregister_netdevice_many(). I need time, but it is difficult. We have to deliver a pull request very late on wednesday so it can be taken on thursday. Now the week has three days to fix and test stuff, so upstream maintainers have a day to tidy up and submit upstream. > I'm not saying we should not consider async route, but maybe better > to followup in -next? Yes, I can follow up in -next, but then Phil just remove this again in -next. I can just take this fix to address the existing situation which is worse than before. Before this patch UAF was possible, although I never manage to trigger it. Now crash is easy to trigger, I spent too much time looking at the interaction with layers and I forgot basic assumption regarding nf_tables in this patch.