On Thu, Nov 07, 2024 at 11:55:47AM +0100, Paolo Abeni wrote:
> Hi,
>
> On 11/7/24 00:58, Pablo Neira Ayuso wrote:
> > 8c873e219970 ("netfilter: core: free hooks with call_rcu") removed
> > synchronize_net() call when unregistering basechain hook, however,
> > net_device removal event handler for the NFPROTO_NETDEV was not updated
> > to wait for RCU grace period.
> >
> > Note that 835b803377f5 ("netfilter: nf_tables_netdev: unregister hooks
> > on net_device removal") does not remove basechain rules on device
> > removal, I was hinted to remove rules on net_device removal later, see
> > 5ebe0b0eec9d ("netfilter: nf_tables: destroy basechain and rules on
> > netdevice removal").
> >
> > Although NETDEV_UNREGISTER event is guaranteed to be handled after
> > synchronize_net() call, this path needs to wait for rcu grace period via
> > rcu callback to release basechain hooks if netns is alive because an
> > ongoing netlink dump could be in progress (sockets hold a reference on
> > the netns).
> >
> > Note that nf_tables_pre_exit_net() unregisters and releases basechain
> > hooks but it is possible to see NETDEV_UNREGISTER at a later stage in
> > the netns exit path, e.g. veth peer device in another netns:
> >
> >  cleanup_net()
> >   default_device_exit_batch()
> >    unregister_netdevice_many_notify()
> >     notifier_call_chain()
> >      nf_tables_netdev_event()
> >       __nft_release_basechain()
> >
> > In this particular case, same rule of thumb applies: if netns is alive,
> > then wait for rcu grace period because netlink dump in the other netns
> > could be in progress. Otherwise, if the other netns is going away then
> > no netlink dump can be in progress and basechain hooks can be released
> > immediately.
> >
> > While at it, turn WARN_ON() into WARN_ON_ONCE() for the basechain
> > validation, which should not ever happen.
> >
> > Fixes: 835b803377f5 ("netfilter: nf_tables_netdev: unregister hooks on net_device removal")
> > Signed-off-by: Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx>
> > ---
> >  include/net/netfilter/nf_tables.h |  2 ++
> >  net/netfilter/nf_tables_api.c     | 41 +++++++++++++++++++++++++------
> >  2 files changed, 36 insertions(+), 7 deletions(-)
> >
> > diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
> > index 91ae20cb7648..8dd8e278843d 100644
> > --- a/include/net/netfilter/nf_tables.h
> > +++ b/include/net/netfilter/nf_tables.h
> > @@ -1120,6 +1120,7 @@ struct nft_chain {
> >  	char *name;
> >  	u16 udlen;
> >  	u8 *udata;
> > +	struct rcu_head rcu_head;
>
> I'm sorry to be pedantic but the CI is complaining about the lack of
> kdoc for this field...
>
> >
> >  	/* Only used during control plane commit phase: */
> >  	struct nft_rule_blob *blob_next;
> > @@ -1282,6 +1283,7 @@ struct nft_table {
> >  	struct list_head sets;
> >  	struct list_head objects;
> >  	struct list_head flowtables;
> > +	possible_net_t net;
>
> ... and this one ...
>
> >  	u64 hgenerator;
> >  	u64 handle;
> >  	u32 use;
>
> [...]
> > +static void nft_release_basechain_rcu(struct rcu_head *head)
> > +{
> > +	struct nft_chain *chain = container_of(head, struct nft_chain, rcu_head);
> > +	struct nft_ctx ctx = {
> > +		.family = chain->table->family,
> > +		.chain = chain,
> > +		.net = read_pnet(&chain->table->net),
> > +	};
> > +
> > +	__nft_release_basechain_now(&ctx);
> > +	put_net(ctx.net);
>
> ... and also about deprecated API usage here, the put_net_tracker()
> version should be preferred.
>
> Given this change will likely land on very old trees I guess the tracker
> conversion is better handled as a follow-up net-next patch.

Agreed.

> Would you mind addressing the kdoc above? Today PR will be handled by
> Jakub quite later, so there is a bit of time.

I will fix kdoc and resubmit.
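
For anyone following the thread, the rule of thumb from the commit message (defer the release through an RCU callback while the netns is alive, release immediately otherwise) can be modelled in plain userspace C. This is only an illustrative sketch, not the kernel code: demo_chain, demo_call_rcu(), demo_grace_period() and demo_release_basechain() are hypothetical names, the grace period is simulated with a simple callback list, and the kdoc-style comment merely shows the kind of per-field documentation the CI is asking for.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified userspace stand-in for the kernel's struct rcu_head. */
struct rcu_head {
	struct rcu_head *next;
	void (*func)(struct rcu_head *head);
};

/* Same idea as the kernel macro: recover the enclosing object. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/**
 * struct demo_chain - hypothetical, simplified stand-in for struct nft_chain
 * @released: set once the chain's hooks have been released
 * @rcu_head: used to defer the release until after a grace period
 */
struct demo_chain {
	bool released;
	struct rcu_head rcu_head;
};

/* Callbacks waiting for the simulated grace period. */
static struct rcu_head *pending;

/* Hypothetical model of call_rcu(): queue the callback, do not run it. */
static void demo_call_rcu(struct rcu_head *head,
			  void (*func)(struct rcu_head *head))
{
	head->func = func;
	head->next = pending;
	pending = head;
}

/* Pretend a grace period has elapsed: run every queued callback. */
static void demo_grace_period(void)
{
	while (pending) {
		struct rcu_head *head = pending;

		pending = head->next;
		head->func(head);
	}
}

static void demo_release_now(struct demo_chain *chain)
{
	chain->released = true;
}

/* RCU callback: map the rcu_head back to its chain, then release it. */
static void demo_release_rcu(struct rcu_head *head)
{
	struct demo_chain *chain =
		container_of(head, struct demo_chain, rcu_head);

	demo_release_now(chain);
}

/*
 * Assumed decision, following the commit message: a live netns may have
 * a dump in progress, so defer; a dying netns cannot, so release now.
 */
static void demo_release_basechain(struct demo_chain *chain, bool netns_alive)
{
	if (netns_alive)
		demo_call_rcu(&chain->rcu_head, demo_release_rcu);
	else
		demo_release_now(chain);
}
```

With netns_alive set, the chain stays untouched until demo_grace_period() runs; with it cleared, the release happens in the caller's context, which mirrors the two cases described in the commit message.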