On Fri, 25 Mar 2022 18:01:30 +0100 Johannes Berg wrote: > That's not a bad idea, but I think I wouldn't want to backport that, so > separately :) I don't think that fundamentally changes the locking > properties though. > > > Couple of more questions I guess: First, are we assuming that the > cfg80211 code *is* actually broken, even if it looks like nothing can > cause the situation, due to the empty todo list? Right. > Given that we have rtnl_lock_unregistering() (and also > rtnl_lock_unregistering_all()), it looks like we *do* in fact at least > not want to make an assumption that no user of __rtnl_unlock() can have > added a todo item. > > I mean, there's technically yet *another* thing we could do - something > like this: > > [this doesn't compile, need to suitably make net_todo_list non-static] > --- a/net/core/rtnetlink.c > +++ b/net/core/rtnetlink.c > @@ -95,6 +95,7 @@ void __rtnl_unlock(void) > > defer_kfree_skb_list = NULL; > > + WARN_ON(!list_empty(&net_todo_list)); > mutex_unlock(&rtnl_mutex); > > while (head) { Yeah, I think we could do that. > and actually that would allow us to get rid of rtnl_lock_unregistering() > and rtnl_lock_unregistering_all() simply because we'd actually guarantee > the invariant that when the RTNL is freshly locked, the list is empty > (by guaranteeing that it's always empty when it's unlocked, since it can > only be added to under RTNL). TBH I don't know what you mean by rtnl_lock_unregistering(), I don't have that in my tree. rtnl_lock_unregistering_all() can't hurt the case we're talking about AFAICT. Eric removed some of the netns / loopback dependencies in net-next, make sure you pull! > With some suitable commentary, that might also be a reasonable thing? > __rtnl_unlock() is actually rather pretty rare, and not exported. The main use for it seems to be re-locking before loading a module, which TBH I have no idea why, is it just a cargo cult or a historical thing :S I don't see how letting netdevs leave before _loading_ a module makes any difference whatsoever. > However, if you don't like that ... > > I've been testing with this patch, to make lockdep complain: > > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -9933,6 +9933,11 @@ void netdev_run_todo(void) > if (!list_empty(&list)) > rcu_barrier(); > > +#ifdef CONFIG_LOCKDEP > + rtnl_lock(); > + __rtnl_unlock(); > +#endif > + > while (!list_empty(&list)) { > struct net_device *dev > = list_first_entry(&list, struct net_device, todo_list); > > > That causes lockdep to complain for cfg80211 even if the list *is* in > fact empty. > > Would you be open to adding something like that? Perhaps if I don't just > do the easy rtnl_lock/unlock, but try to find the corresponding lockdep- > only things to do there, to cause lockdep to do things without really > locking? OTOH, the locking overhead of the RTNL we just unlocked is > probably minimal, vs. the actual work *lockdep* is doing to track all > this ... The WARN_ON() you suggested up front make perfect sense to me. You can also take the definition of net_unlink_todo() out of netdevice.h while at it because o_0