On Thu, 2019-08-01 at 18:59 +0300, Leon Romanovsky wrote: > > There's no need for a lockdep. The removal of the notifier callback > > entry is re-entrant safe. The core removal routines have their own > > spinlock they use to protect the actual notifier list. If you call > > it > > more than once, the second and subsequent calls merely scan the > > list, > > find no matching entry, and return ENOENT. The only reason this > > might > > need a lock and a lockdep entry is if you are protecting against a > > race > > with the *add* notifier code in the mlx5 driver specifically (the > > core > > add code won't have an issue, but since you only have a single place > > to > > store the notifier callback pointer, if it would be possible for you > > to > > add two callbacks and write over the first callback pointer with the > > second without removing the first, then you would leak a callback > > notifier in the core notifier list). > > atomic_notifier_chain_unregister() unconditionally calls to > syncronize_rcu() and I'm not so sure that it is best thing to do > for every port unbind. > > Actually, I'm completely lost here, we are all agree that the patch > fixes issue correctly, and it returns the code to be exactly as > it was before commit df097a278c75 ("IB/mlx5: Use the new mlx5 core > notifier > API"). Can we simply merge it and fix the kernel panic? As long as you are OK with me adding a comment to the patch so people coming back later won't scratch their head about how can it possible be right to do that sequence without a lock held, I'm fine merging the fix. Something like: /* * The check/unregister/set-NULL sequence below does not need to be * locked for correctness as it's only an optimization, and can't * be under a lock or will throw a scheduling while atomic error. */ -- Doug Ledford <dledford@xxxxxxxxxx> GPG KeyID: B826A3330E572FDD Fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD
Attachment:
signature.asc
Description: This is a digitally signed message part