On Thu, 2019-08-01 at 20:33 +0300, Leon Romanovsky wrote: > On Thu, Aug 01, 2019 at 12:42:43PM -0400, Doug Ledford wrote: > > On Thu, 2019-08-01 at 19:23 +0300, Leon Romanovsky wrote: > > > On Thu, Aug 01, 2019 at 12:11:20PM -0400, Doug Ledford wrote: > > > > On Thu, 2019-08-01 at 18:59 +0300, Leon Romanovsky wrote: > > > > > > There's no need for a lockdep. The removal of the notifier > > > > > > callback > > > > > > entry is re-entrant safe. The core removal routines have > > > > > > their > > > > > > own > > > > > > spinlock they use to protect the actual notifier list. If > > > > > > you > > > > > > call > > > > > > it > > > > > > more than once, the second and subsequent calls merely scan > > > > > > the > > > > > > list, > > > > > > find no matching entry, and return ENOENT. The only reason > > > > > > this > > > > > > might > > > > > > need a lock and a lockdep entry is if you are protecting > > > > > > against > > > > > > a > > > > > > race > > > > > > with the *add* notifier code in the mlx5 driver specifically > > > > > > (the > > > > > > core > > > > > > add code won't have an issue, but since you only have a > > > > > > single > > > > > > place > > > > > > to > > > > > > store the notifier callback pointer, if it would be possible > > > > > > for > > > > > > you > > > > > > to > > > > > > add two callbacks and write over the first callback pointer > > > > > > with > > > > > > the > > > > > > second without removing the first, then you would leak a > > > > > > callback > > > > > > notifier in the core notifier list). > > > > > > > > > > atomic_notifier_chain_unregister() unconditionally calls to > > > > > syncronize_rcu() and I'm not so sure that it is best thing to > > > > > do > > > > > for every port unbind. > > > > > > > > > > Actually, I'm completely lost here, we are all agree that the > > > > > patch > > > > > fixes issue correctly, and it returns the code to be exactly > > > > > as > > > > > it was before commit df097a278c75 ("IB/mlx5: Use the new mlx5 > > > > > core > > > > > notifier > > > > > API"). Can we simply merge it and fix the kernel panic? > > > > > > > > As long as you are OK with me adding a comment to the patch so > > > > people > > > > coming back later won't scratch their head about how can it > > > > possible > > > > be > > > > right to do that sequence without a lock held, I'm fine merging > > > > the > > > > fix. > > > > > > > > Something like: > > > > > > > > /* > > > > * The check/unregister/set-NULL sequence below does not need to > > > > be > > > > * locked for correctness as it's only an optimization, and > > > > can't > > > > * be under a lock or will throw a scheduling while atomic > > > > error. > > > > */ > > > > > > I think that the best place will be in commit message for this > > > explanation, > > > but I'm fine with the comment inside code as well. > > > > > > Thanks a lot, I appreciate it. > > > > Patch (unmodified) is applied to for-rc, thanks. > > Thanks Doug, I'll prepare patch with lockdep for Jason and > will submit it to -next later on after passing verification. Perfect, thanks Leon. -- Doug Ledford <dledford@xxxxxxxxxx> GPG KeyID: B826A3330E572FDD Fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD
Attachment:
signature.asc
Description: This is a digitally signed message part