Re: [PATCH rdma-rc] RDMA/mlx5: Release locks during notifier unregister

Doug Ledford <dledford@xxxxxxxxxx> · Thu, 01 Aug 2019 16:09:27 -0400

On Thu, 2019-08-01 at 20:33 +0300, Leon Romanovsky wrote:
> On Thu, Aug 01, 2019 at 12:42:43PM -0400, Doug Ledford wrote:
> > On Thu, 2019-08-01 at 19:23 +0300, Leon Romanovsky wrote:
> > > On Thu, Aug 01, 2019 at 12:11:20PM -0400, Doug Ledford wrote:
> > > > On Thu, 2019-08-01 at 18:59 +0300, Leon Romanovsky wrote:
> > > > > > There's no need for a lockdep.  The removal of the notifier
> > > > > > callback
> > > > > > entry is re-entrant safe.  The core removal routines have
> > > > > > their
> > > > > > own
> > > > > > spinlock they use to protect the actual notifier list.  If
> > > > > > you
> > > > > > call
> > > > > > it
> > > > > > more than once, the second and subsequent calls merely scan
> > > > > > the
> > > > > > list,
> > > > > > find no matching entry, and return ENOENT.  The only reason
> > > > > > this
> > > > > > might
> > > > > > need a lock and a lockdep entry is if you are protecting
> > > > > > against
> > > > > > a
> > > > > > race
> > > > > > with the *add* notifier code in the mlx5 driver specifically
> > > > > > (the
> > > > > > core
> > > > > > add code won't have an issue, but since you only have a
> > > > > > single
> > > > > > place
> > > > > > to
> > > > > > store the notifier callback pointer, if it would be possible
> > > > > > for
> > > > > > you
> > > > > > to
> > > > > > add two callbacks and write over the first callback pointer
> > > > > > with
> > > > > > the
> > > > > > second without removing the first, then you would leak a
> > > > > > callback
> > > > > > notifier in the core notifier list).
> > > > > 
> > > > > atomic_notifier_chain_unregister() unconditionally calls to
> > > > > syncronize_rcu() and I'm not so sure that it is best thing to
> > > > > do
> > > > > for every port unbind.
> > > > > 
> > > > > Actually, I'm completely lost here, we are all agree that the
> > > > > patch
> > > > > fixes issue correctly, and it returns the code to be exactly
> > > > > as
> > > > > it was before commit df097a278c75 ("IB/mlx5: Use the new mlx5
> > > > > core
> > > > > notifier
> > > > > API"). Can we simply merge it and fix the kernel panic?
> > > > 
> > > > As long as you are OK with me adding a comment to the patch so
> > > > people
> > > > coming back later won't scratch their head about how can it
> > > > possible
> > > > be
> > > > right to do that sequence without a lock held, I'm fine merging
> > > > the
> > > > fix.
> > > > 
> > > > Something like:
> > > > 
> > > > /*
> > > >  * The check/unregister/set-NULL sequence below does not need to
> > > > be
> > > >  * locked for correctness as it's only an optimization, and
> > > > can't
> > > >  * be under a lock or will throw a scheduling while atomic
> > > > error.
> > > >  */
> > > 
> > > I think that the best place will be in commit message for this
> > > explanation,
> > > but I'm fine with the comment inside code as well.
> > > 
> > > Thanks a lot, I appreciate it.
> > 
> > Patch (unmodified) is applied to for-rc, thanks.
> 
> Thanks Doug, I'll prepare patch with lockdep for Jason and
> will submit it to -next later on after passing verification.

Perfect, thanks Leon.

-- 
Doug Ledford <dledford@xxxxxxxxxx>
    GPG KeyID: B826A3330E572FDD
    Fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD
Attachment:
signature.asc

Description: This is a digitally signed message part