Re: Crashes due to concurrent calls to ib_unmap_fmr()

On Mon, Apr 17, 2017 at 01:45:24PM -0400, Chuck Lever wrote:
>
> > On Apr 15, 2017, at 5:55 AM, Leon Romanovsky <leon@xxxxxxxxxx> wrote:
> >
> > On Fri, Apr 14, 2017 at 11:51:39AM -0400, Chuck Lever wrote:
> >> Howdy-
> >>
> >> I recently found a way to crash my HCA (and the whole system) using a
> >> signal on an NFS/RDMA mount point that is using FMR. I've documented
> >> the issue:
> >>
> >> https://bugzilla.linux-nfs.org/show_bug.cgi?id=305
> >>
> >> And I have an NFS/RDMA fix I'm testing for v4.13. The fix is to prevent
> >> simultaneous calls to ib_unmap_fmr with the same FMR.
> >>
> >> While working on the fix, I've been looking for any documentation
> >> regarding serialization requirements for ib_unmap_fmr. Knut Omang pointed
> >> out to me that Documentation/infiniband/core-locking.txt makes this bold
> >> statement:
> >>
> >>> Reentrancy
> >>>
> >>>  All of the methods in struct ib_device exported by a low-level
> >>>  driver must be fully reentrant.  The low-level driver is required to
> >>>  perform all synchronization necessary to maintain consistency, even
> >>>  if multiple function calls using the same object are run
> >>>  simultaneously.
> >>>
> >>>  The IB midlayer does not perform any serialization of function calls.
> >>>
> >>>  Because low-level drivers are reentrant, upper level protocol
> >>>  consumers are not required to perform any serialization.
> >>
> >> Does this re-entrancy guarantee apply only when ib_unmap_fmr is called
> >> concurrently with unique FMRs?
> >
> > According to the description, it should apply to all operations on ib_device
> > without any exclusion.
> >
> >>
> >> I've been told it is not possible for ib_unmap_fmr to detect when it has
> >> been invoked in different threads with the same FMR.
> >
> > Right, FMR management is implemented as direct writes to MPT and MTT
> > tables. HW doesn't distinguish simultaneous calls to the TPT cache.
> >
> >> But apparently the user space equivalent does not have the same
> >> vulnerability (I did not test this assertion).
> >>
> >> I'm wondering what is proper closure here (aside from merging the
> >> NFS/RDMA fix).
> >
> > Maybe serialize unmap_fmr (workqueue) from the driver side?
>
> Either correcting the documentation or a driver change is OK with me.
>
> Claiming that "upper level protocol consumers are not required to
> perform any serialization" seems like a stretch.

Right,

I added Jack to this thread, and we will need a couple of days to think
internally about possible solutions.
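
For reference while we think about it, the consumer-side idea (preventing
simultaneous unmap calls on the same FMR) can be modeled in user space
roughly as below. This is only a sketch under assumptions: it is not kernel
code, it does not call the real ib_unmap_fmr(), and every name in it
(model_fmr, model_hw_unmap, model_unmap_once, run_concurrent_unmaps) is
hypothetical. It uses a per-FMR pthread mutex plus a "mapped" flag so that
only the first caller reaches the low-level unmap. It is not necessarily how
the NFS/RDMA fix or a driver-side change would be written.

```c
/* Userspace model only: NOT kernel code, does not call ib_unmap_fmr().
 * All names below are hypothetical and exist only for illustration. */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define MAX_THREADS 16

struct model_fmr {
    pthread_mutex_t lock;   /* serializes unmap attempts on this FMR */
    bool mapped;            /* true while the mapping is live */
    int hw_unmap_calls;     /* how many times the "hardware" unmap ran */
};

/* Stand-in for the low-level unmap that must not run twice concurrently. */
static void model_hw_unmap(struct model_fmr *fmr)
{
    fmr->hw_unmap_calls++;
}

/* Consumer-side guard: only the first caller performs the unmap. */
static void model_unmap_once(struct model_fmr *fmr)
{
    pthread_mutex_lock(&fmr->lock);
    if (fmr->mapped) {
        model_hw_unmap(fmr);
        fmr->mapped = false;
    }
    pthread_mutex_unlock(&fmr->lock);
}

static void *unmap_thread(void *arg)
{
    model_unmap_once(arg);
    return NULL;
}

/* Race nthreads callers against one FMR; return the number of
 * low-level unmaps that actually ran. */
static int run_concurrent_unmaps(int nthreads)
{
    struct model_fmr fmr = { .mapped = true, .hw_unmap_calls = 0 };
    pthread_t tids[MAX_THREADS];
    int i;

    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    pthread_mutex_init(&fmr.lock, NULL);
    for (i = 0; i < nthreads; i++) {
        int rc = pthread_create(&tids[i], NULL, unmap_thread, &fmr);
        assert(rc == 0);
    }
    for (i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    pthread_mutex_destroy(&fmr.lock);
    return fmr.hw_unmap_calls;
}
```

Calling run_concurrent_unmaps(8) races eight threads against one modeled FMR;
with the guard in place the low-level unmap runs once regardless of how many
callers arrive. Build with -pthread.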

Thanks

>
>
> --
> Chuck Lever
>
>
>
