Re: Crashes due to concurrent calls to ib_unmap_fmr()

On Tue, 18 Apr 2017 20:44:30 +0300
Leon Romanovsky <leon@xxxxxxxxxx> wrote:

> On Mon, Apr 17, 2017 at 01:45:24PM -0400, Chuck Lever wrote:
> >  
> > > On Apr 15, 2017, at 5:55 AM, Leon Romanovsky <leon@xxxxxxxxxx>
> > > wrote:
> > >
> > > On Fri, Apr 14, 2017 at 11:51:39AM -0400, Chuck Lever wrote:  
> > >> Howdy-
> > >>
> > >> I recently found a way to crash my HCA (and the whole system)
> > >> using a signal on an NFS/RDMA mount point that is using FMR.
> > >> I've documented the issue:
> > >>
> > >> https://bugzilla.linux-nfs.org/show_bug.cgi?id=305
> > >>
> > >> And I have an NFS/RDMA fix I'm testing for v4.13. The fix is to
> > >> prevent simultaneous calls to ib_unmap_fmr with the same FMR.
> > >>
> > >> While working on the fix, I've been looking for any documentation
> > >> regarding serialization requirements for ib_unmap_fmr. Knut
> > >> Omang pointed out to me that
> > >> Documentation/infiniband/core-locking.txt makes this bold
> > >> statement: 
> > >>> Reentrancy
> > >>>
> > >>>  All of the methods in struct ib_device exported by a low-level
> > >>>  driver must be fully reentrant.  The low-level driver is
> > >>> required to perform all synchronization necessary to maintain
> > >>> consistency, even if multiple function calls using the same
> > >>> object are run simultaneously.
> > >>>
> > >>>  The IB midlayer does not perform any serialization of function
> > >>> calls.
> > >>>
> > >>>  Because low-level drivers are reentrant, upper level protocol
> > >>>  consumers are not required to perform any serialization.  
> > >>
> > >> Does this re-entrancy guarantee apply only when ib_unmap_fmr is
> > >> called concurrently with unique FMRs?  
> > >
> > > According to the description, it should apply to all operations
> > > on ib_device without any exclusion.
> > >  
> > >>
> > >> I've been told it is not possible for ib_unmap_fmr to detect
> > >> when it has been invoked in different threads with the same
> > >> FMR.  
> > >
> > > Right, FMR management is implemented as direct writes to MPT and
> > > MTT tables. HW doesn't distinguish simultaneous calls to the TPT
> > > cache. 
> > >> but apparently the user space equivalent does not have the same
> > >> vulnerability (I did not test this assertion).
> > >>
> > >> I'm wondering what is proper closure here (aside from merging the
> > >> NFS/RDMA fix).  
> > >
> > > Maybe serialize unmap_fmr (workqueue) from the driver side?  
> >
> > Either correcting the documentation or a driver change is OK with
> > me.
> >
> > Claiming that "upper level protocol consumers are not required to
> > perform any serialization" seems like a stretch.  
> 
> Right,
> 
> I added Jack to this thread, and we will need a couple of days to
> think internally about possible solutions.
> 
> Thanks
> 
Adding Majd

-Jack
> >
> > --
> > Chuck Lever
> >
> >
> >  
