Re: [bug report]concurrent blktests nvme-rdma execution lead kernel null pointer

On Mon, Dec 06, 2021 at 11:10:52AM +0000, Bernard Metzler wrote:
> > -----Original Message-----
> > From: Leon Romanovsky <leon@xxxxxxxxxx>
> > Sent: Sunday, 5 December 2021 12:47
> > To: Bernard Metzler <BMT@xxxxxxxxxxxxxx>
> > Cc: Yi Zhang <yi.zhang@xxxxxxxxxx>; RDMA mailing list <linux-
> > rdma@xxxxxxxxxxxxxxx>
> > Subject: [EXTERNAL] Re: [bug report]concurrent blktests nvme-rdma
> > execution lead kernel null pointer
> > 
> > On Fri, Dec 03, 2021 at 11:27:22AM +0000, Bernard Metzler wrote:
> > > -----"Yi Zhang" <yi.zhang@xxxxxxxxxx> wrote: -----
> > >
> > > >To: "RDMA mailing list" <linux-rdma@xxxxxxxxxxxxxxx>
> > > >From: "Yi Zhang" <yi.zhang@xxxxxxxxxx>
> > > >Date: 12/03/2021 03:20AM
> > > >Subject: [EXTERNAL] [bug report]concurrent blktests nvme-rdma
> > > >execution lead kernel null pointer
> > > >
> > > >Hello
> > > > >Running blktests nvme-rdma concurrently with both rdma_rxe and
> > > > >siw leads to a kernel BUG on 5.16.0-rc3. Please help check it, thanks.
> > > >
> > >
> > > > The RDMA core currently does not prevent us from assigning both siw
> > > > and rxe to the same netdev. I think this is what is happening here.
> > > > This setup makes no sense, but is obviously not prohibited by the RDMA
> > > > infrastructure. Behavior is undefined and a kernel panic is not
> > > > unexpected. Shall we prevent the privileged user from doing this type
> > > > of experiment?
> > >
> > > A related question: should we also explicitly refuse to add software
> > > RDMA drivers to netdevs with RDMA hardware active?
> > > > While stupid, and with undefined resulting behavior, this is
> > > > currently possible as well.
> > 
> > In old soft-RoCE manuals, I saw a request to unload the mlx4_ib/mlx5_ib
> > modules before configuring RXE. This effectively "prevented" users from
> > running with "RDMA hardware active".
> > 
> Right. Same for 'siw over Chelsio T5/6' etc: first unload the iw_cxgb4
> driver, which implements the iWARP protocol, before attaching siw to
> the network interface. But shouldn't the kernel just refuse to attach
> two instances of the _same_ ULP (e.g., one hardware iWARP, one software
> iWARP) to the same netdev, potentially sharing IP address and port
> space?

I think that users will get different rdma-cm IDs for real HW and SW devices.
rdma_getaddrinfo() should help here.

> 
> > So I'm not surprised that it doesn't work, but why do you think that this
> > behavior is stupid? RXE/SIW can be seen as ULP and as such it is ok to run
> > many ULPs on same netdev.
> 
> Hmm, from an rdma_cm perspective, I am not sure it is supported
> that two RDMA providers can share the same device and IP address.
> Without recreating it or looking into the code, I expect Yi's
> null pointer issue is caused by this unsupported setup. If it is
> unsupported, it should be impossible to set up.

I agree with you that this is the best solution here, simply because it is
good enough for RXE/SIW.
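
To make the idea concrete, here is a minimal userspace sketch of such a
check. It is not kernel code and not an existing RDMA core API: the
binding table and the attach_provider()/detach_provider() helpers are
hypothetical, and it assumes the simplest policy, namely at most one RDMA
provider per netdev.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

#define MAX_BINDINGS 8
#define NAME_LEN 16

/* Hypothetical bookkeeping: which provider is attached to which netdev. */
struct binding {
	char netdev[NAME_LEN];
	char provider[NAME_LEN];
	int in_use;
};

static struct binding bindings[MAX_BINDINGS];

/*
 * Attach 'provider' (e.g. "rxe" or "siw") to 'netdev'.
 * Returns 0 on success, -EEXIST if the netdev already has a provider
 * attached, -ENOSPC if the table is full.
 */
static int attach_provider(const char *netdev, const char *provider)
{
	struct binding *slot = NULL;

	for (int i = 0; i < MAX_BINDINGS; i++) {
		if (!bindings[i].in_use) {
			/* Remember the first free slot for later use. */
			if (!slot)
				slot = &bindings[i];
			continue;
		}
		/* Assumed policy: refuse a second provider on this netdev. */
		if (strcmp(bindings[i].netdev, netdev) == 0)
			return -EEXIST;
	}
	if (!slot)
		return -ENOSPC;

	snprintf(slot->netdev, sizeof(slot->netdev), "%s", netdev);
	snprintf(slot->provider, sizeof(slot->provider), "%s", provider);
	slot->in_use = 1;
	return 0;
}

/* Detach whatever provider is bound to 'netdev', if any. */
static void detach_provider(const char *netdev)
{
	for (int i = 0; i < MAX_BINDINGS; i++) {
		if (bindings[i].in_use &&
		    strcmp(bindings[i].netdev, netdev) == 0)
			bindings[i].in_use = 0;
	}
}
```

With this model, attaching "rxe" to "eth0" succeeds, a following attempt
to attach "siw" to "eth0" fails with -EEXIST, and it only succeeds again
after the first provider has been detached. A less strict variant could
compare the protocol as well and refuse only two providers of the same
type.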

Thanks

> 
> Thanks,
> Bernard.


