RE: [bug report]concurrent blktests nvme-rdma execution lead kernel null pointer

> -----Original Message-----
> From: Leon Romanovsky <leon@xxxxxxxxxx>
> Sent: Sunday, 5 December 2021 12:47
> To: Bernard Metzler <BMT@xxxxxxxxxxxxxx>
> Cc: Yi Zhang <yi.zhang@xxxxxxxxxx>; RDMA mailing list <linux-
> rdma@xxxxxxxxxxxxxxx>
> Subject: [EXTERNAL] Re: [bug report]concurrent blktests nvme-rdma
> execution lead kernel null pointer
> 
> On Fri, Dec 03, 2021 at 11:27:22AM +0000, Bernard Metzler wrote:
> > -----"Yi Zhang" <yi.zhang@xxxxxxxxxx> wrote: -----
> >
> > >To: "RDMA mailing list" <linux-rdma@xxxxxxxxxxxxxxx>
> > >From: "Yi Zhang" <yi.zhang@xxxxxxxxxx>
> > >Date: 12/03/2021 03:20AM
> > >Subject: [EXTERNAL] [bug report]concurrent blktests nvme-rdma
> > >execution lead kernel null pointer
> > >
> > >Hello
> > >With the concurrent blktests nvme-rdma execution with both rdma_rxe
> > >and siw lead kernel BUG on 5.16.0-rc3, pls help check it, thanks.
> > >
> >
> > The RDMA core currently does not prevent us from assigning  both siw
> > and rxe to the same netdev. I think this is what is happening here.
> > This setting is of no sense, but obviously not prohibited by the RDMA
> > infrastructure. Behavior is undefined and a kernel panic not
> > unexpected. Shall we prevent the privileged user from doing this type
> > of experiments?
> >
> > A related question: should we also explicitly refuse to add software
> > RDMA drivers to netdevs with RDMA hardware active?
> > This is, while stupid and resulting behavior undefined, currently
> > possible as well.
> 
> In old soft-RoCE manuals, I saw a request to unload mlx4_ib/mlx5_ib
> modules before configuring RXE. This effectively "prevented" from running
> with "RDMA hardware active".
> 
Right. The same holds for 'siw over Chelsio T5/6' etc.: first unload the
iw_cxgb4 driver, which implements the iWARP protocol, before attaching
siw to the network interface. But shouldn't the kernel simply refuse to
attach two instances of the _same_ ULP (e.g., one hardware iWARP, one
software iWARP) to the same netdev, where they could end up sharing IP
address and port space?

> So I'm not surprised that it doesn't work, but why do you think that this
> behavior is stupid? RXE/SIW can be seen as ULP and as such it is ok to run
> many ULPs on same netdev.

Hmm, from an rdma_cm perspective, I am not sure that two RDMA providers
sharing the same device and IP address is a supported configuration.
Without reproducing it or looking into the code, I expect Yi's
null pointer issue is caused by this unsupported setup. If it is
unsupported, it should be impossible to set up.
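
As an illustration only, here is a minimal user-space sketch for spotting
such a setup, i.e. a netdev that has more than one RDMA device attached.
It is not a kernel-side fix. It assumes that the sysfs files under
/sys/class/infiniband/<dev>/ports/<port>/gid_attrs/ndevs/ name the netdev
behind each populated GID entry. The function names find_shared_netdevs
and read_links_from_sysfs are hypothetical helpers made up for this
example.

```python
import glob
import os
from collections import defaultdict


def find_shared_netdevs(links):
    """Return {netdev: [rdma devices]} for netdevs with more than one RDMA device.

    `links` is an iterable of (rdma_device, netdev) pairs.
    """
    by_netdev = defaultdict(set)
    for rdma_dev, netdev in links:
        by_netdev[netdev].add(rdma_dev)
    return {n: sorted(d) for n, d in by_netdev.items() if len(d) > 1}


def read_links_from_sysfs(root="/sys/class/infiniband"):
    """Collect (rdma_device, netdev) pairs from sysfs.

    Assumption: each gid_attrs/ndevs/<index> file holds a netdev name.
    Unpopulated GID slots fail on read and are skipped.
    """
    links = set()
    pattern = os.path.join(root, "*", "ports", "*", "gid_attrs", "ndevs", "*")
    for path in glob.glob(pattern):
        rdma_dev = os.path.relpath(path, root).split(os.sep)[0]
        try:
            with open(path) as f:
                netdev = f.read().strip()
        except OSError:
            continue
        if netdev:
            links.add((rdma_dev, netdev))
    return sorted(links)


if __name__ == "__main__":
    for netdev, devs in find_shared_netdevs(read_links_from_sysfs()).items():
        print(f"{netdev}: shared by {', '.join(devs)}")
```

On a machine where both rxe and siw were added on top of the same
interface, this should list that interface together with both RDMA
device names. On a machine without RDMA devices it prints nothing.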

Thanks,
Bernard.



