On Mon, Dec 06, 2021 at 11:10:52AM +0000, Bernard Metzler wrote:
> > -----Original Message-----
> > From: Leon Romanovsky <leon@xxxxxxxxxx>
> > Sent: Sunday, 5 December 2021 12:47
> > To: Bernard Metzler <BMT@xxxxxxxxxxxxxx>
> > Cc: Yi Zhang <yi.zhang@xxxxxxxxxx>; RDMA mailing list <linux-
> > rdma@xxxxxxxxxxxxxxx>
> > Subject: [EXTERNAL] Re: [bug report]concurrent blktests nvme-rdma
> > execution lead kernel null pointer
> >
> > On Fri, Dec 03, 2021 at 11:27:22AM +0000, Bernard Metzler wrote:
> > > -----"Yi Zhang" <yi.zhang@xxxxxxxxxx> wrote: -----
> > >
> > > >To: "RDMA mailing list" <linux-rdma@xxxxxxxxxxxxxxx>
> > > >From: "Yi Zhang" <yi.zhang@xxxxxxxxxx>
> > > >Date: 12/03/2021 03:20AM
> > > >Subject: [EXTERNAL] [bug report]concurrent blktests nvme-rdma
> > > >execution lead kernel null pointer
> > > >
> > > >Hello
> > > >With the concurrent blktests nvme-rdma execution with both rdma_rxe
> > > >and siw lead kernel BUG on 5.16.0-rc3, pls help check it, thanks.
> > > >
> > >
> > > The RDMA core currently does not prevent us from assigning both siw
> > > and rxe to the same netdev. I think this is what is happening here.
> > > This setting is of no sense, but obviously not prohibited by the RDMA
> > > infrastructure. Behavior is undefined and a kernel panic not
> > > unexpected. Shall we prevent the privileged user from doing this type
> > > of experiments?
> > >
> > > A related question: should we also explicitly refuse to add software
> > > RDMA drivers to netdevs with RDMA hardware active?
> > > This is, while stupid and resulting behavior undefined, currently
> > > possible as well.
> >
> > In old soft-RoCE manuals, I saw a request to unload mlx4_ib/mlx5_ib
> > modules before configuring RXE. This effectively "prevented" from running
> > with "RDMA hardware active".
> >
> Right. Same for 'siw over Chelsio T5/6' etc: first unload the iw_cxgb4
> driver, which implements the iWarp protocol, before attaching siw to
> the network interface.
> But shouldn't the kernel just refuse that two
> instances of the _same_ ULP (e.g., one hardware iWarp, one software
> iWARP) can be attached to the same netdev, potentially sharing IP
> address and port space?

I think that users will get different rdma-cm ids for real HW and SW
devices. The rdma_getaddrinfo() should help here.

>
> > So I'm not surprised that it doesn't work, but why do you think that this
> > behavior is stupid? RXE/SIW can be seen as ULP and as such it is ok to run
> > many ULPs on same netdev.
>
> Hmm, from an rdma_cm perspective, I am not sure it is supported
> that two RDMA providers can share the same device and IP address.
> Without recreating it or looking into the code, I expect Yi's
> null pointer issue is caused by this unsupported setup. If it is
> unsupported, it should be impossible to setup.

I agree with you that it is the best solution here, just because it is
good enough for RXE/SIW.

Thanks

>
> Thanks,
> Bernard.
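
For anyone who wants to check whether a test host is in the setup discussed
above (more than one RDMA device, e.g. rxe and siw, attached to the same
netdev), here is a minimal Python sketch. It is not from the thread. It assumes
the text comes from iproute2's `rdma link show` and that each line has the form
`link <dev>/<port> ... netdev <name>`; other iproute2 versions may print a
different layout. The helper names and the sample device names (rxe0, siw0,
siw1, eth0, eth1) are hypothetical.

```python
import re
from collections import defaultdict

# Assumed line layout: "link <dev>/<port> state ... netdev <name>".
LINK_RE = re.compile(
    r"^link\s+(?P<dev>\S+)/(?P<port>\d+)\s.*\bnetdev\s+(?P<netdev>\S+)"
)


def rdma_devices_by_netdev(link_show_output):
    """Map each netdev name to the set of RDMA devices bound to it."""
    mapping = defaultdict(set)
    for line in link_show_output.splitlines():
        match = LINK_RE.match(line.strip())
        if match:  # lines without a netdev field are ignored
            mapping[match.group("netdev")].add(match.group("dev"))
    return dict(mapping)


def shared_netdevs(link_show_output):
    """Return only the netdevs that carry more than one RDMA device."""
    by_netdev = rdma_devices_by_netdev(link_show_output)
    return {name: sorted(devs) for name, devs in by_netdev.items() if len(devs) > 1}


# Hypothetical sample input, standing in for real `rdma link show` output.
SAMPLE = """\
link rxe0/1 state ACTIVE physical_state LINK_UP netdev eth0
link siw0/1 state ACTIVE physical_state LINK_UP netdev eth0
link siw1/1 state ACTIVE physical_state LINK_UP netdev eth1
"""

if __name__ == "__main__":
    for netdev, devs in shared_netdevs(SAMPLE).items():
        print(f"{netdev}: {', '.join(devs)}")
```

On a real host the sample string would be replaced by the captured output of
`rdma link show`. This only reports the shared binding from user space; it does
not prevent it, which is the kernel-side question raised in the thread.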