October 18, 2022 4:24 PM, "Leon Romanovsky" <leon@xxxxxxxxxx> wrote: > On Sun, Oct 16, 2022 at 02:19:25AM -0400, Zhu Yanjun wrote: > >> From: Zhu Yanjun <yanjun.zhu@xxxxxxxxx> >> >> Before mlx5 ib device is registered, the function ib_device_set_netdev >> is not called to map the mlx5 ib device with the related net device. >> >> As such, when the function ib_device_get_by_netdev is called to get ib >> device, NULL is returned. >> >> Other ib devices, such as irdma, rxe and so on, the function >> ib_device_get_by_netdev can get ib device from the related net device. > > Ohh, you opened Pandora box, everything around it looks half-backed. > > mlx4 and mlx5 don't call to ib_device_set_netdev(), because they have > .get_netdev() callback. This callback is not an easy task to eliminate > and many internal attempts failed to eliminate them. > > This caused to very questionable ksmbd_rdma_capable_netdev() > implementation where ksmbd first checked internal ib_dev callback > and tried to use ib_device_get_by_netdev(). And to smc_ib, which > didn't even bother to use ib_device_get_by_netdev(). > >> Signed-off-by: Zhu Yanjun <yanjun.zhu@xxxxxxxxx> >> --- >> drivers/infiniband/hw/mlx5/main.c | 18 ++++++++++++++++++ >> 1 file changed, 18 insertions(+) >> >> diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c >> index 883d7c60143e..6899c3f73509 100644 >> --- a/drivers/infiniband/hw/mlx5/main.c >> +++ b/drivers/infiniband/hw/mlx5/main.c >> @@ -168,6 +168,7 @@ static int mlx5_netdev_event(struct notifier_block *this, >> u32 port_num = roce->native_port_num; >> struct mlx5_core_dev *mdev; >> struct mlx5_ib_dev *ibdev; >> + int ret = 0; >> >> ibdev = roce->dev; >> mdev = mlx5_ib_get_native_port_mdev(ibdev, port_num, NULL); >> @@ -183,6 +184,14 @@ static int mlx5_netdev_event(struct notifier_block *this, > > This is part of the problem, as you are setting netdev for IB > representors, and not for simple RoCE flow. There is more cumbersome > multiport flow which needs special logic too. In this event, if net device is registered, the snippet will map ib device and net device. So for simple RoCE flow or IB representors, the snippet should work. In my test environment, I just use a mlx5 device to make tests. And this commit works well. So to a simple RoCE flow, this should work. I can not get "more cumbersome multiport flow which needs special logic too" Can you explain it? >From my perspective, if the net device is registered, a net event NETDEV_REGISTER will send out. This commit will be called, so the ib device and net device should be mapped. Thanks and Regards Zhu Yanjun > > Thanks > >> if (ndev->dev.parent == mdev->device) >> roce->netdev = ndev; >> write_unlock(&roce->netdev_lock); >> + if (ndev->dev.parent == mdev->device) { >> + ret = ib_device_set_netdev(&ibdev->ib_dev, ndev, port_num); >> + if (ret) { >> + pr_warn("func: %s, error: %d\n", __func__, ret); >> + goto done; >> + } >> + } >> + >> break; >> >> case NETDEV_UNREGISTER: >> @@ -191,6 +200,15 @@ static int mlx5_netdev_event(struct notifier_block *this, >> if (roce->netdev == ndev) >> roce->netdev = NULL; >> write_unlock(&roce->netdev_lock); >> + >> + if (roce->netdev == ndev) { >> + ret = ib_device_set_netdev(&ibdev->ib_dev, NULL, port_num); >> + if (ret) { >> + pr_warn("func: %s, error: %d\n", __func__, ret); >> + goto done; >> + } >> + } >> + >> break; >> >> case NETDEV_CHANGE: >> -- >> 2.27.0