October 18, 2022 4:24 PM, "Leon Romanovsky" <leon@xxxxxxxxxx> wrote: > On Sun, Oct 16, 2022 at 02:19:25AM -0400, Zhu Yanjun wrote: > >> From: Zhu Yanjun <yanjun.zhu@xxxxxxxxx> >> >> Before mlx5 ib device is registered, the function ib_device_set_netdev >> is not called to map the mlx5 ib device with the related net device. >> >> As such, when the function ib_device_get_by_netdev is called to get ib >> device, NULL is returned. >> >> Other ib devices, such as irdma, rxe and so on, the function >> ib_device_get_by_netdev can get ib device from the related net device. > > Ohh, you opened Pandora box, everything around it looks half-backed. > > mlx4 and mlx5 don't call to ib_device_set_netdev(), because they have > .get_netdev() callback. This callback is not an easy task to eliminate > and many internal attempts failed to eliminate them. > > This caused to very questionable ksmbd_rdma_capable_netdev() > implementation where ksmbd first checked internal ib_dev callback > and tried to use ib_device_get_by_netdev(). And to smc_ib, which > didn't even bother to use ib_device_get_by_netdev(). Thanks. I read the function ksmbd_rdma_capable_netdev carefully. Mlx5 and mlx4 do not call ib_device_set_netdev to map net device and ib devices. This brings a lot of problems. When ib devices are needed from net devices, the callers will handle mlx5/4 and other ib devices differently. Zhu Yanjun > >> Signed-off-by: Zhu Yanjun <yanjun.zhu@xxxxxxxxx> >> --- >> drivers/infiniband/hw/mlx5/main.c | 18 ++++++++++++++++++ >> 1 file changed, 18 insertions(+) >> >> diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c >> index 883d7c60143e..6899c3f73509 100644 >> --- a/drivers/infiniband/hw/mlx5/main.c >> +++ b/drivers/infiniband/hw/mlx5/main.c >> @@ -168,6 +168,7 @@ static int mlx5_netdev_event(struct notifier_block *this, >> u32 port_num = roce->native_port_num; >> struct mlx5_core_dev *mdev; >> struct mlx5_ib_dev *ibdev; >> + int ret = 0; >> >> ibdev = roce->dev; >> mdev = mlx5_ib_get_native_port_mdev(ibdev, port_num, NULL); >> @@ -183,6 +184,14 @@ static int mlx5_netdev_event(struct notifier_block *this, > > This is part of the problem, as you are setting netdev for IB > representors, and not for simple RoCE flow. There is more cumbersome > multiport flow which needs special logic too. > > Thanks > >> if (ndev->dev.parent == mdev->device) >> roce->netdev = ndev; >> write_unlock(&roce->netdev_lock); >> + if (ndev->dev.parent == mdev->device) { >> + ret = ib_device_set_netdev(&ibdev->ib_dev, ndev, port_num); >> + if (ret) { >> + pr_warn("func: %s, error: %d\n", __func__, ret); >> + goto done; >> + } >> + } >> + >> break; >> >> case NETDEV_UNREGISTER: >> @@ -191,6 +200,15 @@ static int mlx5_netdev_event(struct notifier_block *this, >> if (roce->netdev == ndev) >> roce->netdev = NULL; >> write_unlock(&roce->netdev_lock); >> + >> + if (roce->netdev == ndev) { >> + ret = ib_device_set_netdev(&ibdev->ib_dev, NULL, port_num); >> + if (ret) { >> + pr_warn("func: %s, error: %d\n", __func__, ret); >> + goto done; >> + } >> + } >> + >> break; >> >> case NETDEV_CHANGE: >> -- >> 2.27.0