On 05.11.24 14:39, Leon Romanovsky wrote:
On Tue, Nov 05, 2024 at 01:30:24PM +0100, Wenjia Zhang wrote:
On 05.11.24 12:23, Leon Romanovsky wrote:
On Tue, Nov 05, 2024 at 10:50:45AM +0100, Wenjia Zhang wrote:
On 27.10.24 21:18, Leon Romanovsky wrote:
On Fri, Oct 25, 2024 at 09:23:55AM +0200, Wenjia Zhang wrote:
Commit c2261dd76b54 ("RDMA/device: Add ib_device_set_netdev() as an
alternative to get_netdev") introduced an API ib_device_get_netdev.
The SMC-R variant of the SMC protocol continued to use the old API
ib_device_ops.get_netdev() to lookup netdev.
I would say that calls to ibdev ops from ULPs was never been right
thing to do. The ib_device_set_netdev() was introduced for the drivers.
So the whole commit message is not accurate and better to be rewritten.
As this commit 8d159eb2117b
("RDMA/mlx5: Use IB set_netdev and get_netdev functions") removed the
get_netdev callback from mlx5_ib_dev_common_roce_ops, calling
ib_device_ops.get_netdev didn't work any more at least by using a mlx5
device driver.
It is not a correct statement too. All modern drivers (for last 5 years)
don't have that .get_netdev() ops, so it is not mlx5 specific, but another
justification to say that SMC-R was doing it wrong.
Thus, using ib_device_set_netdev() now became mandatory.
ib_device_set_netdev() is mandatory for the drivers, it is nothing to do
with ULPs.
Replace ib_device_ops.get_netdev() with ib_device_get_netdev().
It is too late for me to do proper review for today, but I would say
that it is worth to pay attention to multiple dev_put() calls in the
functions around the ib_device_get_netdev().
Fixes: 54903572c23c ("net/smc: allow pnetid-less configuration")
Fixes: 8d159eb2117b ("RDMA/mlx5: Use IB set_netdev and get_netdev functions")
It is not related to this change Fixes line.
Hi Leon,
Thank you for the review! I agree that SMC could do better. However, we
should fix it and give enough information and reference on the changes,
since the code has already existed and didn't work with the old way.
The code which you change worked by chance and was wrong from day one.
I can rewrite the commit message.
What about:
"
The SMC-R variant of the SMC protocol still called
ib_device_ops.get_netdev() to lookup netdev. As we used mlx5 device driver
to run SMC-R, it failed to find a device, because in mlx5_ib the internal
net device management for retrieving net devices was replaced by a common
interface ib_device_get_netdev() in commit 8d159eb2117b ("RDMA/mlx5: Use IB
set_netdev and get_netdev functions"). Thus, replace
ib_device_ops.get_netdev() with ib_device_get_netdev() in SMC.
"
The SMC-R variant of the SMC protocol used direct call to ib_device_ops.get_netdev()
function to lookup netdev. Such direct accesses are not correct for any
usage outside of RDMA core code.
Is such an absolute statement documented somewhere? If not, I don't think
it's convenient that I use it. Maybe you guys as RDMA core maintainer can,
not I.
You can too as it is very clear. All functions which can be used have
EXPORT_SYMBOL near them, ops.get_netdev() has nothing like that.
RDMA subsystem provides ib_device_get_netdev() function that works on
all RDMA drivers returns valid netdev with proper locking an reference
counting. The commit 8d159eb2117b ("RDMA/mlx5: Use IB set_netdev and get_netdev
functions") exposed that SMC-R didn't use that function.
So update the SMC-R to use proper API,
Thanks
mhhh, I'd like to stick to my version, which sounds more neutral IMO. I
think the purpose is the same.
I don't want to argue about the words, my point is that get_netdev() was
never been the right interface.
Thanks
Ok, I got you. I'll send v2 with a proper commit message.
Thanks,
Wenjia