Re: [PATCH] ksmbd: fix missing RDMA-capable flag for IPoIB device in ksmbd_rdma_capable_netdev()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Oct 19, 2023 at 9:14 PM Tom Talpey <tom@xxxxxxxxxx> wrote:
>
> On 10/19/2023 6:37 PM, Namjae Jeon wrote:
> > 2023-10-19 15:01 GMT+09:00, Kangjing (Chaser) Huang <huangkangjing@xxxxxxxxx>:
> >> Thank you for the review. Also thanks a lot to Namjae for submitting
> >> this patch on my behalf.
> >>
> >>
> >> On Tue, Oct 17, 2023 at 10:07 AM Tom Talpey <tom@xxxxxxxxxx> wrote:
> >>>
> >>> On 10/15/2023 10:45 AM, Namjae Jeon wrote:
> >>>> From: Kangjing Huang <huangkangjing@xxxxxxxxx>
> >>>>
> >>>> Physical ib_device does not have an underlying net_device, thus its
> >>>> association with IPoIB net_device cannot be retrieved via
> >>>> ops.get_netdev() or ib_device_get_by_netdev(). ksmbd reads physical
> >>>> ib_device port GUID from the lower 16 bytes of the hardware addresses
> >>>> on
> >>>> IPoIB net_device and match its underlying ib_device using ib_find_gid()
> >>>>
> >>>> Signed-off-by: Kangjing Huang <huangkangjing@xxxxxxxxx>
> >>>> Acked-by: Namjae Jeon <linkinjeon@xxxxxxxxxx>
> >>>> ---
> >>>>    fs/smb/server/transport_rdma.c | 39
> >>>> +++++++++++++++++++++++++---------
> >>>>    1 file changed, 29 insertions(+), 10 deletions(-)
> >>>>
> >>>> diff --git a/fs/smb/server/transport_rdma.c
> >>>> b/fs/smb/server/transport_rdma.c
> >>>> index 3b269e1f523a..a82131f7dd83 100644
> >>>> --- a/fs/smb/server/transport_rdma.c
> >>>> +++ b/fs/smb/server/transport_rdma.c
> >>>> @@ -2140,8 +2140,7 @@ static int smb_direct_ib_client_add(struct
> >>>> ib_device *ib_dev)
> >>>>        if (ib_dev->node_type != RDMA_NODE_IB_CA)
> >>>>                smb_direct_port = SMB_DIRECT_PORT_IWARP;
> >>>>
> >>>> -     if (!ib_dev->ops.get_netdev ||
> >>>> -         !rdma_frwr_is_supported(&ib_dev->attrs))
> >>>> +     if (!rdma_frwr_is_supported(&ib_dev->attrs))
> >>>>                return 0;
> >>>>
> >>>>        smb_dev = kzalloc(sizeof(*smb_dev), GFP_KERNEL);
> >>>> @@ -2241,17 +2240,37 @@ bool ksmbd_rdma_capable_netdev(struct net_device
> >>>> *netdev)
> >>>>                for (i = 0; i < smb_dev->ib_dev->phys_port_cnt; i++) {
> >>>>                        struct net_device *ndev;
> >>>>
> >>>> -                     ndev =
> >>>> smb_dev->ib_dev->ops.get_netdev(smb_dev->ib_dev,
> >>>> -                                                            i + 1);
> >>>> -                     if (!ndev)
> >>>> -                             continue;
> >>>> +                     /* RoCE and iWRAP ib_dev is backed by a netdev */
> >>>> +                     if (smb_dev->ib_dev->ops.get_netdev) {
> >>>
> >>> The "IWRAP" is a typo, but IMO the comment is misleading. This is simply
> >>> looking up the target netdev, it's not limited to these two rdma types.
> >>> I suggest deleting the comment.
> >>>
> >>
> >> I agree with this point and will update this comment in version 2 of the
> >> patch.
> >>
> >>>> +                             ndev = smb_dev->ib_dev->ops.get_netdev(
> >>>> +                                     smb_dev->ib_dev, i + 1);
> >>>> +                             if (!ndev)
> >>>> +                                     continue;
> >>>>
> >>>> -                     if (ndev == netdev) {
> >>>> +                             if (ndev == netdev) {
> >>>> +                                     dev_put(ndev);
> >>>> +                                     rdma_capable = true;
> >>>> +                                     goto out;
> >>>> +                             }
> >>>>                                dev_put(ndev);
> >>>
> >>> Why not move this dev_put up above the if (ndev == netdev) test? It's
> >>> needed in both cases, so it's confusing to have two copies.
> >>
> >> I do agree that these two puts are very confusing. However, this code
> >> structure comes from the original ksmbd_rdma_capable_netdev() and
> >> this patch only indents it one more level and puts it in the if test for
> >> get_netdev.
> >>
> >> Also, while two puts look very weird, IMO putting it before the if
> >> statement
> >> is equally unintuitive as well since that would be testing the pointer after
> >> its
> >> reference is released. Although since no de-reference is happening here, it
> >> might be fine. Please confirm this is good coding-style-wise and I will
> >> include
> >> this change in v2 of the patch.
> > There is no need to update code that does not have problems.
>

> Well, there are now at least half a dozen stanzas of
>         dev_put
>         rdma_capable = <T|f>
>         goto out
>
> and I don't see why these can't be simplified to
>
>         goto out_capable|out_notcapable
>
> It woud be a lot clearer, at least to this reviewer. And more reliable
> to maintain.
>

I agree, but this consolidation would be impossible without a
major overhaul of the code structure in
ksmbd_rdma_capable_netdev(). There are several clean-up
calls that need to be made (read_unlock(&smb_direct_device_lock),
dev_put, ib_device_put) and the biggest problem is brought in
by the block of code that is doing a reverse lookup on ib_dev
structs right after the patched loop.

I don't know why it is necessary for such a reverse lookup, but given
that these code depends on the behaviors of ib device drivers, which
can be very erratic. I feel that removing that block could break
something with some other devices.

In conclusion, I feel addressing these issues in this function is
beyond the scope of this patch and they should be addressed
separately. I will move forward with the comment change in v2.



> >>
> >>>
> >>>> -                             rdma_capable = true;
> >>>> -                             goto out;
> >>>> +                     /* match physical ib_dev with IPoIB netdev by GUID
> >>>> */
> >>>
> >>> Add more information to this comment, perhaps:
> >>>
> >>>     /* if no exact netdev match, check for matching Infiniband GUID */
> >>>
> >>
> >> Fair point, will update this comment in v2.
> >>
> >>>> +                     } else if (netdev->type == ARPHRD_INFINIBAND) {
> >>>> +                             struct netdev_hw_addr *ha;
> >>>> +                             union ib_gid gid;
> >>>> +                             u32 port_num;
> >>>> +                             int ret;
> >>>> +
> >>>> +                             netdev_hw_addr_list_for_each(
> >>>> +                                     ha, &netdev->dev_addrs) {
> >>>> +                                     memcpy(&gid, ha->addr + 4,
> >>>> sizeof(gid));
> >>>> +                                     ret = ib_find_gid(smb_dev->ib_dev,
> >>>> &gid,
> >>>> +                                                       &port_num,
> >>>> NULL);
> >>>> +                                     if (!ret) {
> >>>> +                                             rdma_capable = true;
> >>>> +                                             goto out;
> >>>
> >>> Won't this leak the ndev? It needs a dev_put(ndev) before breaking
> >>> the loop, too, right?
> >>>
> >>
> >> This will not leak the ndev. Please look closer, this else branch matches
> >> with
> >> the if test on the existence of ops.get_netdev of the current ib_dev. And
> >> ndev
> >> is acquired only through that op. In the else branch, ndev is just not
> >> acquired.
> >> The original code was indented one more level here so the patch might look
> >> a bit confusing, but one of the if before this block is a deleted(-) line.
> > You're right.
>
> Ok, so I repeat my comment above!
>
> Tom.
>
> >
> > Please send v2 patch after adding comments that Tom pointed out.
> >
> > Thanks!
> >>
> >>>> +                                     }
> >>>> +                             }
> >>>>                        }
> >>>> -                     dev_put(ndev);
> >>>>                }
> >>>>        }
> >>>>    out:
> >>
> >>
> >>
> >> --
> >> Kangjing "Chaser" Huang
> >>
> >



--
Kangjing "Chaser" Huang




[Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux