Re: [PATCH for-rc] RDMA/core: Fix corrupted SL on passive side

> On 1 Apr 2021, at 17:04, Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:
> 
> On Wed, Mar 24, 2021 at 02:34:13PM +0000, Håkon Bugge wrote:
>> 
>> 
>>> On 23 Mar 2021, at 20:46, Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:
>>> 
>>> On Mon, Mar 22, 2021 at 02:35:32PM +0100, Håkon Bugge wrote:
>>>> On RoCE systems, a CM REQ contains a Primary Hop Limit > 1 and Primary
>>>> Subnet Local is zero.
>>>> 
>>>> In cm_req_handler(), the cm_process_routed_req() function is
>>>> called. Since the Primary Subnet Local value is zero in the request,
>>>> and since this is RoCE (Primary Local LID is permissive), the
>>>> following statement will be executed:
>>>> 
>>>>     IBA_SET(CM_REQ_PRIMARY_SL, req_msg, wc->sl);
>>>> 
>>>> This corrupts SL in req_msg if it was different from zero. In other
>>>> words, a request to setup a connection using an SL != zero, will not
>>>> be honored, and a connection using SL zero will be created instead.
>>>> 
>>>> Fixed by not calling cm_process_routed_req() on RoCE systems.
>>>> 
>>>> Fixes: 3971c9f6dbf2 ("IB/cm: Add interim support for routed paths")
>>>> Signed-off-by: Håkon Bugge <haakon.bugge@xxxxxxxxxx>
>>>> drivers/infiniband/core/cm.c | 3 ++-
>>>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>>> 
>>>> diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
>>>> index 3d194bb..6adbaea 100644
>>>> +++ b/drivers/infiniband/core/cm.c
>>>> @@ -2138,7 +2138,8 @@ static int cm_req_handler(struct cm_work *work)
>>>> 		goto destroy;
>>>> 	}
>>>> 
>>>> -	cm_process_routed_req(req_msg, work->mad_recv_wc->wc);
>>>> +	if (cm_id_priv->av.ah_attr.type != RDMA_AH_ATTR_TYPE_ROCE)
>>>> +		cm_process_routed_req(req_msg, work->mad_recv_wc->wc);
>>> 
>>> why use ah_attr.type when a few lines below we have:
>>> 
>>> 	if (gid_attr &&
>>> 	    rdma_protocol_roce(work->port->cm_dev->ib_device,
>>> 			       work->port->port_num)) {
>>> 
>>> ?
>>> 
>>> I suspect you can just move this into the else?
>> 
>> I can counter that by saying ah_attr.type is used ~10 lines further
>> down in the conditional call to sa_path_set_dmac() ;-)
> 
> Hum, OK. Please send an additional patch to unify everything around
> av.ah_attr.type

Will do.

>>> 	if (gid_attr &&
>>> 	    rdma_protocol_roce(work->port->cm_dev->ib_device,
>>> 			       work->port->port_num)) {
>> 
>> I cannot really see how gid_attr could be null. If
>> ib_init_ah_attr_from_wc() succeeds, it is set after the call to
>> cm_init_av_for_response() above. May be using ah_attr.type in this
>> test instead, for uniformity and readability?
> 
> The GRH is optional, ib_init_ah_attr_from_wc() only sets it
> conditionally.

True. But one of the conditions to set sgid_attr is rdma_protocol_roce(). Hence the first term in:

if (gid_attr && rdma_protocol_roce())

is superfluous: gid_attr cannot be NULL on RoCE systems, because it is dereferenced in:

cm_init_av_for_response()
    ib_init_ah_attr_from_wc()
        rdma_move_grh_sgid_attr()
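
To make the argument concrete, here is a toy model of the reasoning above. It is a sketch under stated assumptions, not the kernel code: every toy_* name is hypothetical, and the init function only encodes the two claims made in this thread (on RoCE the source GID attribute is always looked up and attached, or the call fails; otherwise the GRH is optional).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for illustration only; these are NOT the kernel types. */
struct toy_gid_attr { int index; };

struct toy_ah_attr {
	bool is_roce;	/* models ah_attr.type == RDMA_AH_ATTR_TYPE_ROCE */
	const struct toy_gid_attr *sgid_attr;
};

/*
 * Assumed behaviour, taken from the discussion rather than from cm.c:
 * on a RoCE port the init path either fails or leaves sgid_attr set;
 * on other ports the GRH is optional, so sgid_attr may stay NULL.
 */
static int toy_init_ah_attr_from_wc(struct toy_ah_attr *ah, bool port_is_roce,
				    bool wc_has_grh,
				    const struct toy_gid_attr *lookup_result)
{
	ah->is_roce = port_is_roce;
	ah->sgid_attr = NULL;

	if (port_is_roce) {
		if (!lookup_result)
			return -1;	/* lookup failed, caller bails out */
		ah->sgid_attr = lookup_result;
		return 0;
	}
	if (wc_has_grh)
		ah->sgid_attr = lookup_result;	/* may still be NULL */
	return 0;
}

/* The two forms of the test under discussion. */
static bool toy_test_with_gid_attr(const struct toy_ah_attr *ah)
{
	return ah->sgid_attr && ah->is_roce;
}

static bool toy_test_type_only(const struct toy_ah_attr *ah)
{
	return ah->is_roce;
}

/* Walks every input combination; returns how many init calls succeeded. */
static int toy_check_equivalence(void)
{
	static const struct toy_gid_attr gid = { .index = 0 };
	const struct toy_gid_attr *lookups[] = { NULL, &gid };
	int succeeded = 0;

	for (int roce = 0; roce < 2; roce++)
		for (int grh = 0; grh < 2; grh++)
			for (int l = 0; l < 2; l++) {
				struct toy_ah_attr ah;

				if (toy_init_ah_attr_from_wc(&ah, roce, grh, lookups[l]))
					continue;
				succeeded++;
				assert(toy_test_with_gid_attr(&ah) ==
				       toy_test_type_only(&ah));
			}
	return succeeded;
}
```

Under those assumptions the two tests agree whenever the init call succeeds, which is all the "superfluous" claim amounts to. If the real init path can succeed on RoCE without setting sgid_attr, the model does not apply.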


I'll send the patch with the gid_attr term and let you decide.


Thanks, Håkon





> 
> Applied to for-next
> 
> Thanks,
> Jason




