Re: [PATCH for-next] RDMA/siw: Fix duplicated reported IW_CM_EVENT_CONNECT_REPLY event

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 7/14/22 8:59 PM, Bernard Metzler wrote:
>> -----Original Message-----
>> From: Cheng Xu <chengyou@xxxxxxxxxxxxxxxxx>
>> Sent: Thursday, 14 July 2022 03:31
>> To: jgg@xxxxxxxx; leon@xxxxxxxxxx; Bernard Metzler <BMT@xxxxxxxxxxxxxx>
>> Cc: linux-rdma@xxxxxxxxxxxxxxx; chengyou@xxxxxxxxxxxxxxxxx
>> Subject: [EXTERNAL] [PATCH for-next] RDMA/siw: Fix duplicated reported
>> IW_CM_EVENT_CONNECT_REPLY event
>>
>> If siw_recv_mpa_rr returns -EAGAIN, it means that the MPA reply hasn't
>> been received completely, and should not report IW_CM_EVENT_CONNECT_REPLY
>> in this case. This may trigger a call trace in iw_cm. A simple way to
>> trigger this:
> 
> Great, thanks! I obviously did never hit an incomplete
> MPA hdr. Please make another change to fix it correctly,
> as suggested below.
> 
> 
> case of an incomplete 
>>  server: ib_send_lat
>>  client: ib_send_lat -R <server_ip>
>>
>> The call trace looks like this:
>>
>>  kernel BUG at drivers/infiniband/core/iwcm.c:894!
>>  invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
>>  <...>
>>  Workqueue: iw_cm_wq cm_work_handler [iw_cm]
>>  Call Trace:
>>   <TASK>
>>   cm_work_handler+0x1dd/0x370 [iw_cm]
>>   process_one_work+0x1e2/0x3b0
>>   worker_thread+0x49/0x2e0
>>   ? rescuer_thread+0x370/0x370
>>   kthread+0xe5/0x110
>>   ? kthread_complete_and_exit+0x20/0x20
>>   ret_from_fork+0x1f/0x30
>>   </TASK>
>>
>> Signed-off-by: Cheng Xu <chengyou@xxxxxxxxxxxxxxxxx>
>> ---
>>  drivers/infiniband/sw/siw/siw_cm.c | 7 ++++---
>>  1 file changed, 4 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/infiniband/sw/siw/siw_cm.c
>> b/drivers/infiniband/sw/siw/siw_cm.c
>> index 17f34d584cd9..f88d2971c2c6 100644
>> --- a/drivers/infiniband/sw/siw/siw_cm.c
>> +++ b/drivers/infiniband/sw/siw/siw_cm.c
>> @@ -725,11 +725,11 @@ static int siw_proc_mpareply(struct siw_cep *cep)
>>  	enum mpa_v2_ctrl mpa_p2p_mode = MPA_V2_RDMA_NO_RTR;
>>
>>  	rv = siw_recv_mpa_rr(cep);
>> -	if (rv != -EAGAIN)
>> -		siw_cancel_mpatimer(cep);
>>  	if (rv)
>>  		goto out_err;
>>
>> +	siw_cancel_mpatimer(cep);
>> +
> 
> Cancel the MPA timer only if we have a
> real error. -EAGAIN translates to just
> further waiting. So best to add the timer
> cancellation to the error bailout section.
> 
>>  	rep = &cep->mpa.hdr;
>>
>>  	if (__mpa_rr_revision(rep->params.bits) > MPA_REVISION_2) {
>> @@ -895,7 +895,8 @@ static int siw_proc_mpareply(struct siw_cep *cep)
>>  	}
>>
>>  out_err:
>> -	siw_cm_upcall(cep, IW_CM_EVENT_CONNECT_REPLY, -EINVAL);
>> +	if (rv != -EAGAIN)
> {
> cancel MPA timer here.

Indeed we do not need it here, because when siw_proc_mpareply returns error
but not -EAGAIN, the release_cep will be set in the caller (siw_cm_work_handler),
and siw_cancel_mpatimer will be called in the error handle flow.

I think this is better, because the error handle is more unified.

How do you think?

Thanks,
Cheng Xu


> 		siw_cancel_mpatimer(cep);
>> +		siw_cm_upcall(cep, IW_CM_EVENT_CONNECT_REPLY, -EINVAL);
> }
>>
>>  	return rv;
>>  }
>> --
>> 2.37.0
> 



[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux