Re: [PATCH for-rc] RDMA/rxe: Fix rnr retry behavior

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 5/13/22 08:38, Yanjun Zhu wrote:
> 
> 
> 在 2022/5/13 10:40, Yanjun Zhu 写道:
>> 在 2022/5/13 3:49, Bob Pearson 写道:
>>> Currently the completer tasklet when it sets the retransmit timer or the
>>> nak timer sets the same flag (qp->req.need_retry) so that if either
>>> timer fires it will attempt to perform a retry flow on the send queue.
>>> This has the effect of responding to an RNR NAK at the first retransmit
>>> timer event which does not allow for the requested rnr timeout.
>>>
>>> This patch adds a new flag (qp->req.need_rnr_timeout) which, if set,
>>> prevents a retry flow until the rnr nak timer fires.
>>>
>>> This patch fixes rnr retry errors which can be observed by running the
>>> pyverbs test suite 50-100X. With this patch applied they do not occur.
>>
>> Do you mean that running run_tests.py for 50-100times can reproduce this bug? I want to reproduce this problem.
> 
> Running run_tests.py for 50-100times, I can not reproduce this problem

I went back and looked. The occasionally failing test case is:

	test_rdmacm_async_traffic_external_qp

This test is normally skipped for rxe because of a hack in <path to rdma-core>/tests/base.py. I sent you
a patch that enables these tests a few days ago.

It randomly causes an rnr retry about 1/3 of the time. Most of these are repaired by the retry timer
going off. A few ~1-2% are not.

Bob
> 
> Zhu Yanjun
> 
>>
>> Thanks a lot.
>> Zhu Yanjun
>>
>>>
>>> Link: https://lore.kernel.org/linux-rdma/a8287823-1408-4273-bc22-99a0678db640@xxxxxxxxx/
>>> Fixes: 8700e3e7c485 ("Soft RoCE (RXE) - The software RoCE driver")
>>> Signed-off-by: Bob Pearson <rpearsonhpe@xxxxxxxxx>
>>> ---
>>>   drivers/infiniband/sw/rxe/rxe_comp.c  | 4 +---
>>>   drivers/infiniband/sw/rxe/rxe_qp.c    | 1 +
>>>   drivers/infiniband/sw/rxe/rxe_req.c   | 6 ++++--
>>>   drivers/infiniband/sw/rxe/rxe_verbs.h | 1 +
>>>   4 files changed, 7 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c b/drivers/infiniband/sw/rxe/rxe_comp.c
>>> index 138b3e7d3a5f..bc668cb211b1 100644
>>> --- a/drivers/infiniband/sw/rxe/rxe_comp.c
>>> +++ b/drivers/infiniband/sw/rxe/rxe_comp.c
>>> @@ -733,9 +733,7 @@ int rxe_completer(void *arg)
>>>                   if (qp->comp.rnr_retry != 7)
>>>                       qp->comp.rnr_retry--;
>>> -                qp->req.need_retry = 1;
>>> -                pr_debug("qp#%d set rnr nak timer\n",
>>> -                     qp_num(qp));
>>> +                qp->req.need_rnr_timeout = 1;
>>>                   mod_timer(&qp->rnr_nak_timer,
>>>                         jiffies + rnrnak_jiffies(aeth_syn(pkt)
>>>                           & ~AETH_TYPE_MASK));
>>> diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c b/drivers/infiniband/sw/rxe/rxe_qp.c
>>> index 62acf890af6c..1c962468714e 100644
>>> --- a/drivers/infiniband/sw/rxe/rxe_qp.c
>>> +++ b/drivers/infiniband/sw/rxe/rxe_qp.c
>>> @@ -513,6 +513,7 @@ static void rxe_qp_reset(struct rxe_qp *qp)
>>>       atomic_set(&qp->ssn, 0);
>>>       qp->req.opcode = -1;
>>>       qp->req.need_retry = 0;
>>> +    qp->req.need_rnr_timeout = 0;
>>>       qp->req.noack_pkts = 0;
>>>       qp->resp.msn = 0;
>>>       qp->resp.opcode = -1;
>>> diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
>>> index ae5fbc79dd5c..770ae4279f73 100644
>>> --- a/drivers/infiniband/sw/rxe/rxe_req.c
>>> +++ b/drivers/infiniband/sw/rxe/rxe_req.c
>>> @@ -103,7 +103,8 @@ void rnr_nak_timer(struct timer_list *t)
>>>   {
>>>       struct rxe_qp *qp = from_timer(qp, t, rnr_nak_timer);
>>> -    pr_debug("qp#%d rnr nak timer fired\n", qp_num(qp));
>>> +    qp->req.need_retry = 1;
>>> +    qp->req.need_rnr_timeout = 0;
>>>       rxe_run_task(&qp->req.task, 1);
>>>   }
>>> @@ -624,10 +625,11 @@ int rxe_requester(void *arg)
>>>           qp->req.need_rd_atomic = 0;
>>>           qp->req.wait_psn = 0;
>>>           qp->req.need_retry = 0;
>>> +        qp->req.need_rnr_timeout = 0;
>>>           goto exit;
>>>       }
>>> -    if (unlikely(qp->req.need_retry)) {
>>> +    if (unlikely(qp->req.need_retry && !qp->req.need_rnr_timeout)) {
>>>           req_retry(qp);
>>>           qp->req.need_retry = 0;
>>>       }
>>> diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
>>> index e7eff1ca75e9..ab3186478c3f 100644
>>> --- a/drivers/infiniband/sw/rxe/rxe_verbs.h
>>> +++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
>>> @@ -123,6 +123,7 @@ struct rxe_req_info {
>>>       int            need_rd_atomic;
>>>       int            wait_psn;
>>>       int            need_retry;
>>> +    int            need_rnr_timeout;
>>>       int            noack_pkts;
>>>       struct rxe_task        task;
>>>   };
>>>
>>> base-commit: c5eb0a61238dd6faf37f58c9ce61c9980aaffd7a
>>




[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux