On 27/06/2022 05:51, Bob Pearson wrote:
> On 5/15/22 20:53, Li Zhijian wrote:
>> Previously, if user space keeps sending abnormal wqe, queue.prod will
>> keep increasing while queue.index doesn't. Once
>> queue.index==queue.prod in next round, req_next_wqe() will treat queue
>> as empty. In such case, no new completion would be generated.
>>
>> Update wqe_index for each wqe completion so that req_next_wqe() can get
>> next wqe properly.
>>
>> Signed-off-by: Li Zhijian <lizhijian@xxxxxxxxxxx>
>> ---
>>  drivers/infiniband/sw/rxe/rxe_req.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
>> index a0d5e57f73c1..8bdd0b6b578f 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_req.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_req.c
>> @@ -773,6 +773,8 @@ int rxe_requester(void *arg)
>>  	if (ah)
>>  		rxe_put(ah);
>>  err:
>> +	/* update wqe_index for each wqe completion */
>> +	qp->req.wqe_index = queue_next_index(qp->sq.queue, qp->req.wqe_index);
>>  	wqe->state = wqe_state_err
>>  	__rxe_do_task(&qp->comp.task);
>>
> This change looks plausible, but I am not sure if it will make a difference since the qp
> will get transitioned to the error state very shortly.
>
> In order for it to matter the requester must be a ways ahead of the completer in the send queue
> and someone be actively posting new wqes which will reschedule the requester. Currently it
> will fail on the same wqe again unless the error described above occurs but if we post a new valid
> wqe it will get executed even though we have detected an error that should have stopped the qp.
>
> It looks like the intent was to keep the qp in the non error state until all the old
> wqes get completed before making the transition.

Not really. My first intent was just to let req_next_wqe() return a wqe
whenever the queue is not empty. Currently, if rxe_requester() keeps going
to the error path for some reason, req_next_wqe() will falsely report the
queue as empty in the next round even though it is almost full.

BTW, I will review your new private patches.

Thanks
Zhijian

> But we should disable the requester
> from processing new wqes in this case. That seems like a safer solution to the problem.
>
> Bob
>