Re: rxe: Invalid qp_num in WCE after placing qp in error state

On Tue, 2018-10-23 at 00:51 -0700, Sagi Grimberg wrote:
> > Hi all,
> > 
> > When placing an RDMA qp into IBV_QPS_ERR via ibv_modify_qp while that qp
> > has many outstanding operations, the completion queue entries generated
> > for those operations have invalid values for both wr_id and qp_num (-1).
> > This happens prior to destroying the qp and only occurs for rxe (testing
> > from userspace). When running on real hardware NICs the values are correct.
> > 
> > I'm on Linux 4.18.12-200.fc28.x86_64 with libibverbs 1.1.16.2.
> 
> Clearly do_complete in rxe_resp is not filling this information
> for error completions...
> 
> Does this help (warning untested):
> --
> diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
> index aa5833318372..d7c02a83b80f 100644
> --- a/drivers/infiniband/sw/rxe/rxe_resp.c
> +++ b/drivers/infiniband/sw/rxe/rxe_resp.c
> @@ -841,11 +841,16 @@ static enum resp_states do_complete(struct rxe_qp *qp,
> 
>          memset(&cqe, 0, sizeof(cqe));
> 
> -       wc->wr_id               = wqe->wr_id;
> -       wc->status              = qp->resp.status;
> -       wc->qp                  = &qp->ibqp;
> +       if (qp->rcq->is_user) {
> +               uwc->status             = qp->resp.status;
> +               uwc->qp_num             = qp->ibqp.qp_num;
> +               uwc->wr_id              = wqe->wr_id;
> +       } else {
> +               wc->status              = qp->resp.status;
> +               wc->qp                  = &qp->ibqp;
> +               wc->wr_id               = wqe->wr_id;
> +       }
> 
> -       /* fields after status are not required for errors */
>          if (wc->status == IB_WC_SUCCESS) {
>                  wc->opcode = (pkt->mask & RXE_IMMDT_MASK &&
>                                  pkt->mask & RXE_WRITE_MASK) ?
> --


I can confirm that this patch fixes the issue. I believe vendor_err should be
valid in error completions as well, so maybe it should be set to 0.
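
For anyone following along, here is a minimal userspace sketch of the idea
behind the patch: the completion entry is filled in one of two formats
depending on whether the CQ belongs to a userspace consumer, and only the
user format carries a qp_num. The struct layouts and the names fake_qp,
kern_wc, user_wc, fake_cqe and fill_error_cqe are hypothetical stand-ins I
made up for illustration; they are not the real struct ib_wc or
struct ib_uverbs_wc, and the sketch does not try to reproduce the exact -1
values from the report.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins, NOT the real kernel definitions. */
struct fake_qp {
    uint32_t qp_num;
};

/* Kernel-consumer format: identifies the qp by pointer. */
struct kern_wc {
    uint64_t wr_id;
    uint32_t status;
    uint32_t vendor_err;
    struct fake_qp *qp;
};

/* Userspace format: identifies the qp by number. */
struct user_wc {
    uint64_t wr_id;
    uint32_t status;
    uint32_t vendor_err;
    uint32_t qp_num;
};

/* Both formats share the same storage, as wc/uwc do in the patch. */
union fake_cqe {
    struct kern_wc wc;
    struct user_wc uwc;
};

/*
 * Mirrors the shape of the patched code: zero the entry, then fill
 * status, qp identity and wr_id in the format the CQ owner expects.
 * These fields are written even when status is an error.
 */
static void fill_error_cqe(union fake_cqe *cqe, int is_user,
                           struct fake_qp *qp, uint64_t wr_id,
                           uint32_t status)
{
    memset(cqe, 0, sizeof(*cqe));

    if (is_user) {
        cqe->uwc.status = status;
        cqe->uwc.qp_num = qp->qp_num;
        cqe->uwc.wr_id  = wr_id;
    } else {
        cqe->wc.status = status;
        cqe->wc.qp     = qp;
        cqe->wc.wr_id  = wr_id;
    }
}
```

Calling fill_error_cqe() with is_user set and then reading cqe.uwc gives back
the qp_num and wr_id that were passed in. In this sketch vendor_err reads as 0
only because of the memset at the top; whether that is sufficient in the
driver itself is the open question above.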



