Re: RDME/rxe: Fast reg with local access rights and invalidation for that MR

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 6/1/22 11:15, Haris Iqbal wrote:
> On Tue, May 31, 2022 at 7:12 PM Bob Pearson <rpearsonhpe@xxxxxxxxx> wrote:
>>
>> On 5/30/22 06:05, Haris Iqbal wrote:
>>> Hi Bob,
>>>
>>> I have a query. After the following patch,
>>>
>>> https://marc.info/?l=linux-rdma&m=163163776430842&w=2
>>>
>>> If I send a IB_WR_REG_MR wr with flag set to IB_ACCESS_LOCAL_WRITE,
>>> rxe will set the mr->rkey to 0 (mr->lkey will be set to the key I send
>>> in wr).
>>>
>>> Afterwards, If I have to invalidate that mr with IB_WR_LOCAL_INV,
>>> setting the .ex.invalidate_rkey to the key I sent previously in the
>>> IB_WR_REG_MR wr, the invalidate would fail with the following error.
>>>
>>> rkey (%#x) doesn't match mr->rkey
>>> (function rxe_invalidate_mr)
>>>
>>> Is this desired behaviour? If so, how would I go about invalidating
>>> the above MR?
>>>
>>> Regards
>>> -Haris
>>
>> I think that the first behavior is correct. If you don't do this then the
>> MR is open for RDMA operations which you didn't allow.
>>
>> The second behavior is more interesting. If you are doing a send_with_invalidate
>> from a remote node then no reason you should allow the remote node to do
>> anything to the MR since it didn't have access to begin with. For a local invalidate MR
>> If you read the IBA it claims that local invalidate operations should provide
>> the lkey, rkey and memory handle as parameters to the operation and that the
>> lkey should be invalidated and the rkey if there is one should be invalidated. But
>> ib_verbs.h only has one parameter labeled rkey.
>>
>> The rxe driver follows most other providers and always makes the lkey and rkey the same
>> if there is an rkey else the rkey is set to zero. So rxe_invalidate_mr should compare
>> to the lkey and not the rkey for local invalidate. And then move the MR to the FREE state.
>>
>> This is a bug. Fortunately the majority of use cases for physical memory regions are
>> for RDMA access.
> 
> Thanks for the response Bob. I understand now.
> 
>>
>> Feel free to submit a patch or I will if you don't care to. The rxe_invalidate_mr() subroutine
>> needs have a new parameter since it is shared by local and remote invalidate operations and
>> they need to behave differently. Easiest is to have an lkey and rkey parameter. The local
>> operation would set the lkey and the remote operation the rkey.
> 
> Do you mean something like this?
> 
> diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h
> b/drivers/infiniband/sw/rxe/rxe_loc.h
> index 0e022ae1b8a5..1e6a6d8d330b 100644
> --- a/drivers/infiniband/sw/rxe/rxe_loc.h
> +++ b/drivers/infiniband/sw/rxe/rxe_loc.h
> @@ -77,7 +77,7 @@ struct rxe_mr *lookup_mr(struct rxe_pd *pd, int
> access, u32 key,
>                          enum rxe_mr_lookup_type type);
>  int mr_check_range(struct rxe_mr *mr, u64 iova, size_t length);
>  int advance_dma_data(struct rxe_dma_info *dma, unsigned int length);
> -int rxe_invalidate_mr(struct rxe_qp *qp, u32 rkey);
> +int rxe_invalidate_mr(struct rxe_qp *qp, u32 lkey, u32 rkey);
>  int rxe_reg_fast_mr(struct rxe_qp *qp, struct rxe_send_wqe *wqe);
>  int rxe_mr_set_page(struct ib_mr *ibmr, u64 addr);
>  int rxe_dereg_mr(struct ib_mr *ibmr, struct ib_udata *udata);
> diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c
> b/drivers/infiniband/sw/rxe/rxe_mr.c
> index fc3942e04a1f..cbdb8fa9a08e 100644
> --- a/drivers/infiniband/sw/rxe/rxe_mr.c
> +++ b/drivers/infiniband/sw/rxe/rxe_mr.c
> @@ -576,20 +576,27 @@ struct rxe_mr *lookup_mr(struct rxe_pd *pd, int
> access, u32 key,
>         return mr;
>  }
> 
> -int rxe_invalidate_mr(struct rxe_qp *qp, u32 rkey)
> +int rxe_invalidate_mr(struct rxe_qp *qp, u32 lkey, u32 rkey)
>  {
>         struct rxe_dev *rxe = to_rdev(qp->ibqp.device);
>         struct rxe_mr *mr;
>         int ret;
> 
> -       mr = rxe_pool_get_index(&rxe->mr_pool, rkey >> 8);
> +       mr = rxe_pool_get_index(&rxe->mr_pool, (lkey ? lkey : rkey) >> 8);
>         if (!mr) {
> -               pr_err("%s: No MR for rkey %#x\n", __func__, rkey);
> +               pr_err("%s: No MR for key %#x\n", __func__, (lkey ?
> lkey : rkey));
>                 ret = -EINVAL;
>                 goto err;
>         }
> 
> -       if (rkey != mr->rkey) {
> +       if (lkey && lkey != mr->lkey) {
> +               pr_err("%s: lkey (%#x) doesn't match mr->lkey (%#x)\n",
> +                       __func__, lkey, mr->lkey);
> +               ret = -EINVAL;
> +               goto err_drop_ref;
> +       }
> +
> +       if (rkey && rkey != mr->rkey) {
>                 pr_err("%s: rkey (%#x) doesn't match mr->rkey (%#x)\n",
>                         __func__, rkey, mr->rkey);
>                 ret = -EINVAL;
> diff --git a/drivers/infiniband/sw/rxe/rxe_req.c
> b/drivers/infiniband/sw/rxe/rxe_req.c
> index 9d98237389cf..478b86f59f6c 100644
> --- a/drivers/infiniband/sw/rxe/rxe_req.c
> +++ b/drivers/infiniband/sw/rxe/rxe_req.c
> @@ -550,7 +550,7 @@ static int rxe_do_local_ops(struct rxe_qp *qp,
> struct rxe_send_wqe *wqe)
>                 if (rkey_is_mw(rkey))
>                         ret = rxe_invalidate_mw(qp, rkey);
>                 else
> -                       ret = rxe_invalidate_mr(qp, rkey);
> +                       ret = rxe_invalidate_mr(qp, rkey, 0);
> 
>                 if (unlikely(ret)) {
>                         wqe->status = IB_WC_LOC_QP_OP_ERR;
> diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c
> b/drivers/infiniband/sw/rxe/rxe_resp.c
> index d995ddbe23a0..48b474703aa7 100644
> --- a/drivers/infiniband/sw/rxe/rxe_resp.c
> +++ b/drivers/infiniband/sw/rxe/rxe_resp.c
> @@ -803,7 +803,7 @@ static int invalidate_rkey(struct rxe_qp *qp, u32 rkey)
>         if (rkey_is_mw(rkey))
>                 return rxe_invalidate_mw(qp, rkey);
>         else
> -               return rxe_invalidate_mr(qp, rkey);
> +               return rxe_invalidate_mr(qp, 0, rkey);
>  }
> 
> Another alternate way would be to separate the invalidate into 2
> functions. One for local and the other for remote.
> That way it will be clearer and we avoid the weird use of ternary
> operator in rxe_pool_get_index and the print afterwards.
> Something like this?
> 
> diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h
> b/drivers/infiniband/sw/rxe/rxe_loc.h
> index 0e022ae1b8a5..4da57abbbc8c 100644
> --- a/drivers/infiniband/sw/rxe/rxe_loc.h
> +++ b/drivers/infiniband/sw/rxe/rxe_loc.h
> @@ -77,7 +77,8 @@ struct rxe_mr *lookup_mr(struct rxe_pd *pd, int
> access, u32 key,
>                          enum rxe_mr_lookup_type type);
>  int mr_check_range(struct rxe_mr *mr, u64 iova, size_t length);
>  int advance_dma_data(struct rxe_dma_info *dma, unsigned int length);
> -int rxe_invalidate_mr(struct rxe_qp *qp, u32 rkey);
> +int rxe_invalidate_mr_local(struct rxe_qp *qp, u32 lkey);
> +int rxe_invalidate_mr_remote(struct rxe_qp *qp, u32 rkey);
>  int rxe_reg_fast_mr(struct rxe_qp *qp, struct rxe_send_wqe *wqe);
>  int rxe_mr_set_page(struct ib_mr *ibmr, u64 addr);
>  int rxe_dereg_mr(struct ib_mr *ibmr, struct ib_udata *udata);
> diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c
> b/drivers/infiniband/sw/rxe/rxe_mr.c
> index fc3942e04a1f..1c7179dd92eb 100644
> --- a/drivers/infiniband/sw/rxe/rxe_mr.c
> +++ b/drivers/infiniband/sw/rxe/rxe_mr.c
> @@ -576,41 +576,72 @@ struct rxe_mr *lookup_mr(struct rxe_pd *pd, int
> access, u32 key,
>         return mr;
>  }
> 
> -int rxe_invalidate_mr(struct rxe_qp *qp, u32 rkey)
> +static int rxe_invalidate_mr(struct rxe_mr *mr)
> +{
> +       if (atomic_read(&mr->num_mw) > 0) {
> +               pr_warn("%s: Attempt to invalidate an MR while bound to MWs\n",
> +                       __func__);
> +               return -EINVAL;
> +       }
> +
> +       if (unlikely(mr->type != IB_MR_TYPE_MEM_REG)) {
> +               pr_warn("%s: mr->type (%d) is wrong type\n", __func__,
> mr->type);
> +               return -EINVAL;
> +       }
> +
> +       mr->state = RXE_MR_STATE_FREE;
> +       return 0;
> +}
> +
> +int rxe_invalidate_mr_local(struct rxe_qp *qp, u32 lkey)
>  {
>         struct rxe_dev *rxe = to_rdev(qp->ibqp.device);
>         struct rxe_mr *mr;
>         int ret;
> 
> -       mr = rxe_pool_get_index(&rxe->mr_pool, rkey >> 8);
> +       mr = rxe_pool_get_index(&rxe->mr_pool, lkey >> 8);
>         if (!mr) {
> -               pr_err("%s: No MR for rkey %#x\n", __func__, rkey);
> +               pr_err("%s: No MR for lkey %#x\n", __func__, lkey);
>                 ret = -EINVAL;
>                 goto err;
>         }
> 
> -       if (rkey != mr->rkey) {
> -               pr_err("%s: rkey (%#x) doesn't match mr->rkey (%#x)\n",
> -                       __func__, rkey, mr->rkey);
> +       if (lkey != mr->lkey) {
> +               pr_err("%s: lkey (%#x) doesn't match mr->lkey (%#x)\n",
> +                       __func__, lkey, mr->lkey);
>                 ret = -EINVAL;
>                 goto err_drop_ref;
>         }
> 
> -       if (atomic_read(&mr->num_mw) > 0) {
> -               pr_warn("%s: Attempt to invalidate an MR while bound to MWs\n",
> -                       __func__);
> +       ret = rxe_invalidate_mr(mr);
> +
> +err_drop_ref:
> +       rxe_put(mr);
> +err:
> +       return ret;
> +}
> +
> +int rxe_invalidate_mr_remote(struct rxe_qp *qp, u32 rkey)
> +{
> +       struct rxe_dev *rxe = to_rdev(qp->ibqp.device);
> +       struct rxe_mr *mr;
> +       int ret;
> +
> +       mr = rxe_pool_get_index(&rxe->mr_pool, rkey >> 8);
> +       if (!mr) {
> +               pr_err("%s: No MR for rkey %#x\n", __func__, rkey);
>                 ret = -EINVAL;
> -               goto err_drop_ref;
> +               goto err;
>         }
> 
> -       if (unlikely(mr->type != IB_MR_TYPE_MEM_REG)) {
> -               pr_warn("%s: mr->type (%d) is wrong type\n", __func__,
> mr->type);
> +       if (rkey != mr->rkey) {
> +               pr_err("%s: rkey (%#x) doesn't match mr->rkey (%#x)\n",
> +                       __func__, rkey, mr->rkey);
>                 ret = -EINVAL;
>                 goto err_drop_ref;
>         }
> 
> -       mr->state = RXE_MR_STATE_FREE;
> -       ret = 0;
> +       ret = rxe_invalidate_mr(mr);
> 
>  err_drop_ref:
>         rxe_put(mr);
> diff --git a/drivers/infiniband/sw/rxe/rxe_req.c
> b/drivers/infiniband/sw/rxe/rxe_req.c
> index 9d98237389cf..e7dd9738a255 100644
> --- a/drivers/infiniband/sw/rxe/rxe_req.c
> +++ b/drivers/infiniband/sw/rxe/rxe_req.c
> @@ -550,7 +550,7 @@ static int rxe_do_local_ops(struct rxe_qp *qp,
> struct rxe_send_wqe *wqe)
>                 if (rkey_is_mw(rkey))
>                         ret = rxe_invalidate_mw(qp, rkey);
>                 else
> -                       ret = rxe_invalidate_mr(qp, rkey);
> +                       ret = rxe_invalidate_mr_local(qp, rkey);
> 
>                 if (unlikely(ret)) {
>                         wqe->status = IB_WC_LOC_QP_OP_ERR;
> diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c
> b/drivers/infiniband/sw/rxe/rxe_resp.c
> index d995ddbe23a0..234e7858fb12 100644
> --- a/drivers/infiniband/sw/rxe/rxe_resp.c
> +++ b/drivers/infiniband/sw/rxe/rxe_resp.c
> @@ -803,7 +803,7 @@ static int invalidate_rkey(struct rxe_qp *qp, u32 rkey)
>         if (rkey_is_mw(rkey))
>                 return rxe_invalidate_mw(qp, rkey);
>         else
> -               return rxe_invalidate_mr(qp, rkey);
> +               return rxe_invalidate_mr_remote(qp, rkey);
>  }
> 
> I tested both with rnbd/rtrs and both works fine.
> Let me know which one looks better. I will send that one as a patch.

I like the second one better. May want to hold off until the merge window closes
to send it.

Zhu is the rxe maintainer so be sure to copy him and Jason.

Bob
> 
>>
>> Bob




[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux