Re: [PATCH] RDMA/rxe: Use kzalloc() to alloc map_set

On Tue, May 24, 2022 at 6:00 AM lizhijian@xxxxxxxxxxx
<lizhijian@xxxxxxxxxxx> wrote:
>
> Hi Jason & Bob
> CC Guoqing
>
> @Guoqing, It may correlate with your previous bug report: https://lore.kernel.org/all/20220210073655.42281-1-guoqing.jiang@xxxxxxxxx/T/
>
>
> It's observed that the same MR on an rnbd server will trigger the code
> path below:
>   -> rxe_mr_init_fast()
>   |-> alloc map_set() # map_set is uninitialized
>   |...-> rxe_map_mr_sg() # build the map_set
>       |-> rxe_mr_set_page()
>   |...-> rxe_reg_fast_mr() # mr->state changes from FREE to VALID, which
>                            # means host memory can be accessed (e.g. rxe_mr_copy)
>   |...-> rxe_invalidate_mr() # mr->state changes from VALID to FREE
>   |...-> rxe_reg_fast_mr() # mr->state changes from FREE to VALID,
>                            # but the map_set was not built again
>   |...-> rxe_mr_copy() # kernel crash due to accessing wild addresses
>                        # looked up from the map_set
>
> I drafted the patch below for it, but I wonder whether it is rxe's responsibility to do such checking.
> Any comments are very welcome.
>
>
>  From e9d0bd821f07f5e049027f07b3ce9dc283624201 Mon Sep 17 00:00:00 2001
> From: Li Zhijian <lizhijian@xxxxxxxxxxx>
> Date: Tue, 24 May 2022 10:56:19 +0800
> Subject: [PATCH] RDMA/rxe: check map_set valid when handle IB_WR_REG_MR
>
> It's observed that the same MR on an rnbd server will trigger the code
> path below:
>   -> rxe_mr_init_fast()
>   |-> alloc map_set() # map_set is uninitialized
>   |...-> rxe_map_mr_sg() # build the map_set
>       |-> rxe_mr_set_page()
>   |...-> rxe_reg_fast_mr() # mr->state changes from FREE to VALID, which
>                            # means host memory can be accessed (e.g. rxe_mr_copy)
>   |...-> rxe_invalidate_mr() # mr->state changes from VALID to FREE
>   |...-> rxe_reg_fast_mr() # mr->state changes from FREE to VALID,
>                            # but the map_set was not built again
>   |...-> rxe_mr_copy() # kernel crash due to accessing wild addresses
>                        # looked up from the map_set
>
> Signed-off-by: Li Zhijian <lizhijian@xxxxxxxxxxx>
> ---
>   drivers/infiniband/sw/rxe/rxe_mr.c    | 9 +++++++++
>   drivers/infiniband/sw/rxe/rxe_verbs.c | 1 +
>   drivers/infiniband/sw/rxe/rxe_verbs.h | 1 +
>   3 files changed, 11 insertions(+)
>
> diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
> index 787c7dadc14f..09673d559c06 100644
> --- a/drivers/infiniband/sw/rxe/rxe_mr.c
> +++ b/drivers/infiniband/sw/rxe/rxe_mr.c
> @@ -90,6 +90,7 @@ static int rxe_mr_alloc_map_set(int num_map, struct rxe_map_set **setp)
>          if (!set->map)
>                  goto err_free_set;
>
> +       set->valid = false;
>          for (i = 0; i < num_map; i++) {
>                  set->map[i] = kmalloc(sizeof(struct rxe_map), GFP_KERNEL);
>                  if (!set->map[i])
> @@ -216,6 +217,7 @@ int rxe_mr_init_user(struct rxe_pd *pd, u64 start, u64 length, u64 iova,
>          }
>
>          set = mr->cur_map_set;
> +       set->valid = true;
>          set->page_shift = PAGE_SHIFT;
>          set->page_mask = PAGE_SIZE - 1;
>
> @@ -643,6 +645,7 @@ int rxe_invalidate_mr(struct rxe_qp *qp, u32 rkey)
>          }
>
>          mr->state = RXE_MR_STATE_FREE;
> +       mr->cur_map_set->valid = mr->next_map_set->valid = false;
>          ret = 0;
>
>   err_drop_ref:
> @@ -679,12 +682,18 @@ int rxe_reg_fast_mr(struct rxe_qp *qp, struct rxe_send_wqe *wqe)
>                  return -EINVAL;
>          }
>
> +       if (!mr->next_map_set->valid) {
> +               pr_warn("%s: map set is not valid\n", __func__);
> +               return -EINVAL;
> +       }
> +
>          mr->access = access;
>          mr->lkey = (mr->lkey & ~0xff) | key;
>          mr->rkey = (access & IB_ACCESS_REMOTE) ? mr->lkey : 0;
>          mr->state = RXE_MR_STATE_VALID;
>
>          set = mr->cur_map_set;
> +       set->valid = false;
>          mr->cur_map_set = mr->next_map_set;
>          mr->cur_map_set->iova = wqe->wr.wr.reg.mr->iova;
>          mr->next_map_set = set;
> diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
> index 58e4412b1d16..4b7ae2d1d921 100644
> --- a/drivers/infiniband/sw/rxe/rxe_verbs.c
> +++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
> @@ -992,6 +992,7 @@ static int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg,
>          set->page_shift = ilog2(ibmr->page_size);
>          set->page_mask = ibmr->page_size - 1;
>          set->offset = set->iova & set->page_mask;
> +       set->valid = true;
>
>          return n;
>   }
> diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
> index 86068d70cd95..2edf31aab7e1 100644
> --- a/drivers/infiniband/sw/rxe/rxe_verbs.h
> +++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
> @@ -289,6 +289,7 @@ struct rxe_map {
>
>   struct rxe_map_set {
>          struct rxe_map          **map;
> +       bool                    valid;
>          u64                     va;
>          u64                     iova;
>          size_t                  length;
> --
> 2.31.1

Thanks for posting the description and the patch.

We have been facing exactly the same issue (only with rxe), and we also
realized that to work around it we have to call ib_map_mr_sg() before
every IB_WR_REG_MR work request, even when we are reusing the MR and
simply invalidating and re-registering it.
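
For reference, the pattern we converged on looks roughly like the sketch
below. This is a minimal illustration rather than our actual code: the
helper name remap_and_reg_mr() and the qp/mr/sgl parameters are
placeholders, and error handling is trimmed to the essentials.

#include <rdma/ib_verbs.h>

/*
 * Minimal sketch (placeholder names): rebuild the mapping table with
 * ib_map_mr_sg() before every IB_WR_REG_MR, even when reusing the MR.
 */
static int remap_and_reg_mr(struct ib_qp *qp, struct ib_mr *mr,
			    struct scatterlist *sgl, int sg_nents)
{
	struct ib_reg_wr reg_wr = {};
	const struct ib_send_wr *bad_wr;
	int n;

	/* Rebuild the map set; this is the step we cannot skip on reuse. */
	n = ib_map_mr_sg(mr, sgl, sg_nents, NULL, PAGE_SIZE);
	if (n != sg_nents)
		return n < 0 ? n : -EINVAL;

	/* Bump the key so stale rkeys from earlier registrations fail. */
	ib_update_fast_reg_key(mr, ib_inc_rkey(mr->rkey));

	reg_wr.wr.opcode = IB_WR_REG_MR;
	reg_wr.wr.send_flags = IB_SEND_SIGNALED;
	reg_wr.mr = mr;
	reg_wr.key = mr->rkey;
	reg_wr.access = IB_ACCESS_LOCAL_WRITE |
			IB_ACCESS_REMOTE_READ |
			IB_ACCESS_REMOTE_WRITE;

	return ib_post_send(qp, &reg_wr.wr, &bad_wr);
}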

With reference to this, we have two questions.

1) This change was made in the following commit.

commit 647bf13ce944f20f7402f281578423a952274e4a
Author: Bob Pearson <rpearsonhpe@xxxxxxxxx>
Date:   Tue Sep 14 11:42:06 2021 -0500

    RDMA/rxe: Create duplicate mapping tables for FMRs

    For fast memory regions create duplicate mapping tables so ib_map_mr_sg()
    can build a new mapping table which is then swapped into place
    synchronously with the execution of an IB_WR_REG_MR work request.

    Currently the rxe driver uses the same table for receiving RDMA operations
    and for building new tables in preparation for reusing the MR. This
    exposes users to potentially incorrect results.

    Link: https://lore.kernel.org/r/20210914164206.19768-5-rpearsonhpe@xxxxxxxxx
    Signed-off-by: Bob Pearson <rpearsonhpe@xxxxxxxxx>
    Signed-off-by: Jason Gunthorpe <jgg@xxxxxxxxxx>

We tried to understand what potentially incorrect results that commit
message is talking about, but were not able to. If someone could shed
light on the scenario where a single mapping table can lead to issues,
that would be great.

2)
We wanted to confirm that, with the above patch, the use case where we
reuse the MR by simply invalidating and re-registering it (via an
IB_WR_REG_MR work request) is a correct one, and that there is no
requirement that ib_map_mr_sg() be called every time regardless.
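
To make question 2 concrete: the sequence we have in mind is one initial
ib_map_mr_sg() plus IB_WR_REG_MR, the peer's RDMA traffic, then an
IB_WR_LOCAL_INV such as the sketch below, and finally another
IB_WR_REG_MR without a second ib_map_mr_sg(). The helper name and
signature here are placeholders, not code from any driver.

/* Minimal sketch (placeholder name): invalidate the MR between reuses. */
static int post_local_inv(struct ib_qp *qp, u32 rkey)
{
	struct ib_send_wr inv_wr = {};
	const struct ib_send_wr *bad_wr;

	inv_wr.opcode = IB_WR_LOCAL_INV;
	inv_wr.send_flags = IB_SEND_SIGNALED;
	inv_wr.ex.invalidate_rkey = rkey;

	return ib_post_send(qp, &inv_wr, &bad_wr);
}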

(We were planning to send this in the coming days, but wanted the other
discussions to wrap up first. Since the patch has been posted and the
discussion has started, we thought it better to put this out now.)

Regards

>
>
> On 23/05/2022 22:02, Li, Zhijian wrote:
> >
> > on 2022/5/20 22:45, Jason Gunthorpe wrote:
> >> On Wed, May 18, 2022 at 12:37:25PM +0800, Li Zhijian wrote:
> >>> The call chains below alloc map_set without fully initializing it.
> >>> rxe_mr_init_fast()
> >>>   -> rxe_mr_alloc()
> >>>      -> rxe_mr_alloc_map_set()
> >>>
> >>> Uninitialized values inside struct rxe_map_set are possible to cause
> >>> kernel panic.
> >> If the value is uninitialized then why is 0 an OK value?
> >>
> >> Would be happier to know the exact value that is not initialized
> >
> > Well, good question. After rethinking this issue, it seems this patch doesn't address the root cause, although it made the crash disappear to some extent.
> >
> > I'm still working on the root cause :)
> >
> > Thanks
> >
> > Zhijian
> >
> >
> >>
> >> Jason
> >
> >


