Re: bug report for rxe

On Tue, Feb 22, 2022 at 5:50 PM Guoqing Jiang <guoqing.jiang@xxxxxxxxx> wrote:
>
>
>
> On 2/10/22 3:36 PM, Guoqing Jiang wrote:
> > However, rnbd/rtrs over rxe still doesn't seem to work with the 5.17-rc3
> > kernel; dmesg reports the following.
> >
> > 1. server side
> >
> > [  440.723182] rdma_rxe: qp#17 moved to error state
> > [  440.725300] rtrs_server L1205: <bla>: remote access error (wr_cqe: 000000003b14397c, type: 0, vendor_err: 0x0, len: 0)
> > [  440.845926] rnbd_server L256: RTRS Session bla disconnected
> >
> > 2. client side
> >
> > [  997.817536] rnbd_client L596: Mapping device /dev/loop1 on session bla, (access_mode: rw, nr_poll_queues: 0)
> > [  998.968810] rnbd_client L1213: [session=bla] mapped 8/8 default/read queues.
> > [  999.017988] rtrs_client L610: <bla>: RDMA failed: remote access error
> > [ 1029.836943] rtrs_client L353: <bla>: Failed IB_WR_LOCAL_INV: WR flushe
> >
> > Then I tried versions 5.16 and 5.15; 5.15 seems to work, as follows.
> >
> > 1. server side
> >
> > [  333.076482] rnbd_server L800: </dev/loop1@bla>: Opened device 'loop1'
> >
> > 2. client side
> >
> > [ 1584.325825] rnbd_client L596: Mapping device /dev/loop1 on session bla, (access_mode: rw, nr_poll_queues: 0)
> > [ 1585.268291] rnbd_client L1213: [session=bla] mapped 8/8 default/read queues.
> > [ 1585.349300] rnbd_client L1607: </dev/loop1@bla> map_device: Device mapped as rnbd0 (nsectors: 0, logical_block_size: 512, physical_block_size: 512, max_write_same_sectors: 0, max_discard_sectors: 0, discard_granularity: 0, discard_alignment: 0, secure_discard: 0, max_segments: 128, max_hw_sectors: 248, rotational: 1, wc: 0, fua: 0)
> >
> > I would appreciate it if someone could shed light on why it doesn't work
> > after 5.15, and I am happy to test a potential patch for it.
>
> After investigation, the culprit seems to be commit 647bf13ce944 ("RDMA/rxe:
> Create duplicate mapping tables for FMRs"). The problem is that mr_check_range
> returns -EFAULT after finding that iova and length are not valid, so the
> connection between the two VMs can't be established.
>
> Reverting the commit manually or applying the temporary change below makes
> rxe work again with rnbd/rtrs, though I don't think it is the right thing to
> do. Could experts provide a proper solution? Thanks.
>
> diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
> index 453ef3c9d535..4a2fc4d5809d 100644
> --- a/drivers/infiniband/sw/rxe/rxe_mr.c
> +++ b/drivers/infiniband/sw/rxe/rxe_mr.c
> @@ -652,7 +652,7 @@ int rxe_reg_fast_mr(struct rxe_qp *qp, struct rxe_send_wqe *wqe)
>          mr->state = RXE_MR_STATE_VALID;
>
>          set = mr->cur_map_set;
> -       mr->cur_map_set = mr->next_map_set;
> +       //mr->cur_map_set = mr->next_map_set;
>          mr->cur_map_set->iova = wqe->wr.wr.reg.mr->iova;
>          mr->next_map_set = set;
>
> @@ -662,7 +662,7 @@ int rxe_reg_fast_mr(struct rxe_qp *qp, struct rxe_send_wqe *wqe)
>   int rxe_mr_set_page(struct ib_mr *ibmr, u64 addr)
>   {
>          struct rxe_mr *mr = to_rmr(ibmr);
> -       struct rxe_map_set *set = mr->next_map_set;
> +       struct rxe_map_set *set = mr->cur_map_set;
>          struct rxe_map *map;
>          struct rxe_phys_buf *buf;
>
> diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
> index 80df9a8f71a1..e41d2c8612d8 100644
> --- a/drivers/infiniband/sw/rxe/rxe_verbs.c
> +++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
> @@ -992,7 +992,7 @@ static int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg,
>                           int sg_nents, unsigned int *sg_offset)
>   {
>          struct rxe_mr *mr = to_rmr(ibmr);
> -       struct rxe_map_set *set = mr->next_map_set;
> +       struct rxe_map_set *set = mr->cur_map_set;

Thanks a lot. Please file a patch for the above changes.

Zhu Yanjun

>
> And the test is pretty simple.
>
> 1.  VM (server)
>
> modprobe rdma_rxe
> rdma link add rxe0 type rxe netdev ens3
> modprobe rnbd-server
>
> 2.  VM (client)
>
> modprobe rdma_rxe
> rdma link add rxe0 type rxe netdev ens3
> modprobe rnbd-client
> echo "sessname=bla path=ip:$serverip device_path=$block_device_in_server" > /sys/devices/virtual/rnbd-client/ctl/map_device
>
> BTW, I tried wip/jgg-for-next branch with commit 3810c1a1cbe8f.
>
> Guoqing
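
For readers following the thread, here is a minimal user-space sketch of how
a range check can return -EFAULT when an MR keeps two map sets and the check
looks at a set that has not been filled in yet. It is only an illustration
under stated assumptions, not the rxe code and not a diagnosis of the actual
bug: struct map_set, struct mr, mr_init, mr_map, mr_register and mr_check are
hypothetical names, and the real struct rxe_map_set, rxe_map_mr_sg,
rxe_reg_fast_mr and mr_check_range differ in detail.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; NOT the kernel's rxe structures. */
struct map_set {
	uint64_t iova;   /* start of the mapped range */
	uint64_t length; /* bytes mapped; 0 means nothing mapped yet */
};

struct mr {
	struct map_set sets[2];
	struct map_set *cur;  /* consulted by the range check */
	struct map_set *next; /* filled in by the mapping step */
};

static void mr_init(struct mr *mr)
{
	mr->sets[0] = (struct map_set){ 0, 0 };
	mr->sets[1] = (struct map_set){ 0, 0 };
	mr->cur = &mr->sets[0];
	mr->next = &mr->sets[1];
}

/* Mapping step: records the new range in the *next* set only. */
static void mr_map(struct mr *mr, uint64_t iova, uint64_t length)
{
	assert(length > 0);
	mr->next->iova = iova;
	mr->next->length = length;
}

/* Registration step: swap so the freshly mapped set becomes current. */
static void mr_register(struct mr *mr)
{
	struct map_set *old = mr->cur;

	mr->cur = mr->next;
	mr->next = old;
}

/* Range check against the *current* set; returns 0 or -EFAULT. */
static int mr_check(const struct mr *mr, uint64_t iova, uint64_t length)
{
	const struct map_set *set = mr->cur;

	if (iova < set->iova || length > set->length ||
	    iova - set->iova > set->length - length)
		return -EFAULT;
	return 0;
}
```

In this model, any access checked between mr_map() and mr_register() is
validated against the old set and fails, which is the kind of mismatch the
temporary change sidesteps by writing to cur_map_set directly and skipping
the swap.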



