Re: bug report for rxe

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 2/10/22 3:36 PM, Guoqing Jiang wrote:
However, seems rnbd/rtrs over rxe still can't work with 5.17-rc3 kernel,
dmesg reports below.

1. server side

[  440.723182] rdma_rxe: qp#17 moved to error state
[  440.725300] rtrs_server L1205: <bla>: remote access error (wr_cqe: 000000003b14397c, type: 0, vendor_err: 0x0, len: 0)
[  440.845926] rnbd_server L256: RTRS Session bla disconnected

2. client side

[  997.817536] rnbd_client L596: Mapping device /dev/loop1 on session bla, (access_mode: rw, nr_poll_queues: 0)
[  998.968810] rnbd_client L1213: [session=bla] mapped 8/8 default/read queues.
[  999.017988] rtrs_client L610: <bla>: RDMA failed: remote access error
[ 1029.836943] rtrs_client L353: <bla>: Failed IB_WR_LOCAL_INV: WR flushe

Then I tried 5.16 and 5.15 version, seems 5.15 does work as follows.

1. server side

[  333.076482] rnbd_server L800: </dev/loop1@bla>: Opened device 'loop1'

2. client side

[ 1584.325825] rnbd_client L596: Mapping device /dev/loop1 on session bla, (access_mode: rw, nr_poll_queues: 0)
[ 1585.268291] rnbd_client L1213: [session=bla] mapped 8/8 default/read queues.
[ 1585.349300] rnbd_client L1607: </dev/loop1@bla> map_device: Device mapped as rnbd0 (nsectors: 0, logical_block_size: 512, physical_block_size: 512, max_write_same_sectors: 0, max_discard_sectors: 0, discard_granularity: 0, discard_alignment: 0, secure_discard: 0, max_segments: 128, max_hw_sectors: 248, rotational: 1, wc: 0, fua: 0)

I would appreciate if someone shed light on why it doesn't work after 5.15,
And I am happy to test potential patch for it.

After investigation, seems the culprit is commit 647bf13ce944 ("RDMA/rxe:
Create duplicate mapping tables for FMRs"). The problem is mr_check_range
returns -EFAULT after find iova and length are not valid, so connection between
two VMs can't be established.

Revert the commit manually or apply below temporary change,  rxe works again
with rnbd/rtrs though I don't think it is the right thing to do. Could experts provide
a proper solution? Thanks.

diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
index 453ef3c9d535..4a2fc4d5809d 100644
--- a/drivers/infiniband/sw/rxe/rxe_mr.c
+++ b/drivers/infiniband/sw/rxe/rxe_mr.c
@@ -652,7 +652,7 @@ int rxe_reg_fast_mr(struct rxe_qp *qp, struct rxe_send_wqe *wqe)
        mr->state = RXE_MR_STATE_VALID;

        set = mr->cur_map_set;
-       mr->cur_map_set = mr->next_map_set;
+       //mr->cur_map_set = mr->next_map_set;
        mr->cur_map_set->iova = wqe->wr.wr.reg.mr->iova;
        mr->next_map_set = set;

@@ -662,7 +662,7 @@ int rxe_reg_fast_mr(struct rxe_qp *qp, struct rxe_send_wqe *wqe)
 int rxe_mr_set_page(struct ib_mr *ibmr, u64 addr)
 {
        struct rxe_mr *mr = to_rmr(ibmr);
-       struct rxe_map_set *set = mr->next_map_set;
+       struct rxe_map_set *set = mr->cur_map_set;
        struct rxe_map *map;
        struct rxe_phys_buf *buf;

diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
index 80df9a8f71a1..e41d2c8612d8 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.c
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
@@ -992,7 +992,7 @@ static int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg,
                         int sg_nents, unsigned int *sg_offset)
 {
        struct rxe_mr *mr = to_rmr(ibmr);
-       struct rxe_map_set *set = mr->next_map_set;
+       struct rxe_map_set *set = mr->cur_map_set;

And the test is pretty simple.

1.  VM (server)

modprobe rdma_rxe
rdma link add rxe0 type rxe netdev ens3
modprobe rnbd-server

2.  VM (client)

modprobe rdma_rxe
rdma link add rxe0 type rxe netdev ens3
modprobe rnbd-client
echo "sessname=bla path=ip:$serverip device_path=$block_device_in_server" > /sys/devices/virtual/rnbd-client/ctl/map_device

BTW, I tried wip/jgg-for-next branch with commit 3810c1a1cbe8f.

Guoqing



[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux