Re: [PATCH] RDMA/rxe: For fast memory reg wr set both lkey and rkey

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, May 20, 2022 at 5:20 PM Haris Iqbal <haris.iqbal@xxxxxxxxx> wrote:
>
> On Fri, May 20, 2022 at 5:19 PM Md Haris Iqbal <haris.iqbal@xxxxxxxxx> wrote:
> >
> > Fixes: 001345339f4c ("RDMA/rxe: Separate HW and SW l/rkeys")
> > Signed-off-by: Md Haris Iqbal <haris.iqbal@xxxxxxxxx>
> > Signed-off-by: Aleksei Marov <aleksei.marov@xxxxxxxxx>
> > ---
> > Cc: rpearsonhpe@xxxxxxxxx
> > ---
> >  drivers/infiniband/sw/rxe/rxe_mr.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
> > index fc3942e04a1f..04eb804efb23 100644
> > --- a/drivers/infiniband/sw/rxe/rxe_mr.c
> > +++ b/drivers/infiniband/sw/rxe/rxe_mr.c
> > @@ -648,7 +648,7 @@ int rxe_reg_fast_mr(struct rxe_qp *qp, struct rxe_send_wqe *wqe)
> >
> >         mr->access = access;
> >         mr->lkey = (mr->lkey & ~0xff) | key;
> > -       mr->rkey = (access & IB_ACCESS_REMOTE) ? mr->lkey : 0;
> > +       mr->rkey = mr->lkey;
> >         mr->state = RXE_MR_STATE_VALID;
> >
> >         set = mr->cur_map_set;
> > --
> > 2.25.1
>
> This code was changed in the below mentioned commit.
>
> commit 001345339f4ca85790a1644a74e33ae77ac116be
> Author: Bob Pearson <rpearsonhpe@xxxxxxxxx>
> Date:   Tue Sep 14 11:42:05 2021 -0500
>
>     RDMA/rxe: Separate HW and SW l/rkeys
>
>     Separate software and simulated hardware lkeys and rkeys for MRs and MWs.
>     This makes struct ib_mr and struct ib_mw isolated from hardware changes
>     triggered by executing work requests.
>
>     This change fixes a bug seen in blktest.
>
>     Link: https://lore.kernel.org/r/20210914164206.19768-4-rpearsonhpe@xxxxxxxxx
>     Signed-off-by: Bob Pearson <rpearsonhpe@xxxxxxxxx>
>     Signed-off-by: Jason Gunthorpe <jgg@xxxxxxxxxx>
>
>
> After the change, mapping for rnbd/rtrs fails with the following error.
>
> Mapping error,
> Client
>
> [Fri May 20 20:27:45 2022] rdma_rxe: qp#17 moved to error state
> [Fri May 20 20:27:45 2022] rtrs_server L1211: <blya>: remote access
> error (wr_cqe: 000000002d27244a, type: 0, vendor_err: 0x0, len: 0)
>
> Server
> [Fri May 20 20:27:45 2022] rnbd_client L597: Mapping device
> /dev/nullb0 on session blya, (access_mode: rw, nr_poll_queues: 0)
> [Fri May 20 20:27:45 2022] rnbd_client L1208: [session=blya] mapped
> 4/4 default/read queues.
> [Fri May 20 20:27:45 2022] rtrs_client L605: <blya>: RDMA failed:
> remote access error
> [Fri May 20 20:27:45 2022] rtrs_client L346: <blya>: Failed
> IB_WR_LOCAL_INV: WR flushed
> [Fri May 20 20:27:45 2022] rtrs_client L605: <blya>: RDMA failed:
> remote access error
> [Fri May 20 20:27:45 2022] rtrs_client L346: <blya>: Failed
> IB_WR_LOCAL_INV: WR flushed
> [Fri May 20 20:27:45 2022] rnbd_client L413: </dev/nullb0@blya> read
> I/O failed with err: -103
> [Fri May 20 20:27:45 2022] I/O error, dev rnbd0, sector 0 op
> 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
> [Fri May 20 20:27:45 2022] rnbd_client L653: </dev/nullb0@blya> Device
> disconnected.
> [Fri May 20 20:27:45 2022] Buffer I/O error on dev rnbd0, logical
> block 0, async page read
> [Fri May 20 20:27:45 2022] rnbd_client L1049: </dev/nullb0@blya> RTRS
> failed to transfer IO, err: -103
> [Fri May 20 20:27:45 2022] I/O error, dev rnbd0, sector 0 op
> 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
> [Fri May 20 20:27:45 2022] Buffer I/O error on dev rnbd0, logical
> block 0, async page read
> [Fri May 20 20:27:45 2022] ldm_validate_partition_table(): Disk read failed.
> [Fri May 20 20:27:45 2022] rnbd_client L1049: </dev/nullb0@blya> RTRS
> failed to transfer IO, err: -103
> [Fri May 20 20:27:45 2022] I/O error, dev rnbd0, sector 0 op
> 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
> [Fri May 20 20:27:45 2022] Buffer I/O error on dev rnbd0, logical
> block 0, async page read
> [Fri May 20 20:27:45 2022] rnbd_client L1049: </dev/nullb0@blya> RTRS
> failed to transfer IO, err: -103
> [Fri May 20 20:27:45 2022] I/O error, dev rnbd0, sector 0 op
> 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
> [Fri May 20 20:27:45 2022] Buffer I/O error on dev rnbd0, logical
> block 0, async page read
> [Fri May 20 20:27:45 2022]  rnbd0: unable to read partition table
>
> Error on write
> Client
> [Fri May 20 20:28:39 2022] rdma_rxe: rxe_invalidate_mr: rkey (0x20416)
> doesn't match mr->rkey (0x0)
> [Fri May 20 20:28:39 2022] rtrs_client L346: <blya>: Failed
> IB_WR_LOCAL_INV: local QP operation error
>
> The code for rnbd/rtrs works fine on mlx4 and mlx5.
> From the spec perspective, is there a need for the wr access flag to
> be set while doing IB_WR_REG_MR?

We went through the IB spec and realized that what RXE is doing is correct.

in IBA spec it is said in sections
"10.6.3.7 FAST REGISTRATION"
..."If the free L_Key has an accompanying free R_Key, and the
Consumer requests Remote Access Rights, a Fast Registration associ-
ates the R_Key with a set of memory locations"

Hence my above patch is incorrect and should not be considered. The
correct fix would be to change the access flag in RTRS IB_WR_REG_MR
wr. We will send a patch soon.

But I do have a follow-up question. Since without this access flag,
the code works fine in mlx4 and mlx5, they are doing something
differently. Is this an issue?

>
>
>
>
> >



[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux