Good catch. I hit it as well with the following test loop:

$ while true; do ./bin/run_tests.py --dev rxe_enp3s0 --gid 1 2>&1 ; done

After running for a while, it throws:

ERROR: test_atomic_cmp_and_swap (tests.test_atomic.AtomicTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/lizhijian/rdma-core/tests/test_atomic.py", line 110, in test_atomic_cmp_and_swap
    u.atomic_traffic(**self.traffic_args, send_op=e.IBV_WR_ATOMIC_CMP_AND_SWP)
  File "/home/lizhijian/rdma-core/tests/utils.py", line 1077, in atomic_traffic
    poll_cq(client.cq)
  File "/home/lizhijian/rdma-core/tests/utils.py", line 604, in poll_cq
    raise PyverbsRDMAError(f'Completion status is {wc_status_to_str(wcs[0].status)}')
pyverbs.pyverbs_error.PyverbsRDMAError: Completion status is Remote access error

======================================================================
ERROR: test_atomic_fetch_and_add (tests.test_atomic.AtomicTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/lizhijian/rdma-core/tests/test_atomic.py", line 116, in test_atomic_fetch_and_add
    u.atomic_traffic(**self.traffic_args,
  File "/home/lizhijian/rdma-core/tests/utils.py", line 1077, in atomic_traffic
    poll_cq(client.cq)
  File "/home/lizhijian/rdma-core/tests/utils.py", line 604, in poll_cq
    raise PyverbsRDMAError(f'Completion status is {wc_status_to_str(wcs[0].status)}')
pyverbs.pyverbs_error.PyverbsRDMAError: Completion status is Remote access error

======================================================================
ERROR: test_mr_rereg_access_bad_flow (tests.test_mr.MRTest)
Test that cover rereg MR's access with this flow:
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/lizhijian/rdma-core/tests/test_mr.py", line 129, in test_mr_rereg_access_bad_flow
    u.rdma_traffic(**self.traffic_args, send_op=e.IBV_WR_RDMA_WRITE)
  File "/home/lizhijian/rdma-core/tests/utils.py", line 1031, in rdma_traffic
    poll_cq(client.cq)
  File "/home/lizhijian/rdma-core/tests/utils.py", line 604, in poll_cq
    raise PyverbsRDMAError(f'Completion status is {wc_status_to_str(wcs[0].status)}')
pyverbs.pyverbs_error.PyverbsRDMAError: Completion status is Remote access error

======================================================================
ERROR: test_qp_ex_rc_atomic_cmp_swp (tests.test_qpex.QpExTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/lizhijian/rdma-core/tests/test_qpex.py", line 341, in test_qp_ex_rc_atomic_cmp_swp
    u.atomic_traffic(client, server, self.iters, self.gid_index,
  File "/home/lizhijian/rdma-core/tests/utils.py", line 1077, in atomic_traffic
    poll_cq(client.cq)
  File "/home/lizhijian/rdma-core/tests/utils.py", line 604, in poll_cq
    raise PyverbsRDMAError(f'Completion status is {wc_status_to_str(wcs[0].status)}')
pyverbs.pyverbs_error.PyverbsRDMAError: Completion status is Remote access error

======================================================================
ERROR: test_qp_ex_rc_atomic_fetch_add (tests.test_qpex.QpExTestCase)
----------------------------------------------------------------------

After digging into the source, I believe it is the same issue as this one.

BTW, I believe I ran this kind of test before, but I didn't get this error
until v6.1+.

Thanks
Zhijian

On 15/12/2022 18:14, Daisuke Matsuda wrote:
> If you create MRs more than 0x10000 times after loading the module,
> responder starts to reply NAKs for RDMA/Atomic operations because of rkey
> violation detected in check_rkey(). The root cause is that rkeys are
> incremented each time a new MR is created and the value overflows into the
> range reserved for MWs.
>
> Fixes: 0994a1bcd5f7 ("RDMA/rxe: Bump up default maximum values used via uverbs")
> Signed-off-by: Daisuke Matsuda <matsuda-daisuke@xxxxxxxxxxx>
> ---
>  drivers/infiniband/sw/rxe/rxe_param.h | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/infiniband/sw/rxe/rxe_param.h b/drivers/infiniband/sw/rxe/rxe_param.h
> index a754fc902e3d..a3d31bd45895 100644
> --- a/drivers/infiniband/sw/rxe/rxe_param.h
> +++ b/drivers/infiniband/sw/rxe/rxe_param.h
> @@ -98,10 +98,10 @@ enum rxe_device_param {
>      RXE_MAX_SRQ = DEFAULT_MAX_VALUE - RXE_MIN_SRQ_INDEX,
>
>      RXE_MIN_MR_INDEX = 0x00000001,
> -    RXE_MAX_MR_INDEX = DEFAULT_MAX_VALUE,
> -    RXE_MAX_MR = DEFAULT_MAX_VALUE - RXE_MIN_MR_INDEX,
> -    RXE_MIN_MW_INDEX = 0x00010001,
> -    RXE_MAX_MW_INDEX = 0x00020000,
> +    RXE_MAX_MR_INDEX = DEFAULT_MAX_VALUE >> 1,
> +    RXE_MAX_MR = 0x00001000,
> +    RXE_MIN_MW_INDEX = (DEFAULT_MAX_VALUE >> 1) + 1,
> +    RXE_MAX_MW_INDEX = DEFAULT_MAX_VALUE,
>      RXE_MAX_MW = 0x00001000,
>
>      RXE_MAX_PKT_PER_ACK = 64,
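
For anyone who wants to sanity-check the index ranges in the quoted diff, here is a rough sketch in Python (not kernel code). It only compares the old and new MR/MW index ranges for overlap. It assumes DEFAULT_MAX_VALUE is (1 << 20), which is not quoted in this mail, and that MR indices are handed out in increasing order as the commit message describes. The helper names are made up for illustration.

```python
# Sketch of the MR/MW index ranges from rxe_param.h, before and after the patch.
# ASSUMPTION: DEFAULT_MAX_VALUE is 1 << 20; the mail does not quote its value.
DEFAULT_MAX_VALUE = 1 << 20

# Inclusive (min_index, max_index) pairs taken from the quoted diff.
OLD_MR = (0x00000001, DEFAULT_MAX_VALUE)
OLD_MW = (0x00010001, 0x00020000)
NEW_MR = (0x00000001, DEFAULT_MAX_VALUE >> 1)
NEW_MW = ((DEFAULT_MAX_VALUE >> 1) + 1, DEFAULT_MAX_VALUE)


def ranges_overlap(a, b):
    """Return True if two inclusive index ranges share at least one index."""
    return a[0] <= b[1] and b[0] <= a[1]


def first_colliding_mr_index(mr, mw):
    """Return the lowest MR index that also lies in the MW range, or None."""
    if not ranges_overlap(mr, mw):
        return None
    return max(mr[0], mw[0])


if __name__ == "__main__":
    # Old layout: the MR range runs straight through the MW range.
    print(hex(first_colliding_mr_index(OLD_MR, OLD_MW)))  # 0x10001
    # New layout: the two ranges are disjoint.
    print(first_colliding_mr_index(NEW_MR, NEW_MW))       # None
```

With the old values, the first MR index that lands in the MW range is 0x10001, which lines up with the "more than 0x10000 times" in the commit message. With the new values the ranges no longer overlap under the stated assumption.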