Re: [PATCH 2/2] RDMA/rxe: Prevent faulty rkey generation

Good catch. I hit it as well with the following test loop:

$ while true; do ./bin/run_tests.py --dev rxe_enp3s0 --gid 1 2>&1 ; done

After running for a while, it throws:

ERROR: test_atomic_cmp_and_swap (tests.test_atomic.AtomicTest)
----------------------------------------------------------------------
Traceback (most recent call last):
   File "/home/lizhijian/rdma-core/tests/test_atomic.py", line 110, in test_atomic_cmp_and_swap
     u.atomic_traffic(**self.traffic_args, send_op=e.IBV_WR_ATOMIC_CMP_AND_SWP)
   File "/home/lizhijian/rdma-core/tests/utils.py", line 1077, in atomic_traffic
     poll_cq(client.cq)
   File "/home/lizhijian/rdma-core/tests/utils.py", line 604, in poll_cq
     raise PyverbsRDMAError(f'Completion status is {wc_status_to_str(wcs[0].status)}')
pyverbs.pyverbs_error.PyverbsRDMAError: Completion status is Remote access error
                                                                                                                    
======================================================================
ERROR: test_atomic_fetch_and_add (tests.test_atomic.AtomicTest)
----------------------------------------------------------------------
Traceback (most recent call last):
   File "/home/lizhijian/rdma-core/tests/test_atomic.py", line 116, in test_atomic_fetch_and_add
     u.atomic_traffic(**self.traffic_args,
   File "/home/lizhijian/rdma-core/tests/utils.py", line 1077, in atomic_traffic
     poll_cq(client.cq)
   File "/home/lizhijian/rdma-core/tests/utils.py", line 604, in poll_cq
     raise PyverbsRDMAError(f'Completion status is {wc_status_to_str(wcs[0].status)}')
pyverbs.pyverbs_error.PyverbsRDMAError: Completion status is Remote access error
                                                                                                                    
======================================================================
ERROR: test_mr_rereg_access_bad_flow (tests.test_mr.MRTest)
Test that cover rereg MR's access with this flow:
----------------------------------------------------------------------
Traceback (most recent call last):
   File "/home/lizhijian/rdma-core/tests/test_mr.py", line 129, in test_mr_rereg_access_bad_flow
     u.rdma_traffic(**self.traffic_args, send_op=e.IBV_WR_RDMA_WRITE)
   File "/home/lizhijian/rdma-core/tests/utils.py", line 1031, in rdma_traffic
     poll_cq(client.cq)
   File "/home/lizhijian/rdma-core/tests/utils.py", line 604, in poll_cq
     raise PyverbsRDMAError(f'Completion status is {wc_status_to_str(wcs[0].status)}')
pyverbs.pyverbs_error.PyverbsRDMAError: Completion status is Remote access error
                                                                                                                    
======================================================================
ERROR: test_qp_ex_rc_atomic_cmp_swp (tests.test_qpex.QpExTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
   File "/home/lizhijian/rdma-core/tests/test_qpex.py", line 341, in test_qp_ex_rc_atomic_cmp_swp
     u.atomic_traffic(client, server, self.iters, self.gid_index,
   File "/home/lizhijian/rdma-core/tests/utils.py", line 1077, in atomic_traffic
     poll_cq(client.cq)
   File "/home/lizhijian/rdma-core/tests/utils.py", line 604, in poll_cq
     raise PyverbsRDMAError(f'Completion status is {wc_status_to_str(wcs[0].status)}')
pyverbs.pyverbs_error.PyverbsRDMAError: Completion status is Remote access error
                                                                                                                    
======================================================================
ERROR: test_qp_ex_rc_atomic_fetch_add (tests.test_qpex.QpExTestCase)
----------------------------------------------------------------------

After digging into the source, I believe this is the same issue as the one your patch fixes.

BTW, I believe I ran this kind of test before, but I did not get this error until v6.1+.


Thanks
Zhijian



On 15/12/2022 18:14, Daisuke Matsuda wrote:
> If you create MRs more than 0x10000 times after loading the module,
> responder starts to reply NAKs for RDMA/Atomic operations because of rkey
> violation detected in check_rkey(). The root cause is that rkeys are
> incremented each time a new MR is created and the value overflows into the
> range reserved for MWs.
> 
> Fixes: 0994a1bcd5f7 ("RDMA/rxe: Bump up default maximum values used via uverbs")
> Signed-off-by: Daisuke Matsuda <matsuda-daisuke@xxxxxxxxxxx>
> ---
>   drivers/infiniband/sw/rxe/rxe_param.h | 8 ++++----
>   1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/infiniband/sw/rxe/rxe_param.h b/drivers/infiniband/sw/rxe/rxe_param.h
> index a754fc902e3d..a3d31bd45895 100644
> --- a/drivers/infiniband/sw/rxe/rxe_param.h
> +++ b/drivers/infiniband/sw/rxe/rxe_param.h
> @@ -98,10 +98,10 @@ enum rxe_device_param {
>   	RXE_MAX_SRQ			= DEFAULT_MAX_VALUE - RXE_MIN_SRQ_INDEX,
>   
>   	RXE_MIN_MR_INDEX		= 0x00000001,
> -	RXE_MAX_MR_INDEX		= DEFAULT_MAX_VALUE,
> -	RXE_MAX_MR			= DEFAULT_MAX_VALUE - RXE_MIN_MR_INDEX,
> -	RXE_MIN_MW_INDEX		= 0x00010001,
> -	RXE_MAX_MW_INDEX		= 0x00020000,
> +	RXE_MAX_MR_INDEX		= DEFAULT_MAX_VALUE >> 1,
> +	RXE_MAX_MR			= 0x00001000,
> +	RXE_MIN_MW_INDEX		= (DEFAULT_MAX_VALUE >> 1) + 1,
> +	RXE_MAX_MW_INDEX		= DEFAULT_MAX_VALUE,
>   	RXE_MAX_MW			= 0x00001000,
>   
>   	RXE_MAX_PKT_PER_ACK		= 64,
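
For reference, the overlap described in the commit message can be sketched as a small standalone check. This is only an illustration, not driver code: it assumes DEFAULT_MAX_VALUE is (1 << 20) and that the responder tells MRs and MWs apart by which index range a key's index falls in. The names index_range, in_range and ranges_overlap are hypothetical helpers, and the range values are copied from the removed and added lines of the diff above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: value of DEFAULT_MAX_VALUE in rxe_param.h (not shown in the diff). */
#define DEFAULT_MAX_VALUE (1u << 20)

/* Hypothetical helper type: an inclusive pool index range. */
struct index_range {
	uint32_t min;
	uint32_t max;
};

/* Ranges before the patch (the '-' lines of the diff). */
static const struct index_range old_mr = { 0x00000001, DEFAULT_MAX_VALUE };
static const struct index_range old_mw = { 0x00010001, 0x00020000 };

/* Ranges after the patch (the '+' lines of the diff). */
static const struct index_range new_mr = { 0x00000001, DEFAULT_MAX_VALUE >> 1 };
static const struct index_range new_mw = { (DEFAULT_MAX_VALUE >> 1) + 1,
					   DEFAULT_MAX_VALUE };

/* True if index lies inside the inclusive range r. */
static bool in_range(struct index_range r, uint32_t index)
{
	return index >= r.min && index <= r.max;
}

/* True if the two inclusive ranges share at least one index. */
static bool ranges_overlap(struct index_range a, struct index_range b)
{
	return a.min <= b.max && b.min <= a.max;
}
```

With the old values, an MR index of 0x10001 is inside both ranges, so a key built from it can be mistaken for an MW key. With the new values the two ranges are disjoint.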



