Re: [PATCH for-rdma-core v2] pyverbs: Fix wrong rkey in test_qp_ex_rc_bind_mw

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 7/4/22 08:24, Jason Gunthorpe wrote:
> On Thu, May 19, 2022 at 10:58:11AM -0500, Bob Pearson wrote:
>> The current test_qp_ex_rc_bind_mw in tests/test_qpex.py uses an incorrect
>> value for the new_rkey based on the old mr.rkey. This patch fixes that
>> behavior by basing the new rkey on the old mw.rkey instead.
>>
>> Before this patch the test will fail for the rxe driver about 1 in 256
>> tries since randomly that is the freguency of new_rkeys which have the
>> same 8 bit key portion as the current mw which is not allowed. With
>> this patch those errors do not occur.
> 
> While this patch seems correct, I don't understand these remarks.
> 
> IBA says:
> 
>  After the bind operation completes, the R_Key must consist of the 24
>  bit index associated with the Type 2 Memory Window and the 8 bit key
>  supplied by the Con- sumer in the Post Send Bind Memory Window Work
>  Request.
> 
> Meaning the bind should only be processing the lower variant bits in
> the first place, and there should be no condition where the bind could
> fail since all varient bits are always legal.
> 
> Bind does not allow changing the upper fixed bit - only allocation can
> change those. So if rxe kernel side is changing the upper bits it is
> also buggy. IMHO it would be appropriate to fail the operation if
> given rkey has unmatched upper bits.
> 
> Jason

The root cause of this failure was that the script generated the new mw rkey
from the mr rkey which had nothing to do with it. The current mw implementation
enforces a rule that the new mw rkey have a different key portion than the old
mw rkey. Since the mr rkey is changing independently from the mw rkeys this rule
will randomly cause an error approx 1 out of 256 times. You are questioning the
validity of this rule.

According to the IBA there are two types of mws type 1 and 2. Type 1 mws are bound
through a user space api ibv_bind_mw() while type 2 mws are bound with a sq operation.
In the kernel there is no implementation of the type 1 bind API so it is implemented
by mapping it into a sq operation.

The sq operation has 3 key parameters, the mr lkey, the original mw rkey and the new
key portion of the mw rkey. Only the low order 8 bits of this parameter are used.

For type 1 mws the operation returns a new rkey key portion which is guaranteed to be
different than the original one. According to the man page the user space app must
save the old rkey and if the operation completes with a failure it must fix up the
key to the old value. Checking the key is different in this case is just paranoia since
the rdma-core library is supposed to change it.

For type 2 mws, as you say, the key portion is entirely up to the consumer so any value
is valid. The current test that the key portion is different can be dropped. There is
a slight problem though which is fixed by the test even though it is not proper. C10-68
requires that as soon as a bind operation is complete that no operations referring to
the previous binding shall succeed without error. The problem is how do you know
if the rkey is the same for the new and old bindings are the same. The sane way to do
this is to make sure they are different. This is how all the examples of use cases I
have seen proceed.

Bob



[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux