Re: [PATCH for-rc] IB/cma: Fix false P_Key mismatch messages

> On 5 Jul 2021, at 18:59, Haakon Bugge <haakon.bugge@xxxxxxxxxx> wrote:
> 
> 
> 
>> On 5 Jul 2021, at 18:26, Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:
>> 
>> On Tue, Jun 29, 2021 at 01:45:35PM +0000, Haakon Bugge wrote:
>> 
>>>>>> IMHO it is a bug on the sender side to send GMPs using a pkey that
>>>>>> doesn't exactly match the data path pkey.
>>>>> 
>>>>> The active connector calls ib_addr_get_pkey(). This function
>>>>> extracts the pkey from byte 8/9 in the device's bcast
>>>>> address. However, RFC 4391 explicitly states:
>>>> 
>>>> pkeys in the CM come only from the path records that the SM returns; the
>>>> above should only be used to feed into a path record query, which could
>>>> then return a limited pkey.
>>>> 
>>>> Everything thereafter should use the SM's version of the pkey.
>>> 
>>> Revisiting this. I think I misinterpreted the scenario that led to
>>> the P_Key mismatch messages.
>>> 
>>> The CM retrieves the pkey_index that matched the P_Key in the BTH
>>> (cm_get_bth_pkey()) and thereafter calls ib_get_cached_pkey() to get
>>> the P_Key value of the particular pkey_index.
>>> 
>>> Assume a full member sends a REQ. In that case, both P_Keys (BTH and
>>> primary path_rec) are full. Further, assume the recipient is only a
>>> limited member. Since full and limited members of the same partition
>>> are eligible to communicate, the P_Key retrieved by
>>> cm_get_bth_pkey() will be the limited one.
>> 
>> It is incorrect for the issuer of the REQ to put a full pkey in the
>> REQ message when the target is a limited member.
> 
> Sorry, I misinterpreted the spec. I thought the P_Key in the path record should be the initiator's, not the target's. OK. Will come up with a fix.

On the systems I have access to (running Oracle's flavour of OpenSM on our NM2 switches), the behaviour is exactly the opposite of what you say.

So, if we (Oracle) are the only ones seeing this warning (I repeat it here to catch some interest):

RDMA CMA: got different BTH P_Key (0x2a00) and primary path P_Key (0xaa00)
RDMA CMA: in the future this may cause the request to be dropped

then there is nothing to fix in the RDMA stack. It must be fixed in Oracle's OpenSM.
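As an illustration (a sketch, not code from the patch): in InfiniBand, bit 15 of a P_Key is the membership bit (set = full, clear = limited) and the low 15 bits name the partition. Decoding the two values from the warning this way shows they belong to the same partition, 0x2a00, and differ only in membership: 0x2a00 is the limited key, 0xaa00 the full one. The helper names pkey_base(), pkey_is_full() and pkeys_can_communicate() are hypothetical; the last one encodes the rule that full and limited members of one partition may communicate, while two limited members may not.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 15 of a P_Key is the membership bit: set = full, clear = limited. */
#define PKEY_FULL_MEMBER 0x8000u
/* The remaining 15 bits identify the partition. */
#define PKEY_BASE_MASK   0x7fffu

/* Partition part of a P_Key, with the membership bit stripped. */
static uint16_t pkey_base(uint16_t pkey)
{
    return (uint16_t)(pkey & PKEY_BASE_MASK);
}

/* True if the P_Key denotes full membership. */
static bool pkey_is_full(uint16_t pkey)
{
    return (pkey & PKEY_FULL_MEMBER) != 0;
}

/* Two ports may talk if their keys name the same partition and at least
 * one of them is a full member; limited-to-limited is not allowed. */
static bool pkeys_can_communicate(uint16_t a, uint16_t b)
{
    return pkey_base(a) == pkey_base(b) &&
           (pkey_is_full(a) || pkey_is_full(b));
}
```

With the values from the warning, pkey_base() returns 0x2a00 for both keys, pkey_is_full() is false for 0x2a00 and true for 0xaa00, and pkeys_can_communicate(0x2a00, 0xaa00) holds, so the mismatch is purely a membership-bit difference.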

The only thing I can do here is to straighten up the warning message, which is imprecise. What about:

"the P_Key table entry (0x1234) matching incoming BTH.P_Key differs from primary path P_Key (0x9234)"

My literal interpretation of the old warning message confused me!
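A rough model of why the old wording misleads: as described earlier in the thread, the CM takes the pkey_index that matched the incoming BTH.P_Key (cm_get_bth_pkey()) and then reads the local table entry at that index (ib_get_cached_pkey()). The value printed as "BTH P_Key" is therefore the receiver's own table entry, not the key carried on the wire. The sketch below models that lookup with a plain array and hypothetical helpers match_pkey_index() and format_pkey_warning(); it is not the kernel code, it assumes the first matching entry wins, and it ignores the invalid P_Key (partition 0).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define PKEY_FULL_MEMBER 0x8000u
#define PKEY_BASE_MASK   0x7fffu

/* P_Key match rule: same partition and at least one full-member key. */
static bool pkey_match(uint16_t a, uint16_t b)
{
    return (a & PKEY_BASE_MASK) == (b & PKEY_BASE_MASK) &&
           ((a | b) & PKEY_FULL_MEMBER) != 0;
}

/* Hypothetical receive-side model: index of the first local P_Key table
 * entry that matches the incoming BTH.P_Key, or -1 if none does. */
static int match_pkey_index(const uint16_t *table, size_t n, uint16_t bth_pkey)
{
    for (size_t i = 0; i < n; i++)
        if (pkey_match(table[i], bth_pkey))
            return (int)i;
    return -1;
}

/* Format the proposed warning text; returns snprintf()'s result. */
static int format_pkey_warning(char *buf, size_t len,
                               uint16_t table_pkey, uint16_t path_pkey)
{
    return snprintf(buf, len,
                    "the P_Key table entry (0x%04x) matching incoming BTH.P_Key "
                    "differs from primary path P_Key (0x%04x)",
                    (unsigned)table_pkey, (unsigned)path_pkey);
}
```

For a limited-member receiver with the table {0xffff, 0x2a00} and a full-member sender putting 0xaa00 in the BTH, match_pkey_index() returns 1, so the entry reported is 0x2a00 even though the wire carried 0xaa00. The proposed wording makes that distinction explicit.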


Thxs, Håkon


> 
> 
> Thxs, Håkon
> 
>> 
>> The CM model in IB has the target fully under the control of the
>> initiator, and it is up to the initiator to ask the SM how the target
>> should generate its return traffic. The SM is responsible for saying that
>> limited->full communication is done using the limited pkey.
>> 
>> The initiator is responsible for placing that limited pkey in the REQ
>> message.
>> 
>> Somewhere in your system this isn't happening properly, and it is a
>> bug that the CM is correctly identifying.
>> 
>> Jason
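Jason's description of how the path P_Key should be chosen can be written down as a small rule. This is a sketch of the rule as stated in this thread, not OpenSM's actual algorithm, and choose_path_pkey() is a hypothetical name: the full key is used only when both ends are full members, limited<->full traffic uses the limited key, and two limited members get no path at all.

```c
#include <stdbool.h>
#include <stdint.h>

#define PKEY_FULL_MEMBER 0x8000u
#define PKEY_BASE_MASK   0x7fffu

/* Given the initiator's and target's P_Keys, pick the P_Key that the path
 * record (and hence the REQ) should carry. Returns false when the two
 * ports cannot communicate in a common partition. */
static bool choose_path_pkey(uint16_t initiator, uint16_t target, uint16_t *out)
{
    uint16_t base = (uint16_t)(initiator & PKEY_BASE_MASK);

    if (base != (target & PKEY_BASE_MASK))
        return false;                 /* different partitions */

    bool init_full = (initiator & PKEY_FULL_MEMBER) != 0;
    bool tgt_full  = (target & PKEY_FULL_MEMBER) != 0;

    if (!init_full && !tgt_full)
        return false;                 /* limited <-> limited is not allowed */

    /* Full key only if both ends are full members; otherwise the limited key. */
    *out = (init_full && tgt_full) ? (uint16_t)(base | PKEY_FULL_MEMBER) : base;
    return true;
}
```

Under this rule, a full-member initiator (0xaa00) talking to a limited-member target (0x2a00) gets 0x2a00 in the path record and places that in the REQ, which is the behaviour Jason expects and the opposite of what Håkon reports seeing from Oracle's OpenSM.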
