Re: [PATCH for-rc] IB/cma: Fix false P_Key mismatch messages

> On 8 Jul 2021, at 20:52, Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:
> 
> On Thu, Jul 08, 2021 at 03:59:25PM +0000, Haakon Bugge wrote:
>> 
>> 
>>> On 5 Jul 2021, at 18:59, Haakon Bugge <haakon.bugge@xxxxxxxxxx> wrote:
>>> 
>>> 
>>> 
>>>> On 5 Jul 2021, at 18:26, Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:
>>>> 
>>>> On Tue, Jun 29, 2021 at 01:45:35PM +0000, Haakon Bugge wrote:
>>>> 
>>>>>>>> IMHO it is a bug on the sender side to send GMPs using a pkey that
>>>>>>>> doesn't exactly match the data path pkey.
>>>>>>> 
>>>>>>> The active connector calls ib_addr_get_pkey(). This function
>>>>>>> extracts the pkey from byte 8/9 in the device's bcast
>>>>>>> address. However, RFC 4391 explicitly states:
>>>>>> 
>>>>>> pkeys in CM come only from path records that the SM returns, the above
>>>>>> should only be used to feed into a path record query which could then
>>>>>> return back a limited pkey.
>>>>>> 
>>>>>> Everything thereafter should use the SM's version of the pkey.
>>>>> 
>>>>> Revisiting this. I think I mis-interpreted the scenario that led to
>>>>> the P_Key mismatch messages.
>>>>> 
>>>>> The CM retrieves the pkey_index that matched the P_Key in the BTH
>>>>> (cm_get_bth_pkey()) and thereafter calls ib_get_cached_pkey() to get
>>>>> the P_Key value of the particular pkey_index.
>>>>> 
>>>>> Assume a full-member sends a REQ. In that case, both P_Keys (BTH and
>>>>> primary path_rec) are full. Further, assume the recipient is only a
>>>>> limited member. Since full and limited members of the same partition
>>>>> are eligible to communicate, the P_Key retrieved by
>>>>> cm_get_bth_pkey() will be the limited one.
>>>> 
>>>> It is incorrect for the issuer of the REQ to put a full pkey in the
>>>> REQ message when the target is a limited member.
>>> 
>>> Sorry, I mis-interpreted the spec. I thought the PKey in the Path
>>> record should be that of the initiator, not the target's. OK. Will
>>> come up with a fix.
>> 
>> On the systems I have access to (running Oracle flavour OpenSM in
>> our NM2 switches), the behaviour is exactly the opposite of what you
>> say.
> 
> Check with saquery what is happening, if you request a reversible path
> from the CM target (limited pkey) to the CM client (full) you should
> get the limited pkey or the SM is broken.
> 
> If the SM is working then probably something in the stack is using a
> reversed src/dest when doing the PR query.
> 
> It is not intuitive but the PR query should have SGID as the CM Target
> even though it is running on the CM Client.

That is not how it works today. Because of that, every access to the PR
assumes that d{gid,lid} is the remote peer. To fix this, I would have to
swap dgid/sgid and ib.dlid/ib.slid everywhere the PR is used. That is a
pervasive change; it even touches ipoib. Let me know if that is what you
want.
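
To make the size of that swap concrete, here is a minimal sketch of turning a
client-relative path record into a target-relative one. The struct and the
function name (demo_path_rec, demo_path_rec_swap_ends) are hypothetical
stand-ins for illustration, not the kernel's sa_path_rec API. The P_Key is
left untouched, on the assumption that it should come from the SM's answer
rather than be derived locally.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified path record; not the kernel's struct. */
struct demo_path_rec {
	uint8_t  sgid[16];
	uint8_t  dgid[16];
	uint16_t slid;
	uint16_t dlid;
	uint16_t pkey;
};

/*
 * Swap the source and destination ends of a path record in place, so
 * that a record built as "local -> remote" becomes "remote -> local".
 * The P_Key is deliberately not modified here.
 */
static void demo_path_rec_swap_ends(struct demo_path_rec *rec)
{
	uint8_t  tmp_gid[16];
	uint16_t tmp_lid;

	assert(rec != NULL);

	memcpy(tmp_gid, rec->sgid, sizeof(tmp_gid));
	memcpy(rec->sgid, rec->dgid, sizeof(rec->sgid));
	memcpy(rec->dgid, tmp_gid, sizeof(rec->dgid));

	tmp_lid = rec->slid;
	rec->slid = rec->dlid;
	rec->dlid = tmp_lid;
}
```

Every consumer that currently reads d{gid,lid} as "the remote peer" would
need the matching change, which is why the fix is not local to the CM.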


Thxs, Håkon

> 
> This is because the REQ is supposed to contain a path that is relative
> to the target.
> 
> Everything will be the same except for this small detail about
> full/limited pkeys.
> 
> The client can figure out what to do with its own pkey table locally.
> 
>> "the P_Key table entry (0x1234) matching incoming BTH.P_Key differs from primary path P_Key (0x9234)"
> 
> "The REQ contains a PKey (0x1234) that is not found in this device's
> PKey table. Using alternative limited Pkey (0x9234) instead. This is a
> client bug"
> 
> Jason
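
For reference, the full/limited matching discussed in this thread can be
sketched as below. In a 16-bit P_Key, bit 15 is the membership bit (set means
full member) and the low 15 bits identify the partition; two ports may
communicate when the partitions match and at least one side is a full member.
The helper names are hypothetical, and the table lookup is a simplification
that returns the first eligible entry; it is not the kernel's
ib_find_cached_pkey().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PKEY_FULL_MEMBER 0x8000u /* bit 15: full-membership flag */
#define DEMO_PKEY_BASE_MASK   0x7fffu /* low 15 bits: partition number */

static bool demo_pkey_is_full(uint16_t pkey)
{
	return (pkey & DEMO_PKEY_FULL_MEMBER) != 0;
}

static bool demo_pkey_same_partition(uint16_t a, uint16_t b)
{
	return (a & DEMO_PKEY_BASE_MASK) == (b & DEMO_PKEY_BASE_MASK);
}

/* Same partition, and at least one side is a full member. */
static bool demo_pkey_may_communicate(uint16_t a, uint16_t b)
{
	return demo_pkey_same_partition(a, b) &&
	       (demo_pkey_is_full(a) || demo_pkey_is_full(b));
}

/*
 * Return the index of the first table entry that is allowed to talk to
 * the P_Key seen in the BTH, or -1 if there is none. The matching entry
 * can differ in value from the BTH P_Key (limited vs. full), which is
 * the situation the warning in this thread reports.
 */
static int demo_pkey_table_find(const uint16_t *table, size_t len,
				uint16_t bth_pkey)
{
	assert(table != NULL);

	for (size_t i = 0; i < len; i++) {
		if (demo_pkey_may_communicate(table[i], bth_pkey))
			return (int)i;
	}
	return -1;
}
```

With the values from the message above, a REQ carrying 0x9234 (full) matches
a local table entry of 0x1234 (limited): the lookup succeeds, but the value
read back from the table is not equal to the P_Key in the primary path.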




