Re: [PATCH for-rc] IB/cma: Fix false P_Key mismatch messages

On Fri, Jul 09, 2021 at 04:45:21PM +0000, Haakon Bugge wrote:
> 
> 
> > On 8 Jul 2021, at 20:52, Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:
> > 
> > On Thu, Jul 08, 2021 at 03:59:25PM +0000, Haakon Bugge wrote:
> >> 
> >> 
> >>> On 5 Jul 2021, at 18:59, Haakon Bugge <haakon.bugge@xxxxxxxxxx> wrote:
> >>> 
> >>> 
> >>> 
> >>>> On 5 Jul 2021, at 18:26, Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:
> >>>> 
> >>>> On Tue, Jun 29, 2021 at 01:45:35PM +0000, Haakon Bugge wrote:
> >>>> 
> >>>>>>>> IMHO it is a bug on the sender side to send GMPs to use a pkey that
> >>>>>>>> doesn't exactly match the data path pkey.
> >>>>>>> 
> >>>>>>> The active connector calls ib_addr_get_pkey(). This function
> >>>>>>> extracts the pkey from byte 8/9 in the device's bcast
> >>>>>>> address. However, RFC 4391 explicitly states:
> >>>>>> 
> >>>>>> pkeys in CM come only from path records that the SM returns, the above
> >>>>>> should only be used to feed into a path record query which could then
> >>>>>> return back a limited pkey.
> >>>>>> 
> >>>>>> Everything thereafter should use the SM's version of the pkey.
> >>>>> 
> >>>>> Revisiting this. I think I mis-interpreted the scenario that led to
> >>>>> the P_Key mismatch messages.
> >>>>> 
> >>>>> The CM retrieves the pkey_index that matched the P_Key in the BTH
> >>>>> (cm_get_bth_pkey()) and thereafter calls ib_get_cached_pkey() to get
> >>>>> the P_Key value of the particular pkey_index.
> >>>>> 
> >>>>> Assume a full-member sends a REQ. In that case, both P_Keys (BTH and
> >>>>> primary path_rec) are full. Further, assume the recipient is only a
> >>>>> limited member. Since full and limited members of the same partition
> >>>>> are eligible to communicate, the P_Key retrieved by
> >>>>> cm_get_bth_pkey() will be the limited one.
> >>>> 
> >>>> It is incorrect for the issuer of the REQ to put a full pkey in the
> >>>> REQ message when the target is a limited member.
> >>> 
> >>> Sorry, I mis-interpreted the spec. I thought the PKey in the Path
> >>> record should be the initiator's, not the target's. OK. Will come
> >>> up with a fix.
> >> 
> >> On the systems I have access to (running Oracle flavour OpenSM in
> >> our NM2 switches), the behaviour is exactly the opposite of what you
> >> say.
> > 
> > Check with saquery what is happening, if you request a reversible path
> > from the CM target (limited pkey) to the CM client (full) you should
> > get the limited pkey or the SM is broken.
> > 
> > If the SM is working then probably something in the stack is using a
> > reversed src/dest when doing the PR query.
> > 
> > It is not intuitive but the PR query should have SGID as the CM Target
> > even though it is running on the CM Client.
> 
> That is not how it is today. And because of that, all accesses to
> the PR assume the d{gid,lid} is the remote peer. To fix this, I have
> to swap dgid/sgid and ib.dlid/ib.slid all over to get this
> working. That is pervasive. E.g., even includes ipoib. Let me know
> if that is what you want.

It is only things that use the paths to generate CM REQ messages, and
yes it is the right thing to do.

Jason
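
For readers following the thread, the full/limited distinction discussed above can be sketched in a few lines of C. This is only an illustration, not code from the kernel or from the patch: the helper names (pkey_is_full, pkey_same_partition, pkey_may_communicate, pkey_exact_match) and the macro names are made up for this example. It assumes the usual InfiniBand P_Key layout, where bit 15 is the membership bit (1 = full, 0 = limited) and the low 15 bits identify the partition.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names, for illustration only (not kernel identifiers). */
#define PKEY_MEMBERSHIP_BIT 0x8000u /* bit 15: 1 = full, 0 = limited */
#define PKEY_BASE_MASK      0x7fffu /* low 15 bits: partition number */

/* True if the P_Key denotes full membership. */
static bool pkey_is_full(uint16_t pkey)
{
	return (pkey & PKEY_MEMBERSHIP_BIT) != 0;
}

/* True if both P_Keys name the same partition, ignoring membership. */
static bool pkey_same_partition(uint16_t a, uint16_t b)
{
	return (a & PKEY_BASE_MASK) == (b & PKEY_BASE_MASK);
}

/*
 * Two ports may communicate if they are in the same partition and at
 * least one of them is a full member (limited <-> limited is refused).
 */
static bool pkey_may_communicate(uint16_t a, uint16_t b)
{
	return pkey_same_partition(a, b) &&
	       (pkey_is_full(a) || pkey_is_full(b));
}

/* Strict 16-bit comparison, membership bit included. */
static bool pkey_exact_match(uint16_t a, uint16_t b)
{
	return a == b;
}
```

With a full-member value such as 0xffff on the REQ side and the limited value 0x7fff in the recipient's table, pkey_may_communicate() is true while pkey_exact_match() is false. That is the situation described in the thread: the packet is accepted, yet a strict comparison of the two values reports a mismatch.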


