On Thu, Apr 16, 2020 at 03:24:15PM +0200, Håkon Bugge wrote: > > > > On 14 Apr 2020, at 22:40, Jason Gunthorpe <jgg@xxxxxxxx> wrote: > > > > On Tue, Apr 14, 2020 at 01:17:20PM +0200, Håkon Bugge wrote: > >> In cm_destroy_id(), when the cm_id state is IB_CM_REQ_SENT or > >> IB_CM_MRA_REQ_RCVD, an attempt is made to send a REJ with > >> IB_CM_REJ_TIMEOUT as the reject reason. > > > > Yes, this causes the remote to destroy the half completed connection, > > for instance if the path is unidirectional. > > > >> However, in said states, we have no remote_id. For the REQ_SENT case, > >> we simply haven't received anything from our peer, > > > > Which is correct, the spec covers this in Table 108 which describes > > the remote communication ID as '0 if REJecting due to REP timeout and > > no MRA was received' > > Yes, the spec has the phrase for a REJ due toa timeout: "The > recipient of a REJ message with this reason code must use this CA > GUID to identify the sender, as it is possible that the Remote > Communication ID in the REJ message may not be valid." [1] It is a bit confusing, should we use the remote_id if not 0 after MRA? Seems like a reasonable thing to do... > > +static struct cm_id_private *cm_acquire_id(__be32 local_id, __be32 remote_id) > > +{ > > + struct cm_id_private *cm_id_priv = cm_acquire_req(local_id); > > + > > + if (READ_ONCE(cm_id_priv->id.remote_id) != remote_id) { > > Hmm, what if cm_id_priv is NULL? Right, it should return NULL, woops > The rest looks good to me, but I would like to backport it to the > version I am familiar with and test it. You might need a lot of backporting at this point :) Jason