Re: [PATCH for-rc] RDMA/cm: Do not send REJ when remote_id is unknown

Jason Gunthorpe <jgg@xxxxxxxx> · Thu, 16 Apr 2020 15:23:25 -0300

On Thu, Apr 16, 2020 at 03:24:15PM +0200, Håkon Bugge wrote:
> 
> 
> > On 14 Apr 2020, at 22:40, Jason Gunthorpe <jgg@xxxxxxxx> wrote:
> > 
> > On Tue, Apr 14, 2020 at 01:17:20PM +0200, Håkon Bugge wrote:
> >> In cm_destroy_id(), when the cm_id state is IB_CM_REQ_SENT or
> >> IB_CM_MRA_REQ_RCVD, an attempt is made to send a REJ with
> >> IB_CM_REJ_TIMEOUT as the reject reason.
> > 
> > Yes, this causes the remote to destroy the half completed connection,
> > for instance if the path is unidirectional.
> > 
> >> However, in said states, we have no remote_id. For the REQ_SENT case,
> >> we simply haven't received anything from our peer,
> > 
> > Which is correct, the spec covers this in Table 108 which describes
> > the remote communication ID as '0 if REJecting due to REP timeout and
> > no MRA was received'
> 
> Yes, the spec has the phrase for a REJ due toa timeout: "The
> recipient of a REJ message with this reason code must use this CA
> GUID to identify the sender, as it is possible that the Remote
> Communication ID in the REJ message may not be valid." [1]

It is a bit confusing, should we use the remote_id if not 0 after MRA?
Seems like a reasonable thing to do...

> > +static struct cm_id_private *cm_acquire_id(__be32 local_id, __be32 remote_id)
> > +{
> > +	struct cm_id_private *cm_id_priv = cm_acquire_req(local_id);
> > +
> > +	if (READ_ONCE(cm_id_priv->id.remote_id) != remote_id) {
> 
> Hmm, what if cm_id_priv is NULL?

Right, it should return NULL, woops

> The rest looks good to me, but I would like to backport it to the
> version I am familiar with and test it.

You might need a lot of backporting at this point :)

Jason