On Tue, May 01, 2018 at 02:05:13PM -0400, Doug Ledford wrote: > On Mon, 2018-04-30 at 15:56 -0600, Jason Gunthorpe wrote: > > On Sun, Apr 29, 2018 at 10:46:46AM +0300, Leon Romanovsky wrote: > > > From: Parav Pandit <parav@xxxxxxxxxxxx> > > > > > > Commit given in fixes tag introduced an accurate check to validate > > > device, port, index by referring to the cache layer for querying GIDs > > > for all link layers (IB, RoCE and iWarp). > > > > > > When rdmacm tries to modify the QP to RTR state for kernel consumers, > > > qp and ah attributes are uninitialized. Each transport layer (IB/iWarp) > > > initializes them depending on transport type. > > > However qp ah_attr are not used for iWarp and remained uninitialized, > > > which is further used in ib_query_gid() call. This results into a > > > failure to query the GID due to an invalid GID index coming from > > > the uninitialized stack memory. > > > This is reported and discussed in thread [1]. > > > > What is ib_query_gid supposed to do for iWarp? > > Return the sole entry in the GID table. But, you pass the index, so you > have to know to pass index 0. In this case, because of uninitialized > memory in the ah_attr, it's being told to try and return another entry > that is invalid. > > However, I see no benefit to doing this bandaid fix while leaving the > real bug: that we are calling ib_query_gid() for no purpose in this > function now a days. Just clear out the bad usage and move on. To that > end, I've written a patch for this: > > [dledford@haswell-e linus (k.o/wip/dl-for-rc *)]$ git diff > diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c > index 8364223422d0..a693fcd4c513 100644 > --- a/drivers/infiniband/core/cma.c > +++ b/drivers/infiniband/core/cma.c > @@ -868,7 +868,6 @@ static int cma_modify_qp_rtr(struct rdma_id_private *id_priv, > { > struct ib_qp_attr qp_attr; > int qp_attr_mask, ret; > - union ib_gid sgid; > > mutex_lock(&id_priv->qp_mutex); > if (!id_priv->id.qp) { > @@ -891,12 +890,6 @@ static int cma_modify_qp_rtr(struct rdma_id_private *id_priv, > if (ret) > goto out; > > - ret = ib_query_gid(id_priv->id.device, id_priv->id.port_num, > - rdma_ah_read_grh(&qp_attr.ah_attr)->sgid_index, > - &sgid, NULL); > - if (ret) > - goto out; > - > BUG_ON(id_priv->cma_dev->device != id_priv->id.device); > > if (conn_param) > --- > Steve/Shiraz, can either of you test this real quick to make sure it > still works in your specific situation (I can't imagine it not working, > but a test would be nice)? Yes. It works. > I think we should also be at a point where we can remove that > BUG_ON()... > > -- > Doug Ledford <dledford@xxxxxxxxxx> > GPG KeyID: B826A3330E572FDD > Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html