On Sat, Sep 19, 2020 at 12:03:40PM +0300, Leon Romanovsky wrote: > On Fri, Sep 18, 2020 at 08:26:19PM -0300, Jason Gunthorpe wrote: > > On Mon, Sep 07, 2020 at 03:21:49PM +0300, Leon Romanovsky wrote: > > > From: Leon Romanovsky <leonro@xxxxxxxxxxxx> > > > > > > The RDMA-CM code wasn't consistent in flows that attached to cma_dev, > > > this caused to situations where failure during attach to listen on such > > > device leave RDMA-CM in non-consistent state. > > > > > > Update the listen/attach flow to correctly deal with failures. > > > > > > Signed-off-by: Leon Romanovsky <leonro@xxxxxxxxxxxx> > > > drivers/infiniband/core/cma.c | 197 ++++++++++++++++++++-------------- > > > 1 file changed, 114 insertions(+), 83 deletions(-) > > > > > > diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c > > > index 3fc3c821743d..ab1f8b707a5b 100644 > > > +++ b/drivers/infiniband/core/cma.c > > > @@ -458,8 +458,8 @@ static int cma_igmp_send(struct net_device *ndev, union ib_gid *mgid, bool join) > > > return (in_dev) ? 0 : -ENODEV; > > > } > > > > > > -static void _cma_attach_to_dev(struct rdma_id_private *id_priv, > > > - struct cma_device *cma_dev) > > > +static int _cma_attach_to_dev(struct rdma_id_private *id_priv, > > > + struct cma_device *cma_dev) > > > { > > > cma_dev_get(cma_dev); > > > id_priv->cma_dev = cma_dev; > > > @@ -475,15 +475,22 @@ static void _cma_attach_to_dev(struct rdma_id_private *id_priv, > > > rdma_restrack_add(&id_priv->res); > > > > > > trace_cm_id_attach(id_priv, cma_dev->device); > > > + return 0; > > > } > > > > This commit message doesn't explain this patch at all. This is adding > > a return code to _cma_attach_to_dev because some later patch needs > > it. This should also be ordered directly before the later patch > > > > > -static void cma_listen_on_dev(struct rdma_id_private *id_priv, > > > - struct cma_device *cma_dev) > > > +static int cma_listen_on_dev(struct rdma_id_private *id_priv, > > > + struct cma_device *cma_dev) > > > { > > > struct rdma_id_private *dev_id_priv; > > > struct rdma_cm_id *id; > > > @@ -2491,12 +2500,12 @@ static void cma_listen_on_dev(struct rdma_id_private *id_priv, > > > lockdep_assert_held(&lock); > > > > > > if (cma_family(id_priv) == AF_IB && !rdma_cap_ib_cm(cma_dev->device, 1)) > > > - return; > > > + return 0; > > > > > > id = __rdma_create_id(net, cma_listen_handler, id_priv, id_priv->id.ps, > > > id_priv->id.qp_type, id_priv->res.kern_name); > > > if (IS_ERR(id)) > > > - return; > > > + return PTR_ERR(id); > > > > And there here it is fixing already missing error handling, seems like > > it could be two patches > > I don't think so, it is "visible" only after I changed cma_listen_on_dev() function return type, > which is an outcome of _cma_attach_to_dev() change. If the first patch adds this missing error handling to cam_listen_on_dev, then the 2nd patch adds new error handling to _cma_attach_to_dev() it would be much clearer what this is all about. Jason > Thanks > > > > > Jason