RE: [for-next 1/2] xprtrdma: take reference of rdma provider module

In rpcrdma_ep_connect():

                write_lock(&ia->ri_qplock);
                old = ia->ri_id;
                ia->ri_id = id;
                write_unlock(&ia->ri_qplock);

                rdma_destroy_qp(old);
                rdma_destroy_id(old);   /* <== the old cm-id is destroyed here */


If the following code in rpcrdma_ep_connect() fails:

                id = rpcrdma_create_id(xprt, ia,
                                (struct sockaddr *)&xprt->rx_data.addr);
                if (IS_ERR(id)) {
                        rc = -EHOSTUNREACH;
                        goto out;
                }

the old cm-id is left alive. This call will always fail if the device is removed abruptly.
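
To illustrate the failure path above, here is a minimal userspace sketch in C.
It is not the xprtrdma code: the model_* types and functions are hypothetical
stand-ins, with model_create_id() playing the role of rpcrdma_create_id() and a
plain flag standing in for rdma_destroy_qp()/rdma_destroy_id(). It only shows
that when id creation fails, control never reaches the swap, so the old cm-id
stays alive.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a cm-id; not the kernel's struct rdma_cm_id. */
struct model_cm_id {
	bool alive;
};

/* Hypothetical stand-in for the interface adapter holding the current id. */
struct model_ia {
	struct model_cm_id *id;
};

/* Models rpcrdma_create_id(): yields NULL once the device is gone. */
static struct model_cm_id *model_create_id(struct model_cm_id *slot,
					   bool device_present)
{
	if (!device_present)
		return NULL;
	slot->alive = true;
	return slot;
}

/*
 * Models the reconnect branch: the old id is destroyed only after a new
 * one has been created and swapped in.
 */
static int model_reconnect(struct model_ia *ia, struct model_cm_id *slot,
			   bool device_present)
{
	struct model_cm_id *id = model_create_id(slot, device_present);
	struct model_cm_id *old;

	if (id == NULL)
		return -1;	/* the real code sets rc = -EHOSTUNREACH */

	old = ia->id;
	assert(old != NULL);
	ia->id = id;
	old->alive = false;	/* stands in for destroying the old cm-id */
	return 0;
}
```

Calling model_reconnect() with device_present set to false returns -1 and
leaves ia->id pointing at the original, still-alive id, which is the state
described above.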

rdma_resolve_addr() takes a reference on cma_dev and rdma_destroy_id() drops it, in these functions (cma.c):

static int cma_acquire_dev(struct rdma_id_private *id_priv,
                           struct rdma_id_private *listen_id_priv)
{
.
.
        if (!ret)
                cma_attach_to_dev(id_priv, cma_dev);
}

static void cma_release_dev(struct rdma_id_private *id_priv)
{
        mutex_lock(&lock);
        list_del(&id_priv->list);
        cma_deref_dev(id_priv->cma_dev);
.
.
}

By design, nfs-rdma always keeps at least the previously known good cm-id alive until
another good cm-id is created, so cma_dev->refcount never drops to 0 after device removal.
This blocks rmmod <vendor driver> forever.
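
The reference counting described above can be sketched as a small userspace
model. This is a simplification under stated assumptions, not the cma.c code:
the model_* names are hypothetical, the counter is a plain int rather than the
kernel's atomic counter, and it counts only references held by cm-ids. It shows
why one surviving cm-id is enough to keep device removal from completing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-device state tracked by the CM. */
struct model_cma_dev {
	int refcount;	/* references held by attached cm-ids (assumption) */
};

/* Hypothetical stand-in for struct rdma_id_private. */
struct model_id_priv {
	struct model_cma_dev *cma_dev;
};

/* Models cma_attach_to_dev(): the id takes a reference on the device. */
static void model_attach_to_dev(struct model_id_priv *id,
				struct model_cma_dev *dev)
{
	dev->refcount++;
	id->cma_dev = dev;
}

/* Models cma_release_dev(): the id drops its reference on the device. */
static void model_release_dev(struct model_id_priv *id)
{
	assert(id->cma_dev != NULL && id->cma_dev->refcount > 0);
	id->cma_dev->refcount--;
	id->cma_dev = NULL;
}

/* In this model, removal can finish only when no id holds a reference. */
static bool model_removal_can_complete(const struct model_cma_dev *dev)
{
	return dev->refcount == 0;
}
```

As long as the old cm-id has been attached and not released,
model_removal_can_complete() returns false; it returns true only after
model_release_dev() has run for that id.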

-Regards
 Devesh

> -----Original Message-----
> From: linux-rdma-owner@xxxxxxxxxxxxxxx [mailto:linux-rdma-owner@xxxxxxxxxxxxxxx] On Behalf Of Devesh Sharma
> Sent: Monday, July 21, 2014 11:42 AM
> To: Shirley Ma; Steve Wise; 'Chuck Lever'
> Cc: 'Hefty, Sean'; 'Roland Dreier'; linux-rdma@xxxxxxxxxxxxxxx
> Subject: RE: [for-next 1/2] xprtrdma: take reference of rdma provider
> module
> 
> Shirley,
> 
> Once rmmod is issued, the connection corresponding to the active mount is
> destroyed and all the associated resources are freed. As per the processing
> logic of the DEVICE-REMOVAL event, nfs-rdma wakes up all the waiters. This
> results in re-establishment attempts; since the device is no longer present,
> rdma_resolve_addr() fails with a CM resolution error. This loop
> continues forever.
> 
> I have yet to find out which part of ocrdma is blocked. I am adding some debug
> messages to find out. I will get back to the group with an update.
> 
> -Regards
>  Devesh
> 
> > -----Original Message-----
> > From: Shirley Ma [mailto:shirley.ma@xxxxxxxxxx]
> > Sent: Friday, July 18, 2014 9:18 PM
> > To: Steve Wise; Devesh Sharma; 'Chuck Lever'
> > Cc: 'Hefty, Sean'; 'Roland Dreier'; linux-rdma@xxxxxxxxxxxxxxx
> > Subject: Re: [for-next 1/2] xprtrdma: take reference of rdma provider
> > module
> >
> >
> > On 07/18/2014 06:27 AM, Steve Wise wrote:
> > >>>> We can't really deal with a CM_DEVICE_REMOVE event while there
> > >>>> are active NFS mounts.
> > >>>>
> > >>>> System shutdown ordering should guarantee (one would hope) that NFS
> > >>>> mount points are unmounted before the RDMA/IB core infrastructure
> > >>>> is torn down. Ordering shouldn't matter as long as all NFS activity
> > >>>> has ceased before the CM tries to remove the device.
> > >>>>
> > >>>> So if something is hanging up the CM, there's something xprtrdma
> > >>>> is not cleaning up properly.
> > >>>>
> > >>>
> > >>>
> > >>> Devesh, how are you reproducing this?  Are you just rmmod'ing the
> > >>> ocrdma module while there are active mounts?
> > >>
> > >> Yes, I am issuing rmmod while there is an active mount. In my case
> > >> rmmod ocrdma remains blocked forever.
> > Where is it blocked?
> >
> > >> Off topic from this discussion: is there a reason for not using the
> > >> ib_register_client()/ib_unregister_client() framework?
> > >
> > > I think the idea is that you don't need to use it if you are
> > > transport-independent and use the rdmacm...
> > >
> > >
> > >
> > > --
> > > > To unsubscribe from this list: send the line "unsubscribe linux-rdma"
> > > > in the body of a message to majordomo@xxxxxxxxxxxxxxx
> > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > >



