Hi Devesh-

Thanks for drilling into this further.

On Jul 21, 2014, at 7:48 AM, Devesh Sharma <Devesh.Sharma@xxxxxxxxxx> wrote:

> In rpcrdma_ep_connect():
>
> 	write_lock(&ia->ri_qplock);
> 	old = ia->ri_id;
> 	ia->ri_id = id;
> 	write_unlock(&ia->ri_qplock);
>
> 	rdma_destroy_qp(old);
> 	rdma_destroy_id(old); =============> CM ID is destroyed here.
>
> If the following code fails in rpcrdma_ep_connect():
>
> 	id = rpcrdma_create_id(xprt, ia,
> 			(struct sockaddr *)&xprt->rx_data.addr);
> 	if (IS_ERR(id)) {
> 		rc = -EHOSTUNREACH;
> 		goto out;
> 	}
>
> it leaves the old CM ID still alive.

This will always fail if the device is removed abruptly. For CM_EVENT_DEVICE_REMOVAL, rpcrdma_conn_upcall() sets ep->rep_connected to -ENODEV. Then:

 929 int
 930 rpcrdma_ep_connect(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia)
 931 {
 932 	struct rdma_cm_id *id, *old;
 933 	int rc = 0;
 934 	int retry_count = 0;
 935
 936 	if (ep->rep_connected != 0) {
 937 		struct rpcrdma_xprt *xprt;
 938 retry:
 939 		dprintk("RPC:       %s: reconnecting...\n", __func__);

ep->rep_connected is probably -ENODEV after a device removal. In that case the connect worker could destroy everything associated with this connection, to ensure the underlying object reference counts are cleared.

The immediate danger is that if there are pending RPCs, they could exit while the QP and CM ID are NULL, triggering a panic in rpcrdma_deregister_frmr_external(). Checking for NULL pointers inside the ri_qplock would prevent that.

However, NFS mounts via this adapter will hang indefinitely after all transports are torn down and the adapter is gone. The only thing that can be done then is something drastic like "echo b > /proc/sysrq-trigger" on the client. Thus, IMO, hot-plugging or passive fail-over are the only scenarios where this makes sense.

If we have an immediate problem here, is it a problem with system shutdown ordering that can be addressed in some other way?
Until that support is in place, obviously I would prefer that removal of the underlying driver be prevented while there are NFS mounts in place. I think that's what NFS users have come to expect. In other words, don't allow device removal until we have support for device insertion :-)

> In rdma_resolve_addr()/rdma_destroy_id(), cma_dev is referenced/dereferenced here (cma.c):
>
> static int cma_acquire_dev(struct rdma_id_private *id_priv,
> 			   struct rdma_id_private *listen_id_priv)
> {
> 	.
> 	.
> 	if (!ret)
> 		cma_attach_to_dev(id_priv, cma_dev);
> }
>
> static void cma_release_dev(struct rdma_id_private *id_priv)
> {
> 	mutex_lock(&lock);
> 	list_del(&id_priv->list);
> 	cma_deref_dev(id_priv->cma_dev);
> 	.
> 	.
> }
>
> Since, by design, nfs-rdma always keeps at least the previously known good CM ID alive until
> another good CM ID is created, cma_dev->refcount never reaches 0 upon device removal,
> thus blocking "rmmod <vendor driver>" forever.
>
> -Regards
> Devesh
>
>> -----Original Message-----
>> From: linux-rdma-owner@xxxxxxxxxxxxxxx [mailto:linux-rdma-
>> owner@xxxxxxxxxxxxxxx] On Behalf Of Devesh Sharma
>> Sent: Monday, July 21, 2014 11:42 AM
>> To: Shirley Ma; Steve Wise; 'Chuck Lever'
>> Cc: 'Hefty, Sean'; 'Roland Dreier'; linux-rdma@xxxxxxxxxxxxxxx
>> Subject: RE: [for-next 1/2] xprtrdma: take reference of rdma provider
>> module
>>
>> Shirley,
>>
>> Once rmmod is issued, the connection corresponding to the active mount is
>> destroyed and all the associated resources are freed. As part of the processing
>> logic for the DEVICE_REMOVAL event, nfs-rdma wakes up all the waiters. This
>> results in re-establishment efforts; since the device is not present any
>> more, rdma_resolve_addr() fails with a CM address resolution error. This loop
>> continues forever.
>>
>> I am yet to find out which part of ocrdma is blocked. I am putting in some debug
>> messages to find out, and will get back to the group with an update.
>>
>> -Regards
>> Devesh
>>
>>> -----Original Message-----
>>> From: Shirley Ma [mailto:shirley.ma@xxxxxxxxxx]
>>> Sent: Friday, July 18, 2014 9:18 PM
>>> To: Steve Wise; Devesh Sharma; 'Chuck Lever'
>>> Cc: 'Hefty, Sean'; 'Roland Dreier'; linux-rdma@xxxxxxxxxxxxxxx
>>> Subject: Re: [for-next 1/2] xprtrdma: take reference of rdma provider
>>> module
>>>
>>> On 07/18/2014 06:27 AM, Steve Wise wrote:
>>>>>>> We can't really deal with a CM_DEVICE_REMOVE event while there
>>>>>>> are active NFS mounts.
>>>>>>>
>>>>>>> System shutdown ordering should guarantee (one would hope) that NFS
>>>>>>> mount points are unmounted before the RDMA/IB core infrastructure
>>>>>>> is torn down. Ordering shouldn't matter as long as all NFS activity
>>>>>>> has ceased before the CM tries to remove the device.
>>>>>>>
>>>>>>> So if something is hanging up the CM, there's something xprtrdma
>>>>>>> is not cleaning up properly.
>>>>>>
>>>>>> Devesh, how are you reproducing this? Are you just rmmod'ing the
>>>>>> ocrdma module while there are active mounts?
>>>>>
>>>>> Yes, I am issuing rmmod while there is an active mount. In my case,
>>>>> rmmod ocrdma remains blocked forever.
>>>
>>> Where is it blocked?
>>>
>>>>> Off the course of this discussion: Is there a reason behind not
>>>>> using the ib_register_client()/ib_unregister_client() framework?
>>>>
>>>> I think the idea is that you don't need to use it if you are
>>>> transport-independent and use the rdmacm...
>>>>
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe linux-rdma"
>>>> in the body of a message to majordomo@xxxxxxxxxxxxxxx
>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com