On Jul 21, 2014, at 11:03 AM, Steve Wise <swise@xxxxxxxxxxxxxxxxxxxxx> wrote:

>
>
>> -----Original Message-----
>> From: Chuck Lever [mailto:chuck.lever@xxxxxxxxxx]
>> Sent: Monday, July 21, 2014 9:54 AM
>> To: Devesh Sharma
>> Cc: Shirley Ma; Steve Wise; Hefty, Sean; Roland Dreier; linux-rdma@xxxxxxxxxxxxxxx
>> Subject: Re: [for-next 1/2] xprtrdma: take reference of rdma provider module
>>
>> Hi Devesh-
>>
>> Thanks for drilling into this further.
>>
>> On Jul 21, 2014, at 7:48 AM, Devesh Sharma <Devesh.Sharma@xxxxxxxxxx> wrote:
>>
>>> In rpcrdma_ep_connect():
>>>
>>>     write_lock(&ia->ri_qplock);
>>>     old = ia->ri_id;
>>>     ia->ri_id = id;
>>>     write_unlock(&ia->ri_qplock);
>>>
>>>     rdma_destroy_qp(old);
>>>     rdma_destroy_id(old);  =============> cm_id is destroyed here.
>>>
>>> If the following code fails in rpcrdma_ep_connect():
>>>
>>>     id = rpcrdma_create_id(xprt, ia,
>>>             (struct sockaddr *)&xprt->rx_data.addr);
>>>     if (IS_ERR(id)) {
>>>         rc = -EHOSTUNREACH;
>>>         goto out;
>>>     }
>>>
>>> it leaves the old cm_id still alive. This will always fail if the device is
>>> removed abruptly.
>>
>> For RDMA_CM_EVENT_DEVICE_REMOVAL, rpcrdma_conn_upcall() sets ep->rep_connected
>> to -ENODEV.
>>
>> Then:
>>
>>  929 int
>>  930 rpcrdma_ep_connect(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia)
>>  931 {
>>  932         struct rdma_cm_id *id, *old;
>>  933         int rc = 0;
>>  934         int retry_count = 0;
>>  935
>>  936         if (ep->rep_connected != 0) {
>>  937                 struct rpcrdma_xprt *xprt;
>>  938 retry:
>>  939                 dprintk("RPC:       %s: reconnecting...\n", __func__);
>>
>> ep->rep_connected is probably -ENODEV after a device removal. It would be
>> possible for the connect worker to destroy everything associated with this
>> connection in that case to ensure the underlying object reference counts
>> are cleared.
>>
>> The immediate danger is that if there are pending RPCs, they could exit while
>> qp/cm_id are NULL, triggering a panic in rpcrdma_deregister_frmr_external().
>> Checking for NULL pointers inside the ri_qplock would prevent that.
>>
>> However, NFS mounts via this adapter will hang indefinitely after all
>> transports are torn down and the adapter is gone. The only thing that can be
>> done is something drastic like "echo b > /proc/sysrq-trigger" on the client.
>>
>> Thus, IMO hot-plugging or passive fail-over are the only scenarios where
>> this makes sense. If we have an immediate problem here, is it a problem with
>> system shutdown ordering that can be addressed in some other way?
>>
>> Until that support is in place, obviously I would prefer that the removal of
>> the underlying driver be prevented while there are NFS mounts in place. I
>> think that's what NFS users have come to expect.
>>
>> In other words, don't allow device removal until we have support for device
>> insertion :-)
>>
>
>
> If we fix the above problems on provider unload, shouldn't the mount recover
> if the provider module is subsequently loaded? Or another provider configured
> such that rdma_resolve_addr/route() then picks an active device?

On device removal, we have to destroy everything. On insertion, we'll have to
create a fresh PD and MRs, which isn't done now during reconnect. So insertion
needs work too.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
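[Editor's note: Chuck's closing point, that reconnect after device
re-insertion needs more than a new cm_id/QP because the PD and MRs were
allocated against the removed device, can be sketched as a small userspace
state model. This is hypothetical illustration only; model_ia, reconnect,
and the have_* flags are invented, and -19 mirrors the value of -ENODEV.]

```c
/* Sketch: an ordinary reconnect can reuse the PD and MRs, but after
 * RDMA_CM_EVENT_DEVICE_REMOVAL they belonged to the old device and
 * must be rebuilt along with the cm_id/QP. Names are hypothetical. */
#include <assert.h>

#define DEV_REMOVED (-19)   /* mirrors ep->rep_connected == -ENODEV */

struct model_ia {
    int rep_connected;
    int have_pd;    /* protection domain on the current device */
    int have_mrs;   /* MRs registered under that PD */
    int have_id;    /* cm_id / QP pair */
};

/* What the reconnect path does today: replace only the cm_id/QP. */
static void recreate_id_only(struct model_ia *ia)
{
    ia->have_id = 1;
}

/* What re-insertion would additionally require. */
static void recreate_all(struct model_ia *ia)
{
    ia->have_pd = 1;    /* fresh PD on the new device */
    ia->have_mrs = 1;   /* fresh MRs under the new PD */
    recreate_id_only(ia);
    ia->rep_connected = 1;
}

/* Returns 1 if the transport ends up with a complete set of resources. */
static int reconnect(struct model_ia *ia)
{
    if (ia->rep_connected == DEV_REMOVED)
        recreate_all(ia);       /* device was removed: rebuild everything */
    else
        recreate_id_only(ia);   /* ordinary reconnect: reuse PD/MRs */
    return ia->have_pd && ia->have_mrs && ia->have_id;
}
```

The model makes the gap concrete: an id-only reconnect after removal would
leave have_pd and have_mrs at zero, i.e. a transport pointing at a device
its PD and MRs were never created on.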