On 07/17/2014 01:41 PM, Chuck Lever wrote:
> On Jul 17, 2014, at 4:08 PM, Steve Wise <swise@xxxxxxxxxxxxxxxxxxxxx> wrote:
>
>>> -----Original Message-----
>>> From: Steve Wise [mailto:swise@xxxxxxxxxxxxxxxxxxxxx]
>>> Sent: Thursday, July 17, 2014 2:56 PM
>>> To: 'Hefty, Sean'; 'Shirley Ma'; 'Devesh Sharma'; 'Roland Dreier'
>>> Cc: 'linux-rdma@xxxxxxxxxxxxxxx'; 'chuck.lever@xxxxxxxxxx'
>>> Subject: RE: [for-next 1/2] xprtrdma: take reference of rdma provider module
>>>
>>>> -----Original Message-----
>>>> From: Hefty, Sean [mailto:sean.hefty@xxxxxxxxx]
>>>> Sent: Thursday, July 17, 2014 2:50 PM
>>>> To: Steve Wise; 'Shirley Ma'; 'Devesh Sharma'; 'Roland Dreier'
>>>> Cc: linux-rdma@xxxxxxxxxxxxxxx; chuck.lever@xxxxxxxxxx
>>>> Subject: RE: [for-next 1/2] xprtrdma: take reference of rdma provider module
>>>>
>>>>>> So the rdma cm is expected to increase the driver reference count
>>>>>> (try_module_get) for each new cm id, then decrease the reference
>>>>>> count (module_put) when the cm id is destroyed?
>>>>>
>>>>> No, I think he's saying the rdma-cm posts a RDMA_CM_DEVICE_REMOVAL
>>>>> event to each application with rdmacm objects allocated, and each
>>>>> application is expected to destroy all the objects it has allocated
>>>>> before returning from the event handler.
>>>>
>>>> This is almost correct. The applications do not have to destroy all
>>>> the objects they have allocated before returning from their event
>>>> handler. E.g. an app can queue a work item that does the destruction.
>>>> The rdmacm will block in its ib_client remove handler until all
>>>> relevant rdma_cm_id's have been destroyed.
>>>
>>> Thanks for the clarification.
>>
>> And looking at xprtrdma, it does handle the DEVICE_REMOVAL event in
>> rpcrdma_conn_upcall(). It sets ep->rep_connected to -ENODEV, wakes
>> everybody up, and calls rpcrdma_conn_func() for that endpoint, which
>> schedules rep_connect_worker... and I gave up following the code path
>> at this point... :)
>>
>> For this to all work correctly, it would need to destroy all the QPs,
>> MRs, CQs, etc. for that device _before_ destroying the rdma cm ids.
>> Otherwise the provider module could be unloaded too soon…
>
> We can’t really deal with a CM_DEVICE_REMOVE event while there are
> active NFS mounts.
>
> System shutdown ordering should guarantee (one would hope) that NFS
> mount points are unmounted before the RDMA/IB core infrastructure is
> torn down. Ordering shouldn’t matter as long as all NFS activity has
> ceased before the CM tries to remove the device.
>
> So if something is hanging up the CM, there’s something xprtrdma is not
> cleaning up properly.
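The pattern Sean describes above would look roughly like the sketch below:
return from the DEVICE_REMOVAL upcall quickly, push the real teardown into a
work item, and destroy the verbs objects before the cm_id so nothing still
references the provider when the id goes away. This is only a minimal
illustration with made-up names (my_conn, my_cm_handler, my_device_removal_work);
it is not the xprtrdma code.

/* Minimal sketch of deferring DEVICE_REMOVAL teardown to a work item.
 * Names are invented for illustration; this is not xprtrdma's handler.
 */
#include <linux/workqueue.h>
#include <rdma/rdma_cm.h>
#include <rdma/ib_verbs.h>

struct my_conn {
	struct rdma_cm_id	*cm_id;
	struct ib_cq		*cq;
	struct ib_mr		*mr;
	struct work_struct	removal_work;
};

static void my_device_removal_work(struct work_struct *work)
{
	struct my_conn *conn = container_of(work, struct my_conn,
					    removal_work);

	/* Release the verbs objects first, so the provider's resources
	 * are no longer referenced ... */
	rdma_destroy_qp(conn->cm_id);
	ib_destroy_cq(conn->cq);
	ib_dereg_mr(conn->mr);

	/* ... then drop the cm_id; once the last id bound to the removed
	 * device is gone, the rdma_cm's remove handler can finish. */
	rdma_destroy_id(conn->cm_id);
}

static int my_cm_handler(struct rdma_cm_id *id, struct rdma_cm_event *event)
{
	struct my_conn *conn = id->context;

	switch (event->event) {
	case RDMA_CM_EVENT_DEVICE_REMOVAL:
		/* Don't destroy the cm_id from inside its own event
		 * handler; defer the teardown instead. */
		INIT_WORK(&conn->removal_work, my_device_removal_work);
		schedule_work(&conn->removal_work);
		return 0;
	default:
		return 0;
	}
}

Returning 0 here leaves the rdma_cm_id for the work item to destroy; per
Sean's description, the rdma_cm then blocks in its ib_client remove handler
until that destruction has actually happened.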
I saw a problem like this once, after restarting the system without
unmounting the NFS share: the CM hung waiting for a completion. It looks
like a bug in the xprtrdma cleanup path, but I haven't been able to
reproduce it.

Call Trace:
 [<ffffffff815c9aa9>] schedule+0x29/0x70
 [<ffffffff815c8d35>] schedule_timeout+0x165/0x200
 [<ffffffff815ca9ff>] ? wait_for_completion+0xcf/0x110
 [<ffffffff810a708e>] ? __lock_release+0x9e/0x1f0
 [<ffffffff815ca9ff>] ? wait_for_completion+0xcf/0x110
 [<ffffffff815caa07>] wait_for_completion+0xd7/0x110
 [<ffffffff8108bce0>] ? try_to_wake_up+0x260/0x260
 [<ffffffffa064cb6e>] cma_process_remove+0xee/0x110 [rdma_cm]
 [<ffffffffa064cbdc>] cma_remove_one+0x4c/0x60 [rdma_cm]
 [<ffffffffa0279e0f>] ib_unregister_device+0x4f/0x100 [ib_core]
 [<ffffffffa02f76ee>] mlx4_ib_remove+0x2e/0x260 [mlx4_ib]
 [<ffffffffa01754c9>] mlx4_remove_device+0x69/0x80 [mlx4_core]
 [<ffffffffa01755b3>] mlx4_unregister_interface+0x43/0x80 [mlx4_core]
 [<ffffffffa030970c>] mlx4_ib_cleanup+0x10/0x23 [mlx4_ib]
 [<ffffffff810d9183>] SyS_delete_module+0x183/0x1e0
 [<ffffffff810f7c94>] ? __audit_syscall_entry+0x94/0x100
 [<ffffffff812c5789>] ? lockdep_sys_exit_thunk+0x35/0x67
 [<ffffffff815cec92>] system_call_fastpath+0x16/0x1b

Shirley
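For context, the wait shown in the trace (cma_process_remove ->
wait_for_completion during module unload) boils down to something like the
sketch below: the rdma_cm keeps a count of outstanding rdma_cm_id's on the
device being removed and sleeps until the last one is destroyed, so a ULP
that never destroys its ids leaves the unload stuck. This is a simplified
illustration, not the actual drivers/infiniband/core/cma.c code; the names
(removal_ctx, wait_for_ulps) are invented.

/* Simplified illustration of why device removal blocks until every
 * rdma_cm_id on the device has been destroyed.  Not the real cma.c code.
 */
#include <linux/completion.h>
#include <linux/kref.h>

struct removal_ctx {
	struct kref		refs;	/* one reference per outstanding cm_id */
	struct completion	done;
};

static void removal_ctx_release(struct kref *kref)
{
	struct removal_ctx *ctx = container_of(kref, struct removal_ctx, refs);

	complete(&ctx->done);
}

/* called whenever a ULP finally destroys one of its cm_ids */
static void cm_id_destroyed(struct removal_ctx *ctx)
{
	kref_put(&ctx->refs, removal_ctx_release);
}

/* the device-removal path: drop the initial reference, then wait for
 * the ULPs to release theirs */
static void wait_for_ulps(struct removal_ctx *ctx)
{
	kref_put(&ctx->refs, removal_ctx_release);
	wait_for_completion(&ctx->done);	/* where the trace above is stuck */
}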