On 07/17/2014 01:41 PM, Chuck Lever wrote:
> On Jul 17, 2014, at 4:08 PM, Steve Wise <swise@xxxxxxxxxxxxxxxxxxxxx> wrote:
>
>>> -----Original Message-----
>>> From: Steve Wise [mailto:swise@xxxxxxxxxxxxxxxxxxxxx]
>>> Sent: Thursday, July 17, 2014 2:56 PM
>>> To: 'Hefty, Sean'; 'Shirley Ma'; 'Devesh Sharma'; 'Roland Dreier'
>>> Cc: 'linux-rdma@xxxxxxxxxxxxxxx'; 'chuck.lever@xxxxxxxxxx'
>>> Subject: RE: [for-next 1/2] xprtrdma: take reference of rdma provider module
>>>
>>>> -----Original Message-----
>>>> From: Hefty, Sean [mailto:sean.hefty@xxxxxxxxx]
>>>> Sent: Thursday, July 17, 2014 2:50 PM
>>>> To: Steve Wise; 'Shirley Ma'; 'Devesh Sharma'; 'Roland Dreier'
>>>> Cc: linux-rdma@xxxxxxxxxxxxxxx; chuck.lever@xxxxxxxxxx
>>>> Subject: RE: [for-next 1/2] xprtrdma: take reference of rdma provider module
>>>>
>>>>>> So the rdma cm is expected to increase the driver reference count
>>>>>> (try_module_get) for each new cm id, then decrease the reference
>>>>>> count (module_put) when the cm id is destroyed?
>>>>>
>>>>> No, I think he's saying the rdma-cm posts a RDMA_CM_DEVICE_REMOVAL
>>>>> event to each application with rdmacm objects allocated, and each
>>>>> application is expected to destroy all the objects it has allocated
>>>>> before returning from the event handler.
>>>>
>>>> This is almost correct. The applications do not have to destroy all
>>>> the objects they have allocated before returning from their event
>>>> handler. E.g. an app can queue a work item that does the destruction.
>>>> The rdmacm will block in its ib_client remove handler until all
>>>> relevant rdma_cm_id's have been destroyed.
>>>
>>> Thanks for the clarification.
>>
>> And looking at xprtrdma, it does handle the DEVICE_REMOVAL event in
>> rpcrdma_conn_upcall(). It sets ep->rep_connected to -ENODEV, wakes
>> everybody up, and calls rpcrdma_conn_func() for that endpoint, which
>> schedules rep_connect_worker... and I gave up following the code path
>> at this point... :)
>>
>> For this to all work correctly, it would need to destroy all the QPs,
>> MRs, CQs, etc. for that device _before_ destroying the rdma cm ids.
>> Otherwise the provider module could be unloaded too soon…
>
> We can’t really deal with a CM_DEVICE_REMOVE event while there are
> active NFS mounts.
>
> System shutdown ordering should guarantee (one would hope) that NFS
> mount points are unmounted before the RDMA/IB core infrastructure is
> torn down. Ordering shouldn’t matter as long as all NFS activity has
> ceased before the CM tries to remove the device.
>
> So if something is hanging up the CM, there’s something xprtrdma is not
> cleaning up properly.
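The pattern Sean describes above would look roughly like the sketch below:
return from the DEVICE_REMOVAL upcall quickly, push the real teardown into a
work item, and destroy the verbs objects before the cm_id so nothing still
references the provider when the id goes away. This is only a minimal
illustration with made-up names (my_conn, my_cm_handler, my_device_removal_work);
it is not the xprtrdma code.

/* Minimal sketch of deferring DEVICE_REMOVAL teardown to a work item.
 * Names are invented for illustration; this is not xprtrdma's handler.
 */
#include <linux/workqueue.h>
#include <rdma/rdma_cm.h>
#include <rdma/ib_verbs.h>

struct my_conn {
	struct rdma_cm_id	*cm_id;
	struct ib_cq		*cq;
	struct ib_mr		*mr;
	struct work_struct	removal_work;
};

static void my_device_removal_work(struct work_struct *work)
{
	struct my_conn *conn = container_of(work, struct my_conn,
					    removal_work);

	/* Release the verbs objects first, so the provider's resources
	 * are no longer referenced ... */
	rdma_destroy_qp(conn->cm_id);
	ib_destroy_cq(conn->cq);
	ib_dereg_mr(conn->mr);

	/* ... then drop the cm_id; once the last id bound to the removed
	 * device is gone, the rdma_cm's remove handler can finish. */
	rdma_destroy_id(conn->cm_id);
}

static int my_cm_handler(struct rdma_cm_id *id, struct rdma_cm_event *event)
{
	struct my_conn *conn = id->context;

	switch (event->event) {
	case RDMA_CM_EVENT_DEVICE_REMOVAL:
		/* Don't destroy the cm_id from inside its own event
		 * handler; defer the teardown instead. */
		INIT_WORK(&conn->removal_work, my_device_removal_work);
		schedule_work(&conn->removal_work);
		return 0;
	default:
		return 0;
	}
}

Returning 0 here leaves the rdma_cm_id for the work item to destroy; per
Sean's description, the rdma_cm then blocks in its ib_client remove handler
until that destruction has actually happened.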
I saw a problem like this once, after restarting the system without
unmounting the NFS share: the CM hung waiting for a completion. It looks
like a bug in the xprtrdma cleanup path, but I haven't been able to
reproduce it.

Call Trace:
 [<ffffffff815c9aa9>] schedule+0x29/0x70
 [<ffffffff815c8d35>] schedule_timeout+0x165/0x200
 [<ffffffff815ca9ff>] ? wait_for_completion+0xcf/0x110
 [<ffffffff810a708e>] ? __lock_release+0x9e/0x1f0
 [<ffffffff815ca9ff>] ? wait_for_completion+0xcf/0x110
 [<ffffffff815caa07>] wait_for_completion+0xd7/0x110
 [<ffffffff8108bce0>] ? try_to_wake_up+0x260/0x260
 [<ffffffffa064cb6e>] cma_process_remove+0xee/0x110 [rdma_cm]
 [<ffffffffa064cbdc>] cma_remove_one+0x4c/0x60 [rdma_cm]
 [<ffffffffa0279e0f>] ib_unregister_device+0x4f/0x100 [ib_core]
 [<ffffffffa02f76ee>] mlx4_ib_remove+0x2e/0x260 [mlx4_ib]
 [<ffffffffa01754c9>] mlx4_remove_device+0x69/0x80 [mlx4_core]
 [<ffffffffa01755b3>] mlx4_unregister_interface+0x43/0x80 [mlx4_core]
 [<ffffffffa030970c>] mlx4_ib_cleanup+0x10/0x23 [mlx4_ib]
 [<ffffffff810d9183>] SyS_delete_module+0x183/0x1e0
 [<ffffffff810f7c94>] ? __audit_syscall_entry+0x94/0x100
 [<ffffffff812c5789>] ? lockdep_sys_exit_thunk+0x35/0x67
 [<ffffffff815cec92>] system_call_fastpath+0x16/0x1b

Shirley
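For context, the wait shown in the trace (cma_process_remove ->
wait_for_completion during module unload) boils down to something like the
sketch below: the rdma_cm keeps a count of outstanding rdma_cm_id's on the
device being removed and sleeps until the last one is destroyed, so a ULP
that never destroys its ids leaves the unload stuck. This is a simplified
illustration, not the actual drivers/infiniband/core/cma.c code; the names
(removal_ctx, wait_for_ulps) are invented.

/* Simplified illustration of why device removal blocks until every
 * rdma_cm_id on the device has been destroyed.  Not the real cma.c code.
 */
#include <linux/completion.h>
#include <linux/kref.h>

struct removal_ctx {
	struct kref		refs;	/* one reference per outstanding cm_id */
	struct completion	done;
};

static void removal_ctx_release(struct kref *kref)
{
	struct removal_ctx *ctx = container_of(kref, struct removal_ctx, refs);

	complete(&ctx->done);
}

/* called whenever a ULP finally destroys one of its cm_ids */
static void cm_id_destroyed(struct removal_ctx *ctx)
{
	kref_put(&ctx->refs, removal_ctx_release);
}

/* the device-removal path: drop the initial reference, then wait for
 * the ULPs to release theirs */
static void wait_for_ulps(struct removal_ctx *ctx)
{
	kref_put(&ctx->refs, removal_ctx_release);
	wait_for_completion(&ctx->done);	/* where the trace above is stuck */
}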