> -----Original Message-----
> From: Shirley Ma [mailto:shirley.ma@xxxxxxxxxx]
> Sent: Thursday, July 17, 2014 1:58 PM
> To: Hefty, Sean; Steve Wise; Devesh Sharma; Roland Dreier
> Cc: linux-rdma@xxxxxxxxxxxxxxx; chuck.lever@xxxxxxxxxx
> Subject: Re: [for-next 1/2] xprtrdma: take reference of rdma provider module
>
> On 07/17/2014 09:06 AM, Hefty, Sean wrote:
> >> On 7/17/2014 9:01 AM, Devesh Sharma wrote:
> >>> If vendor driver is attempted for removal while xprtrdma still has an
> >>> active mount, the removal of driver may never complete and can cause
> >>> unseen races or in worst case system crash.
> >>>
> >>> To solve this, xprtrdma module should get reference of struct ib_device
> >>> structure for every mount. Reference is taken after local device address
> >>> resolution is completed successfully.
> >>>
> >>> Reference to the struct ib_device pointer is put just before cm_id
> >>> destruction.
> >>>
> >>> Signed-off-by: Devesh Sharma <devesh.sharma@xxxxxxxxxx>
> >>
> >> This seems like an issue with the rdma-cm or rdma core, not xprtrdma. I
> >> see that user rdma applications cause a ref on the provider module here
> >> in ib_uverbs_open():
> >>
> >>         if (!try_module_get(dev->ib_dev->owner)) {
> >>                 ret = -ENODEV;
> >>                 goto err;
> >>         }
> >>
> >> Maybe kernel applications that allocate device resources should cause a
> >> ref on the provider's module.
> >>
> >> Sean/Roland, is there some history here as to how rdma provider module
> >> removal should be handled?
> >
> > The kernel modules are not expected to access the rdma devices after their
> > remove device callback has been invoked. The rdma cm basically forwards the
> > device removal on a per id basis. Apps are expected to destroy the id after
> > receiving that callback. The rdma cm should block in the remove device call
> > until all id's associated with the removed device have been destroyed.
>
> So the rdma cm is expected to increase the driver reference count (try_module_get) for
> each new cm id, then decrement the count (module_put) when the cm id is destroyed?
>

No, I think he's saying the rdma-cm posts an RDMA_CM_EVENT_DEVICE_REMOVAL event to each
application with rdmacm objects allocated, and each application is expected to destroy all
the objects it has allocated before returning from the event handler.

And I think the ib_verbs core calls each ib_client's remove handler when an rdma provider
unregisters with the core.

Steve.
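
The removal flow described above can be sketched as a small user-space model. This is only an illustration under stated assumptions: every name in it (toy_device, toy_cm_id, toy_remove_device, and so on) is hypothetical and not a kernel or rdma-cm API, and where the thread says the real rdma cm blocks until all ids are gone, the model simply returns the number of ids still alive.

```c
/* Toy user-space model; none of these names are real kernel or rdma-cm APIs. */
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_IDS 8

enum toy_event { TOY_EVENT_DEVICE_REMOVAL };

struct toy_device;
struct toy_cm_id;

/* Consumer callback, invoked once per id when its device goes away. */
typedef void (*toy_handler)(struct toy_cm_id *id, enum toy_event ev);

struct toy_cm_id {
    struct toy_device *dev;  /* device this id is bound to */
    toy_handler handler;     /* consumer's event handler */
    int in_use;              /* 1 while the id exists */
};

struct toy_device {
    struct toy_cm_id ids[TOY_MAX_IDS];
    int live_ids;            /* ids created and not yet destroyed */
};

static void toy_device_init(struct toy_device *dev)
{
    memset(dev, 0, sizeof(*dev));
}

/* Bind a new id to the device; returns NULL when the table is full. */
static struct toy_cm_id *toy_create_id(struct toy_device *dev, toy_handler handler)
{
    for (size_t i = 0; i < TOY_MAX_IDS; i++) {
        struct toy_cm_id *id = &dev->ids[i];
        if (!id->in_use) {
            id->dev = dev;
            id->handler = handler;
            id->in_use = 1;
            dev->live_ids++;
            return id;
        }
    }
    return NULL;
}

static void toy_destroy_id(struct toy_cm_id *id)
{
    assert(id->in_use);
    id->in_use = 0;
    id->dev->live_ids--;
}

/*
 * Forward the removal to every live id, then report how many ids the
 * consumers left behind. A non-zero result is the case where, per the
 * thread, the real remove call would keep blocking.
 */
static int toy_remove_device(struct toy_device *dev)
{
    for (size_t i = 0; i < TOY_MAX_IDS; i++) {
        struct toy_cm_id *id = &dev->ids[i];
        if (id->in_use && id->handler)
            id->handler(id, TOY_EVENT_DEVICE_REMOVAL);
    }
    return dev->live_ids;
}

/* A consumer that follows the expected contract: destroy the id on removal. */
static void toy_good_handler(struct toy_cm_id *id, enum toy_event ev)
{
    if (ev == TOY_EVENT_DEVICE_REMOVAL)
        toy_destroy_id(id);
}

/* A consumer that ignores the event and keeps its id. */
static void toy_leaky_handler(struct toy_cm_id *id, enum toy_event ev)
{
    (void)id;
    (void)ev;
}
```

With only toy_good_handler consumers, toy_remove_device() returns 0 and the provider could go away without any module reference being held. A single toy_leaky_handler consumer leaves the count non-zero, which corresponds to the removal that never completes in the original report.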