Re: [for-next 1/2] xprtrdma: take reference of rdma provider module

On Jul 17, 2014, at 4:08 PM, Steve Wise <swise@xxxxxxxxxxxxxxxxxxxxx> wrote:

> 
> 
>> -----Original Message-----
>> From: Steve Wise [mailto:swise@xxxxxxxxxxxxxxxxxxxxx]
>> Sent: Thursday, July 17, 2014 2:56 PM
>> To: 'Hefty, Sean'; 'Shirley Ma'; 'Devesh Sharma'; 'Roland Dreier'
>> Cc: 'linux-rdma@xxxxxxxxxxxxxxx'; 'chuck.lever@xxxxxxxxxx'
>> Subject: RE: [for-next 1/2] xprtrdma: take reference of rdma provider module
>> 
>> 
>> 
>>> -----Original Message-----
>>> From: Hefty, Sean [mailto:sean.hefty@xxxxxxxxx]
>>> Sent: Thursday, July 17, 2014 2:50 PM
>>> To: Steve Wise; 'Shirley Ma'; 'Devesh Sharma'; 'Roland Dreier'
>>> Cc: linux-rdma@xxxxxxxxxxxxxxx; chuck.lever@xxxxxxxxxx
>>> Subject: RE: [for-next 1/2] xprtrdma: take reference of rdma provider module
>>> 
>>>>> So the rdma cm is expected to increase the driver reference count
>>>>> (try_module_get) for each new cm id, then drop the reference
>>>>> (module_put) when the cm id is destroyed?
>>>>> 
>>>> 
>>>> No, I think he's saying the rdma-cm posts an RDMA_CM_EVENT_DEVICE_REMOVAL
>>>> event to each application with rdmacm objects allocated, and each
>>>> application is expected to destroy all the objects it has allocated
>>>> before returning from the event handler.
>>> 
>>> This is almost correct.  The applications do not have to destroy all the
>>> objects that they have allocated before returning from their event handler.
>>> E.g. an app can queue a work item that does the destruction.  The rdmacm
>>> will block in its ib_client remove handler until all relevant rdma_cm_id's
>>> have been destroyed.
>>> 
>> 
>> Thanks for the clarification.
>> 
> 
> And looking at xprtrdma, it does handle the DEVICE_REMOVAL event in rpcrdma_conn_upcall().
> It sets ep->rep_connected to -ENODEV, wakes everybody up, and calls rpcrdma_conn_func()
> for that endpoint, which schedules rep_connect_worker...  and I gave up following the code
> path at this point... :)  
> 
> For this to all work correctly, it would need to destroy all the QPs, MRs, CQs, etc for
> that device _before_ destroying the rdma cm ids.  Otherwise the provider module could be
> unloaded too soon…

We can’t really deal with an RDMA_CM_EVENT_DEVICE_REMOVAL event while there
are active NFS mounts.

System shutdown ordering should guarantee (one would hope) that NFS
mount points are unmounted before the RDMA/IB core infrastructure is
torn down. Ordering shouldn’t matter as long as all NFS activity has
ceased before the CM tries to remove the device.

So if something is hanging up the CM, there’s something xprtrdma is not
cleaning up properly.
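
To make the ordering discussed in this thread concrete, here is a small
userspace C model. It is not kernel code and not what xprtrdma does today:
every model_* name is made up for illustration, and the resource counts are
arbitrary. It only sketches three points from the thread: the event handler
may defer destruction to a work item, the QPs/MRs/CQs should go away before
the cm id does, and the CM's remove handler cannot finish while any cm id
is still alive.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Userspace model only: none of these types or functions are kernel APIs. */
enum model_event { MODEL_EVENT_DEVICE_REMOVAL };

struct model_provider {
    int  live_cm_ids;  /* the CM's remove handler waits for this to reach 0 */
    bool loaded;       /* stands in for "provider module still loaded" */
};

struct model_endpoint {
    int  connected;        /* plays the role of ep->rep_connected */
    bool teardown_queued;  /* stands in for a queued work item */
    int  qps, mrs, cqs;    /* provider resources still allocated */
    bool cm_id_alive;
};

/* Allocate an endpoint with some (arbitrary) provider resources. */
static void model_ep_init(struct model_endpoint *ep, struct model_provider *prov)
{
    ep->connected = 1;
    ep->teardown_queued = false;
    ep->qps = 1;
    ep->mrs = 2;
    ep->cqs = 1;
    ep->cm_id_alive = true;
    prov->live_cm_ids++;
}

/* Event handler: marks the endpoint dead and defers the real work.
 * It deliberately destroys nothing before returning. */
static int model_conn_upcall(struct model_endpoint *ep, enum model_event ev)
{
    if (ev == MODEL_EVENT_DEVICE_REMOVAL) {
        ep->connected = -ENODEV;
        ep->teardown_queued = true;
    }
    return 0;
}

/* Destroying the cm id is what lets the CM's remove handler make progress,
 * so it must come last. */
static void model_destroy_cm_id(struct model_endpoint *ep,
                                struct model_provider *prov)
{
    assert(ep->qps == 0 && ep->mrs == 0 && ep->cqs == 0);
    ep->cm_id_alive = false;
    prov->live_cm_ids--;
}

/* Deferred work item: provider resources first, cm id last. */
static void model_teardown_worker(struct model_endpoint *ep,
                                  struct model_provider *prov)
{
    if (!ep->teardown_queued || !ep->cm_id_alive)
        return;
    ep->qps = 0;
    ep->mrs = 0;
    ep->cqs = 0;
    model_destroy_cm_id(ep, prov);
    ep->teardown_queued = false;
}

/* The real remove handler blocks; the model just reports whether it
 * would be allowed to finish, and "unloads" the provider if so. */
static bool model_try_unload(struct model_provider *prov)
{
    if (prov->live_cm_ids != 0)
        return false;
    prov->loaded = false;
    return true;
}
```

In this model, calling model_try_unload() between the upcall and the worker
fails, which corresponds to the CM hanging until xprtrdma finishes its
cleanup. If the worker destroyed the cm id before the QPs/MRs/CQs, the
assert in model_destroy_cm_id() would fire, which is the "unloaded too
soon" case Steve describes.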

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com


