Re: [PATCH v3] rpcrdma: fix handling for RDMA_CM_EVENT_DEVICE_REMOVAL

On Mon, May 06, 2024 at 10:10:12PM +0300, Sagi Grimberg wrote:
> 
> On 06/05/2024 18:55, Chuck Lever wrote:
> > On Mon, May 06, 2024 at 06:09:51PM +0300, Sagi Grimberg wrote:
> > > Question though, in DEVICE_REMOVAL the device is going away as soon as the
> > > cm handler callback returns. Shouldn't nfs release all the device resources
> > > (related to this cm_id)? afaict it was changed in:
> > > e28ce90083f0 ("xprtrdma: kmalloc rpcrdma_ep separate from rpcrdma_xprt")
> > In the case where a DEVICE_REMOVAL event fires and a connection
> > hasn't yet been established, my guess is the ep reference count will
> > go to zero when rpcrdma_ep_put() is called.
> 
> Yes, I was actually referring to the case where the connection was
> established. It looks like rpcrdma_force_disconnect -> xprt_force_disconnect
> schedules async work to tear things down, no?

Yes, because we can't rip out the hardware resources while the
RPC client is still using the transport. Converting to use an
ib_client might help by first triggering a disconnect, done
with proper serialization.
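
As a rough illustration of that ordering (disconnect first, release
hardware resources second), here is a userspace C sketch. Every name in
it (mock_device, mock_ep, mock_client, mock_remove_one and so on) is
hypothetical and only loosely modeled on the shape of an ib_client
remove callback; it is not xprtrdma or ib_core code, and the real
disconnect step would have to wait for in-flight RPCs to drain rather
than simply clearing a counter.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these are kernel types. */
struct mock_device {
	const char *name;
};

struct mock_ep {
	struct mock_device *dev;
	bool connected;
	bool hw_resources_held;	/* stands in for QP, CQs, MRs, PD */
	int rpcs_in_flight;
};

/* Loosely mirrors the idea of a per-client remove callback. */
struct mock_client {
	const char *name;
	void (*remove)(struct mock_device *dev, void *client_data);
};

/*
 * Step 1: stop the RPC client from using the transport.
 * Assumption: a real transport would block here until in-flight
 * RPCs have drained; this mock just clears the counter.
 */
static void mock_disconnect(struct mock_ep *ep)
{
	ep->connected = false;
	ep->rpcs_in_flight = 0;
}

/*
 * Step 2: release hardware resources. Refuses to run while the
 * transport is still in use, which is the constraint that forces
 * the teardown to be ordered behind the disconnect.
 */
static int mock_release_hw(struct mock_ep *ep)
{
	if (ep->connected || ep->rpcs_in_flight > 0)
		return -1;
	ep->hw_resources_held = false;
	return 0;
}

/* Remove callback: runs both steps, in order, in its own context. */
static void mock_remove_one(struct mock_device *dev, void *client_data)
{
	struct mock_ep *ep = client_data;

	if (!ep || ep->dev != dev)
		return;
	mock_disconnect(ep);
	mock_release_hw(ep);
	ep->dev = NULL;
}

static struct mock_client mock_rpcrdma_client = {
	.name	= "mock_rpcrdma",
	.remove	= mock_remove_one,
};

/* Stand-in for the core invoking the client's callback on unplug. */
static void mock_device_unplug(struct mock_client *client,
			       struct mock_device *dev, void *client_data)
{
	client->remove(dev, client_data);
}
```

In this sketch, calling mock_release_hw() on a connected endpoint
fails, while mock_device_unplug() leaves the endpoint disconnected
with no hardware resources held by the time the callback returns.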


> > > FWIW in nvme we avoided the problem altogether by registering an ib_client
> > > that is called on .remove() and it's a separate context that doesn't have
> > > all the intricacies with rdma_cm...
> > I looked at ib_client, years ago, and thought it would be a lot of
> > added complexity. With a code sample (NVMe host) maybe I can put
> > something together.
> 
> The plus is that there is no need to handle the DEVICE_REMOVAL cm event,
> which is always nice...

I'll have a look, thanks for the suggestion.


-- 
Chuck Lever



