On Mon, May 06, 2024 at 06:09:51PM +0300, Sagi Grimberg wrote: > > On 06/05/2024 12:37, Dan Aloni wrote: > > Under the scenario of IB device bonding, when bringing down one of the > > ports, or all ports, we saw xprtrdma entering a non-recoverable state > > where it is not even possible to complete the disconnect and shut it > > down the mount, requiring a reboot. Following debug, we saw that > > transport connect never ended after receiving the > > RDMA_CM_EVENT_DEVICE_REMOVAL callback. > > > > The DEVICE_REMOVAL callback is irrespective of whether the CM_ID is > > connected, and ESTABLISHED may not have happened. So need to work with > > each of these states accordingly. > > > > Fixes: 2acc5cae2923 ('xprtrdma: Prevent dereferencing r_xprt->rx_ep after it is freed') > > Is this actually the offending commit ? > > commit bebd031866ca ("xprtrdma: Support unplugging an HCA from under an NFS > mount") > is the one assuming DEVICE_REMOVAL triggers a disconnect not accounting that > the > cm_id may not be ESTABLISHED (where we need to wake the connect waiter? I'd be OK with discussing possible culprits in the patch description but leaving off a Fixes: tag for now. It would be reasonable to demand that the proposed fix be applied to each LTS kernel and tested individually to ensure there are no side-effects or pre-requisites. > Question though, in DEVICE_REMOVAL the device is going away as soon as the > cm handler callback returns. Shouldn't nfs release all the device resources > (related to this > cm_id)? afaict it was changed in: > e28ce90083f0 ("xprtrdma: kmalloc rpcrdma_ep separate from rpcrdma_xprt") In the case where a DEVICE_REMOVAL event fires and a connection hasn't yet been established, my guess is the ep reference count will go to zero when rpcrdma_ep_put() is called. > The patch itself looks reasonable (although I do think that the rdma stack > expects the > ulp to have the rdma resources released when the callback returns). Thanks for the review! Should we add: Reviewed-by: Sagi Grimberg <sagi@xxxxxxxxxxx> > FWIW in nvme we avoided the problem altogether by registering an ib_client > that is > called on .remove() and its a separate context that doesn't have all the > intricacies with > rdma_cm... I looked at ib_client, years ago, and thought it would be a lot of added complexity. With a code sample (NVMe host) maybe I can put something together. -- Chuck Lever