On Mon, May 06, 2024 at 12:37:59PM +0300, Dan Aloni wrote:
> Under the scenario of IB device bonding, when bringing down one of the
> ports, or all ports, we saw xprtrdma entering a non-recoverable state
> where it is not even possible to complete the disconnect and shut down
> the mount, requiring a reboot. Following debug, we saw that the
> transport connect never ended after receiving the
> RDMA_CM_EVENT_DEVICE_REMOVAL callback.
>
> The DEVICE_REMOVAL callback is irrespective of whether the CM_ID is
> connected, and ESTABLISHED may not have happened. So we need to handle
> each of these states accordingly.
>
> Fixes: 2acc5cae2923 ('xprtrdma: Prevent dereferencing r_xprt->rx_ep after it is freed')
> Cc: Sagi Grimberg <sagi.grimberg@xxxxxxxxxxxx>
> Signed-off-by: Dan Aloni <dan.aloni@xxxxxxxxxxxx>
> ---
>  net/sunrpc/xprtrdma/verbs.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
> index 4f8d7efa469f..432557a553e7 100644
> --- a/net/sunrpc/xprtrdma/verbs.c
> +++ b/net/sunrpc/xprtrdma/verbs.c
> @@ -244,7 +244,11 @@ rpcrdma_cm_event_handler(struct rdma_cm_id *id, struct rdma_cm_event *event)
>  	case RDMA_CM_EVENT_DEVICE_REMOVAL:
>  		pr_info("rpcrdma: removing device %s for %pISpc\n",
>  			ep->re_id->device->name, sap);
> -		fallthrough;
> +		switch (xchg(&ep->re_connect_status, -ENODEV)) {
> +		case 0: goto wake_connect_worker;
> +		case 1: goto disconnected;
> +		}
> +		return 0;
>  	case RDMA_CM_EVENT_ADDR_CHANGE:
>  		ep->re_connect_status = -ENODEV;
>  		goto disconnected;
> --
> 2.39.3
>

Hi Anna,

Please apply this patch with:

Reviewed-by: Sagi Grimberg <sagi@xxxxxxxxxxx>
Reviewed-by: Chuck Lever <chuck.lever@xxxxxxxxxx>

--
Chuck Lever
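
For readers following the state handling in the DEVICE_REMOVAL hunk, here is a minimal user-space sketch of the same dispatch. It is not the kernel code: it assumes, based on the patch's case labels, that a connect status of 0 means a connect is still in progress and 1 means ESTABLISHED was seen, and it substitutes C11 atomic_exchange() for the kernel's xchg(). The names removal_action and on_device_removal are hypothetical.

```c
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical stand-ins for the handler's three exits in the patch. */
enum removal_action {
	ACTION_WAKE_CONNECT_WORKER,	/* patch: goto wake_connect_worker */
	ACTION_FORCE_DISCONNECT,	/* patch: goto disconnected */
	ACTION_NONE,			/* patch: return 0 */
};

/*
 * Atomically replace the connect status with -ENODEV and choose the
 * exit from the value that was there before the swap. Assumed meaning
 * of the previous value: 0 = connect still in progress, 1 = connection
 * established, anything else = an error was already recorded.
 */
static enum removal_action on_device_removal(atomic_int *connect_status)
{
	switch (atomic_exchange(connect_status, -ENODEV)) {
	case 0:
		return ACTION_WAKE_CONNECT_WORKER;
	case 1:
		return ACTION_FORCE_DISCONNECT;
	default:
		return ACTION_NONE;
	}
}
```

Because the swap returns the previous value, a second removal event, or one that arrives after an error was already recorded, takes the no-op exit instead of repeating the disconnect.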