Re: [PATCH v3] rpcrdma: fix handling for RDMA_CM_EVENT_DEVICE_REMOVAL

On Mon, 2024-05-06 at 16:06 -0400, Chuck Lever wrote:
> On Mon, May 06, 2024 at 12:37:59PM +0300, Dan Aloni wrote:
> > Under the scenario of IB device bonding, when bringing down one of the
> > ports, or all ports, we saw xprtrdma entering a non-recoverable state
> > where it is not even possible to complete the disconnect and shut down
> > the mount, requiring a reboot. While debugging, we saw that the
> > transport connect never completed after receiving the
> > RDMA_CM_EVENT_DEVICE_REMOVAL callback.
> > 
> > The DEVICE_REMOVAL callback is delivered irrespective of whether the
> > CM_ID is connected, and ESTABLISHED may not have happened. So we need
> > to handle each of these states accordingly.
> > 
> > Fixes: 2acc5cae2923 ('xprtrdma: Prevent dereferencing r_xprt->rx_ep after it is freed')
> > Cc: Sagi Grimberg <sagi.grimberg@xxxxxxxxxxxx>
> > Signed-off-by: Dan Aloni <dan.aloni@xxxxxxxxxxxx>
> > ---
> >  net/sunrpc/xprtrdma/verbs.c | 6 +++++-
> >  1 file changed, 5 insertions(+), 1 deletion(-)
> > 
> > diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
> > index 4f8d7efa469f..432557a553e7 100644
> > --- a/net/sunrpc/xprtrdma/verbs.c
> > +++ b/net/sunrpc/xprtrdma/verbs.c
> > @@ -244,7 +244,11 @@ rpcrdma_cm_event_handler(struct rdma_cm_id *id, struct rdma_cm_event *event)
> >  	case RDMA_CM_EVENT_DEVICE_REMOVAL:
> >  		pr_info("rpcrdma: removing device %s for %pISpc\n",
> >  			ep->re_id->device->name, sap);
> > -		fallthrough;
> > +		switch (xchg(&ep->re_connect_status, -ENODEV)) {
> > +		case 0: goto wake_connect_worker;
> > +		case 1: goto disconnected;
> > +		}
> > +		return 0;
> >  	case RDMA_CM_EVENT_ADDR_CHANGE:
> >  		ep->re_connect_status = -ENODEV;
> >  		goto disconnected;
> > -- 
> > 2.39.3
> > 
> 
> Hi Anna,
> 
> Please apply this patch with:
> 
> Reviewed-by: Sagi Grimberg <sagi@xxxxxxxxxxx>
> Reviewed-by: Chuck Lever <chuck.lever@xxxxxxxxxx>
> 
> 
Anna is back on leave for a few weeks, so I'll take it.
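
For anyone reading the quoted hunk later, here is a minimal userspace
sketch of the dispatch it adds. It is not the kernel code: C11
atomic_exchange() stands in for the kernel's xchg(), and the enum and
the function name handle_device_removal() are hypothetical names made
up for illustration. The comments on what 0 and 1 mean are my reading
of the commit message (0 = connect still pending, 1 = ESTABLISHED was
seen) and should be treated as an assumption.

```c
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical labels for the three outcomes in the quoted hunk. */
enum removal_action {
	ACT_WAKE_CONNECT_WORKER,	/* old status 0: goto wake_connect_worker */
	ACT_DISCONNECTED,		/* old status 1: goto disconnected */
	ACT_NONE,			/* any other old status: return 0 */
};

/*
 * Model of the DEVICE_REMOVAL case: store -ENODEV and branch on the
 * value that was replaced, in a single atomic step. Reading the old
 * status and writing the new one separately would leave a window in
 * which another path could change it in between.
 */
static enum removal_action handle_device_removal(atomic_int *connect_status)
{
	switch (atomic_exchange(connect_status, -ENODEV)) {
	case 0:
		/* Assumed: a connect attempt is pending; unblock the waiter. */
		return ACT_WAKE_CONNECT_WORKER;
	case 1:
		/* Assumed: the connection was up; run the disconnect path. */
		return ACT_DISCONNECTED;
	default:
		/* Already failed or torn down; nothing more to do. */
		return ACT_NONE;
	}
}
```

Calling handle_device_removal() twice on the same status shows the
point of the exchange: only the first call sees 0 or 1, the second
sees -ENODEV and does nothing.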

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx





