Hi Dan-

> On Jun 20, 2020, at 1:18 PM, Dan Aloni <dan@xxxxxxxxxxxx> wrote:
>
> Given that rpcrdma_xprt_connect() happens from workqueue context, in cases where
> connections don't succeed, something needs to wake it up. In my case, this has
> been observed when the CM callback received `RDMA_CM_EVENT_REJECTED`, and
> `rpcrdma_xprt_connect()` slept forever.

Interesting. My development and testing generate plenty of
REJECTED connection requests, but I never saw this particular
failure mode.

> This continues the fix in commit 58bd6656f808 ('xprtrdma: Restore wake-up-all to
> rpcrdma_cm_event_handler()').

The patch looks sensible. I'll pull it into my test harness.

> Signed-off-by: Dan Aloni <dan@xxxxxxxxxxxx>
> CC: Chuck Lever <chuck.lever@xxxxxxxxxx>
> ---
>
> Notes:
>     Hi Chuck,
>
>     Maybe I missed something, as it is not clear to me how otherwise (without this
>     patch), re_connect_wait can be woken up in this situation. Please explain?
>
>  net/sunrpc/xprtrdma/verbs.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
> index 2ae348377806..8bd76a47a91f 100644
> --- a/net/sunrpc/xprtrdma/verbs.c
> +++ b/net/sunrpc/xprtrdma/verbs.c
> @@ -289,6 +289,7 @@ rpcrdma_cm_event_handler(struct rdma_cm_id *id, struct rdma_cm_event *event)
>  		ep->re_connect_status = -ECONNABORTED;
>  disconnected:
>  		xprt_force_disconnect(xprt);
> +		wake_up_all(&ep->re_connect_wait);
>  		return rpcrdma_ep_destroy(ep);
>  	default:
>  		break;
> --
> 2.25.4
>

--
Chuck Lever
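
For anyone following the thread, the failure mode described above can be modeled in userspace. This is only a rough sketch, not the kernel code: `struct ep_model` and every `model_*` function are hypothetical names, a pthread condition variable stands in for `re_connect_wait`, and the `-ECONNREFUSED` status is an assumed value chosen for illustration. It assumes the connect worker sleeps until the status field changes, so the event-handler side has to issue a wake-up on the failure path as well. Build with `-pthread`.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical userspace stand-in for the endpoint. The field names echo
 * the patch, but this is not the kernel's struct rpcrdma_ep. */
struct ep_model {
	pthread_mutex_t lock;
	pthread_cond_t connect_wait;	/* plays the role of re_connect_wait */
	int connect_status;		/* 0 = pending, nonzero = outcome */
};

/* Event-handler side: record the outcome, then wake every waiter.
 * If the broadcast were skipped on the failure path, model_connect()
 * below would never return. */
static void model_cm_event(struct ep_model *ep, int status)
{
	pthread_mutex_lock(&ep->lock);
	ep->connect_status = status;
	pthread_cond_broadcast(&ep->connect_wait);
	pthread_mutex_unlock(&ep->lock);
}

/* Simulates the peer rejecting the connection (assumed status value). */
static void *model_reject_thread(void *arg)
{
	model_cm_event(arg, -ECONNREFUSED);
	return NULL;
}

/* Connect-worker side: sleep until the status leaves "pending". */
static int model_connect(struct ep_model *ep)
{
	int status;

	pthread_mutex_lock(&ep->lock);
	while (ep->connect_status == 0)
		pthread_cond_wait(&ep->connect_wait, &ep->lock);
	status = ep->connect_status;
	pthread_mutex_unlock(&ep->lock);
	return status;
}

/* Runs one simulated attempt that the peer rejects and returns the
 * status observed by the waiting side. */
static int model_rejected_attempt(void)
{
	struct ep_model ep = { .connect_status = 0 };
	pthread_t handler;
	int rc, status;

	pthread_mutex_init(&ep.lock, NULL);
	pthread_cond_init(&ep.connect_wait, NULL);

	rc = pthread_create(&handler, NULL, model_reject_thread, &ep);
	assert(rc == 0);
	(void)rc;

	status = model_connect(&ep);
	pthread_join(handler, NULL);

	pthread_cond_destroy(&ep.connect_wait);
	pthread_mutex_destroy(&ep.lock);
	return status;
}
```

Calling `model_rejected_attempt()` returns once the simulated rejection has been delivered. Removing the `pthread_cond_broadcast()` call reproduces the symptom in the model: if the waiter is already asleep when the status is set, nothing ever wakes it.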