On Thu, 2015-09-17 at 10:18 -0400, Jeff Layton wrote:
> On Thu, 17 Sep 2015 09:38:33 -0400
> Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx> wrote:
>
> > On Tue, Sep 15, 2015 at 2:52 PM, Jeff Layton <
> > jlayton@xxxxxxxxxxxxxxx> wrote:
> > > On Tue, 15 Sep 2015 16:49:23 +0100
> > > "Suzuki K. Poulose" <suzuki.poulose@xxxxxxx> wrote:
> > >
> > > >  net/sunrpc/xprtsock.c | 9 ++++++++-
> > > >  1 file changed, 8 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
> > > > index 7be90bc..6f4789d 100644
> > > > --- a/net/sunrpc/xprtsock.c
> > > > +++ b/net/sunrpc/xprtsock.c
> > > > @@ -822,9 +822,16 @@ static void xs_reset_transport(struct sock_xprt *transport)
> > > >  	if (atomic_read(&transport->xprt.swapper))
> > > >  		sk_clear_memalloc(sk);
> > > >
> > > > -	kernel_sock_shutdown(sock, SHUT_RDWR);
> > > > +	if (sock)
> > > > +		kernel_sock_shutdown(sock, SHUT_RDWR);
> > > >
> > >
> > > Good catch, but...isn't this still racy? What prevents
> > > transport->sock being set to NULL after you assign it to "sock"
> > > but before calling kernel_sock_shutdown?
> >
> > The XPRT_LOCKED state.
> >
>
> IDGI -- if the XPRT_LOCKED bit was supposed to prevent that, then
> how could you hit the original race? There should be no concurrent
> callers to xs_reset_transport on the same xprt, right?

Correct. The only exception is xs_destroy.

> AFAICT, that bit is not set in the xprt_destroy codepath, which may be
> the root cause of the problem. How would we take it there anyway?
> xprt_destroy is void return, and may not be called in the context of a
> rpc_task. If it's contended, what do we do? Sleep until it's cleared?

How about the following.
8<-----------------------------------------------------------------
>From e2e68218e66c6b0715fd6b8f1b3092694a7c0e62 Mon Sep 17 00:00:00 2001
From: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
Date: Thu, 17 Sep 2015 10:42:27 -0400
Subject: [PATCH] SUNRPC: Fix races between socket connection and destroy code

When we're destroying the socket transport, we need to ensure that we
cancel any existing delayed connection attempts, and order them w.r.t.
the call to xs_close().

Signed-off-by: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
---
 net/sunrpc/xprtsock.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 7be90bc1a7c2..d2dfbd043bea 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -881,8 +881,11 @@ static void xs_xprt_free(struct rpc_xprt *xprt)
  */
 static void xs_destroy(struct rpc_xprt *xprt)
 {
+	struct sock_xprt *transport = container_of(xprt,
+			struct sock_xprt, xprt);
 	dprintk("RPC: xs_destroy xprt %p\n", xprt);

+	cancel_delayed_work_sync(&transport->connect_worker);
 	xs_close(xprt);
 	xs_xprt_free(xprt);
 	module_put(THIS_MODULE);
--
2.4.3

--
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@xxxxxxxxxxxxxxx
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html