On Thu, 2015-09-17 at 10:18 -0400, Jeff Layton wrote:
> On Thu, 17 Sep 2015 09:38:33 -0400
> Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx> wrote:
>
> > On Tue, Sep 15, 2015 at 2:52 PM, Jeff Layton <
> > jlayton@xxxxxxxxxxxxxxx> wrote:
> > > On Tue, 15 Sep 2015 16:49:23 +0100
> > > "Suzuki K. Poulose" <suzuki.poulose@xxxxxxx> wrote:
> > >
> > > >  net/sunrpc/xprtsock.c | 9 ++++++++-
> > > >  1 file changed, 8 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
> > > > index 7be90bc..6f4789d 100644
> > > > --- a/net/sunrpc/xprtsock.c
> > > > +++ b/net/sunrpc/xprtsock.c
> > > > @@ -822,9 +822,16 @@ static void xs_reset_transport(struct sock_xprt *transport)
> > > >  	if (atomic_read(&transport->xprt.swapper))
> > > >  		sk_clear_memalloc(sk);
> > > >
> > > > -	kernel_sock_shutdown(sock, SHUT_RDWR);
> > > > +	if (sock)
> > > > +		kernel_sock_shutdown(sock, SHUT_RDWR);
> > > >
> > >
> > > Good catch, but...isn't this still racy? What prevents
> > > transport->sock being set to NULL after you assign it to "sock"
> > > but before calling kernel_sock_shutdown?
> >
> > The XPRT_LOCKED state.
> >
>
> IDGI -- if the XPRT_LOCKED bit was supposed to prevent that, then
> how could you hit the original race? There should be no concurrent
> callers to xs_reset_transport on the same xprt, right?

Correct. The only exception is xs_destroy.

> AFAICT, that bit is not set in the xprt_destroy codepath, which may be
> the root cause of the problem. How would we take it there anyway?
> xprt_destroy is void return, and may not be called in the context of a
> rpc_task. If it's contended, what do we do? Sleep until it's cleared?

How about the following.
8<-----------------------------------------------------------------
>From e2e68218e66c6b0715fd6b8f1b3092694a7c0e62 Mon Sep 17 00:00:00 2001
From: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
Date: Thu, 17 Sep 2015 10:42:27 -0400
Subject: [PATCH] SUNRPC: Fix races between socket connection and destroy code

When we're destroying the socket transport, we need to ensure that we
cancel any existing delayed connection attempts, and order them w.r.t.
the call to xs_close().

Signed-off-by: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
---
 net/sunrpc/xprtsock.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 7be90bc1a7c2..d2dfbd043bea 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -881,8 +881,11 @@ static void xs_xprt_free(struct rpc_xprt *xprt)
  */
 static void xs_destroy(struct rpc_xprt *xprt)
 {
+	struct sock_xprt *transport = container_of(xprt,
+			struct sock_xprt, xprt);
 	dprintk("RPC: xs_destroy xprt %p\n", xprt);

+	cancel_delayed_work_sync(&transport->connect_worker);
 	xs_close(xprt);
 	xs_xprt_free(xprt);
 	module_put(THIS_MODULE);
--
2.4.3

--
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@xxxxxxxxxxxxxxx
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html