Re: [PATCH] SUNRPC: Fix a race in xs_reset_transport

Jeff Layton <jlayton@xxxxxxxxxxxxxxx> · Thu, 17 Sep 2015 10:18:47 -0400

On Thu, 17 Sep 2015 09:38:33 -0400
Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx> wrote:

> On Tue, Sep 15, 2015 at 2:52 PM, Jeff Layton <jlayton@xxxxxxxxxxxxxxx> wrote:
> > On Tue, 15 Sep 2015 16:49:23 +0100
> > "Suzuki K. Poulose" <suzuki.poulose@xxxxxxx> wrote:
> >
> >>  net/sunrpc/xprtsock.c |    9 ++++++++-
> >>  1 file changed, 8 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
> >> index 7be90bc..6f4789d 100644
> >> --- a/net/sunrpc/xprtsock.c
> >> +++ b/net/sunrpc/xprtsock.c
> >> @@ -822,9 +822,16 @@ static void xs_reset_transport(struct sock_xprt *transport)
> >>       if (atomic_read(&transport->xprt.swapper))
> >>               sk_clear_memalloc(sk);
> >>
> >> -     kernel_sock_shutdown(sock, SHUT_RDWR);
> >> +     if (sock)
> >> +             kernel_sock_shutdown(sock, SHUT_RDWR);
> >>
> >
> > Good catch, but...isn't this still racy? What prevents transport->sock
> > being set to NULL after you assign it to "sock" but before calling
> > kernel_sock_shutdown?
> 
> The XPRT_LOCKED state.
> 

I don't get it. If the XPRT_LOCKED bit is supposed to prevent that, then
how could you hit the original race? In that case there should be no
concurrent callers of xs_reset_transport on the same xprt, right?

AFAICT, that bit is not set in the xprt_destroy codepath, which may be
the root cause of the problem. How would we take it there anyway?
xprt_destroy returns void, and may not be called in the context of an
rpc_task. If it's contended, what do we do? Sleep until it's cleared?
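
To make the question concrete, here is a rough userspace sketch of the
pattern I have in mind. It is not the xprtsock code. Every name in it
(demo_xprt, demo_sock, DEMO_LOCKED, demo_trylock, demo_unlock,
demo_reset) is made up for illustration, and C11 atomics stand in for
the kernel's bit operations. It only shows that a "locked" bit
serializes the reset if every caller takes it, and that a caller with
no task context has to decide what to do when the trylock fails.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are the kernel's definitions. */
#define DEMO_LOCKED 1u

struct demo_sock {
	int shutdown_calls;	/* counts simulated shutdowns */
};

struct demo_xprt {
	atomic_uint state;	/* bit 0 plays the role of a "locked" flag */
	struct demo_sock *sock;	/* only touched while the flag is held */
};

/* Atomically set the flag; return true only if we were the one to set it. */
static bool demo_trylock(struct demo_xprt *x)
{
	unsigned int old = atomic_fetch_or(&x->state, DEMO_LOCKED);

	return (old & DEMO_LOCKED) == 0;
}

/* Clear the flag; the caller must currently hold it. */
static void demo_unlock(struct demo_xprt *x)
{
	unsigned int old = atomic_fetch_and(&x->state, ~DEMO_LOCKED);

	assert(old & DEMO_LOCKED);
}

/*
 * Detach and shut down the socket under the flag. Returns false if the
 * flag is contended: this is the open question for a void teardown
 * path, which cannot report failure and has no task to queue behind.
 */
static bool demo_reset(struct demo_xprt *x)
{
	struct demo_sock *sock;

	if (!demo_trylock(x))
		return false;

	sock = x->sock;		/* stable: no other holder can clear it */
	x->sock = NULL;
	if (sock)
		sock->shutdown_calls++;

	demo_unlock(x);
	return true;
}
```

If the teardown path called the reset without taking the flag, the NULL
check on the local copy would be the only protection, which is the
window I asked about above.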

-- 
Jeff Layton <jlayton@xxxxxxxxxxxxxxx>