Re: [PATCH] NFSD/sunrpc: avoid deadlock on TCP connection due to memory pressure.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, 25 Jul 2013 16:18:05 -0400 "J.Bruce Fields" <bfields@xxxxxxxxxxxxxx>
wrote:

> On Thu, Jul 25, 2013 at 11:30:23AM +1000, NeilBrown wrote:
> > 
> > Since we enabled auto-tuning for sunrpc TCP connections we do not
> > guarantee that there is enough write-space on each connection to
> > queue a reply.
> > 
> > If memory pressure causes the window to shrink too small, the request
> > throttling in sunrpc/svc will not accept any requests so no more requests
> > will be handled.  Even when pressure decreases the window will not
> > grow again until data is sent on the connection.
> > This means we get a deadlock:  no requests will be handled until there
> > is more space, and no space will be allocated until a request is
> > handled.
> > 
> > This can be simulated by modifying svc_tcp_has_wspace to inflate the
> > number of byte required and removing the 'svc_sock_setbufsize' calls
> > in svc_setup_socket.
> 
> Ah-hah!
> 
> > I found that multiplying by 16 was enough to make the requirement
> > exceed the default allocation.  With this modification in place:
> >    mount -o vers=3,proto=tcp 127.0.0.1:/home /mnt
> > would block and eventually time out because the nfs server could not
> > accept any requests.
> 
> So, this?:

Close enough.  I just put "//" in front of the lines I didn't want rather
than delete them.  But yes: exactly that effect.


> 
> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> index 305374d..36de50d 100644
> --- a/net/sunrpc/svcsock.c
> +++ b/net/sunrpc/svcsock.c
> @@ -1193,6 +1193,7 @@ static int svc_tcp_has_wspace(struct svc_xprt *xprt)
>  	if (test_bit(XPT_LISTENER, &xprt->xpt_flags))
>  		return 1;
>  	required = atomic_read(&xprt->xpt_reserved) + serv->sv_max_mesg;
> +	required *= 16;
>  	if (sk_stream_wspace(svsk->sk_sk) >= required)
>  		return 1;
>  	set_bit(SOCK_NOSPACE, &svsk->sk_sock->flags);
> @@ -1378,14 +1379,8 @@ static struct svc_sock *svc_setup_socket(struct svc_serv *serv,
>  	/* Initialize the socket */
>  	if (sock->type == SOCK_DGRAM)
>  		svc_udp_init(svsk, serv);
> -	else {
> -		/* initialise setting must have enough space to
> -		 * receive and respond to one request.
> -		 */
> -		svc_sock_setbufsize(svsk->sk_sock, 4 * serv->sv_max_mesg,
> -					4 * serv->sv_max_mesg);
> +	else
>  		svc_tcp_init(svsk, serv);
> -	}
>  
>  	dprintk("svc: svc_setup_socket created %p (inet %p)\n",
>  				svsk, svsk->sk_sk);
> 
> > This patch relaxes the request throttling to always allow at least one
> > request through per connection.  It does this by checking both
> >   sk_stream_min_wspace() and xprt->xpt_reserved
> > are zero.
> > The first is zero when the TCP transmit queue is empty.
> > The second is zero when there are no RPC requests being processed.
> > When both of these are zero the socket is idle and so one more
> > request can safely be allowed through.
> > 
> > Applying this patch allows the above mount command to succeed cleanly.
> > Tracing shows that the allocated write buffer space quickly grows and
> > after a few requests are handled, the extra tests are no longer needed
> > to permit further requests to be processed.
> > 
> > The main purpose of request throttling is to handle the case when one
> > client is slow at collecting replies and the send queue gets full of
> > replies that the client hasn't acknowledged (at the TCP level) yet.
> > As we only change behaviour when the send queue is empty this main
> > purpose is still preserved.
> > 
> > Reported-by: Ben Myers <bpm@xxxxxxx>
> > Signed-off-by: NeilBrown <neilb@xxxxxxx>
> > 
> > --
> > As you can see I've changed the patch.  While writing up the above 
> > description realised there was a weakness and so added the sk_stream_min_wspace
> > test.  That allowed me to write the final paragraph.
> 
> This is great, thanks!
> 
> Inclined to queue it up for 3.11 and stable....

I'd agree for 3.11.
It feels a bit border-line for stable.  "dead-lock" and "has been seen in the
wild" are technically enough justification...
I'd probably mark it as "pleas don't apply to -stable until 3.11 is released"
or something like that, just for a bit of breathing space.
Your call though.

NeilBrown

> 
> --b.
> 
> > 
> > NeilBrown
> > 
> > 
> > diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> > index 305374d..7762b9f 100644
> > --- a/net/sunrpc/svcsock.c
> > +++ b/net/sunrpc/svcsock.c
> > @@ -1193,7 +1193,9 @@ static int svc_tcp_has_wspace(struct svc_xprt *xprt)
> >  	if (test_bit(XPT_LISTENER, &xprt->xpt_flags))
> >  		return 1;
> >  	required = atomic_read(&xprt->xpt_reserved) + serv->sv_max_mesg;
> > -	if (sk_stream_wspace(svsk->sk_sk) >= required)
> > +	if (sk_stream_wspace(svsk->sk_sk) >= required ||
> > +	    (sk_stream_min_wspace(svsk->sk_sk) == 0 &&
> > +	     atomic_read(&xprt->xpt_reserved) == 0))
> >  		return 1;
> >  	set_bit(SOCK_NOSPACE, &svsk->sk_sock->flags);
> >  	return 0;
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Attachment: signature.asc
Description: PGP signature


[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux