On Thu, 25 Jul 2013 16:18:05 -0400 "J.Bruce Fields" <bfields@xxxxxxxxxxxxxx> wrote: > On Thu, Jul 25, 2013 at 11:30:23AM +1000, NeilBrown wrote: > > > > Since we enabled auto-tuning for sunrpc TCP connections we do not > > guarantee that there is enough write-space on each connection to > > queue a reply. > > > > If memory pressure causes the window to shrink too small, the request > > throttling in sunrpc/svc will not accept any requests so no more requests > > will be handled. Even when pressure decreases the window will not > > grow again until data is sent on the connection. > > This means we get a deadlock: no requests will be handled until there > > is more space, and no space will be allocated until a request is > > handled. > > > > This can be simulated by modifying svc_tcp_has_wspace to inflate the > > number of byte required and removing the 'svc_sock_setbufsize' calls > > in svc_setup_socket. > > Ah-hah! > > > I found that multiplying by 16 was enough to make the requirement > > exceed the default allocation. With this modification in place: > > mount -o vers=3,proto=tcp 127.0.0.1:/home /mnt > > would block and eventually time out because the nfs server could not > > accept any requests. > > So, this?: Close enough. I just put "//" in front of the lines I didn't want rather than delete them. But yes: exactly that effect. > > diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c > index 305374d..36de50d 100644 > --- a/net/sunrpc/svcsock.c > +++ b/net/sunrpc/svcsock.c > @@ -1193,6 +1193,7 @@ static int svc_tcp_has_wspace(struct svc_xprt *xprt) > if (test_bit(XPT_LISTENER, &xprt->xpt_flags)) > return 1; > required = atomic_read(&xprt->xpt_reserved) + serv->sv_max_mesg; > + required *= 16; > if (sk_stream_wspace(svsk->sk_sk) >= required) > return 1; > set_bit(SOCK_NOSPACE, &svsk->sk_sock->flags); > @@ -1378,14 +1379,8 @@ static struct svc_sock *svc_setup_socket(struct svc_serv *serv, > /* Initialize the socket */ > if (sock->type == SOCK_DGRAM) > svc_udp_init(svsk, serv); > - else { > - /* initialise setting must have enough space to > - * receive and respond to one request. > - */ > - svc_sock_setbufsize(svsk->sk_sock, 4 * serv->sv_max_mesg, > - 4 * serv->sv_max_mesg); > + else > svc_tcp_init(svsk, serv); > - } > > dprintk("svc: svc_setup_socket created %p (inet %p)\n", > svsk, svsk->sk_sk); > > > This patch relaxes the request throttling to always allow at least one > > request through per connection. It does this by checking both > > sk_stream_min_wspace() and xprt->xpt_reserved > > are zero. > > The first is zero when the TCP transmit queue is empty. > > The second is zero when there are no RPC requests being processed. > > When both of these are zero the socket is idle and so one more > > request can safely be allowed through. > > > > Applying this patch allows the above mount command to succeed cleanly. > > Tracing shows that the allocated write buffer space quickly grows and > > after a few requests are handled, the extra tests are no longer needed > > to permit further requests to be processed. > > > > The main purpose of request throttling is to handle the case when one > > client is slow at collecting replies and the send queue gets full of > > replies that the client hasn't acknowledged (at the TCP level) yet. > > As we only change behaviour when the send queue is empty this main > > purpose is still preserved. > > > > Reported-by: Ben Myers <bpm@xxxxxxx> > > Signed-off-by: NeilBrown <neilb@xxxxxxx> > > > > -- > > As you can see I've changed the patch. While writing up the above > > description realised there was a weakness and so added the sk_stream_min_wspace > > test. That allowed me to write the final paragraph. > > This is great, thanks! > > Inclined to queue it up for 3.11 and stable.... I'd agree for 3.11. It feels a bit border-line for stable. "dead-lock" and "has been seen in the wild" are technically enough justification... I'd probably mark it as "pleas don't apply to -stable until 3.11 is released" or something like that, just for a bit of breathing space. Your call though. NeilBrown > > --b. > > > > > NeilBrown > > > > > > diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c > > index 305374d..7762b9f 100644 > > --- a/net/sunrpc/svcsock.c > > +++ b/net/sunrpc/svcsock.c > > @@ -1193,7 +1193,9 @@ static int svc_tcp_has_wspace(struct svc_xprt *xprt) > > if (test_bit(XPT_LISTENER, &xprt->xpt_flags)) > > return 1; > > required = atomic_read(&xprt->xpt_reserved) + serv->sv_max_mesg; > > - if (sk_stream_wspace(svsk->sk_sk) >= required) > > + if (sk_stream_wspace(svsk->sk_sk) >= required || > > + (sk_stream_min_wspace(svsk->sk_sk) == 0 && > > + atomic_read(&xprt->xpt_reserved) == 0)) > > return 1; > > set_bit(SOCK_NOSPACE, &svsk->sk_sock->flags); > > return 0; > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in > the body of a message to majordomo@xxxxxxxxxxxxxxx > More majordomo info at http://vger.kernel.org/majordomo-info.html
Attachment:
signature.asc
Description: PGP signature