Following some debugging, I believe that the attached patch fixes the problem.

Simply returning EAGAIN is not sufficient, as the task does not get requeued, and times out 13 seconds later (as per our mount options). Setting the SOCK_ASYNC_NOSPACE bit causes the requeue to happen.

I realize that this is a gross hack and I should probably not be using SOCK_ASYNC_NOSPACE in that way. Is there a better way to achieve the same solution?

-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com

diff -r 69bd2176baf9 net/sunrpc/xprtsock.c
--- a/net/sunrpc/xprtsock.c	Mon Nov 07 13:00:06 2011 +0000
+++ b/net/sunrpc/xprtsock.c	Mon Nov 21 18:00:14 2011 +0000
@@ -503,17 +503,16 @@ static int xs_nospace(struct rpc_task *t
 
 	/* Don't race with disconnect */
 	if (xprt_connected(xprt)) {
-		if (test_bit(SOCK_ASYNC_NOSPACE, &transport->sock->flags)) {
-			ret = -EAGAIN;
-			/*
-			 * Notify TCP that we're limited by the application
-			 * window size
-			 */
-			set_bit(SOCK_NOSPACE, &transport->sock->flags);
-			transport->inet->sk_write_pending++;
-			/* ...and wait for more buffer space */
-			xprt_wait_for_buffer_space(task, xs_nospace_callback);
-		}
+		set_bit(SOCK_ASYNC_NOSPACE, &transport->sock->flags);
+		ret = -EAGAIN;
+		/*
+		 * Notify TCP that we're limited by the application
+		 * window size
+		 */
+		set_bit(SOCK_NOSPACE, &transport->sock->flags);
+		transport->inet->sk_write_pending++;
+		/* ...and wait for more buffer space */
+		xprt_wait_for_buffer_space(task, xs_nospace_callback);
 	} else {
 		clear_bit(SOCK_ASYNC_NOSPACE, &transport->sock->flags);
 		ret = -ENOTCONN;
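
For anyone who wants to see the behavioural difference without reading xs_nospace() in full, here is a small userspace model of the two versions of the branch. It is only a sketch: struct toy_xprt and the toy_* functions are made up for illustration, the initial "ret = 0" is an assumption about the part of the function not shown in the hunk, and task_waiting merely stands in for the effect of xprt_wait_for_buffer_space(). None of this is kernel code.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the transport and socket state. */
struct toy_xprt {
	bool connected;      /* models xprt_connected(xprt) */
	bool async_nospace;  /* models the SOCK_ASYNC_NOSPACE bit */
	bool nospace;        /* models the SOCK_NOSPACE bit */
	int  write_pending;  /* models inet->sk_write_pending */
	bool task_waiting;   /* set where xprt_wait_for_buffer_space() would run */
};

/* Branch before the patch: only waits if SOCK_ASYNC_NOSPACE is already set. */
static int toy_nospace_old(struct toy_xprt *x)
{
	int ret = 0; /* assumption: initial value is not visible in the hunk */

	if (x->connected) {
		if (x->async_nospace) {
			ret = -EAGAIN;
			x->nospace = true;
			x->write_pending++;
			x->task_waiting = true;
		}
	} else {
		x->async_nospace = false;
		ret = -ENOTCONN;
	}
	return ret;
}

/* Branch after the patch: sets the bit itself and always waits. */
static int toy_nospace_patched(struct toy_xprt *x)
{
	int ret = 0; /* same assumption as above */

	if (x->connected) {
		x->async_nospace = true;
		ret = -EAGAIN;
		x->nospace = true;
		x->write_pending++;
		x->task_waiting = true;
	} else {
		x->async_nospace = false;
		ret = -ENOTCONN;
	}
	return ret;
}
```

With a connected transport whose async_nospace flag is clear, toy_nospace_old() leaves task_waiting false, whereas toy_nospace_patched() returns -EAGAIN and marks the task as waiting. That is the unconditional wait the patch introduces, and also why it feels like a hack: the function now sets a bit it previously only tested.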