[PATCH] SUNRPC: reset TCP reconnect exponential back-off on successful connection.

Neil Brown <neilb@xxxxxxx> · Fri, 17 Jul 2009 17:53:37 +1000

Hi.
 A customer of ours has been testing NFS failover and has been
 experiencing unexpected delays before the client starts writing
 again.  It turns out there are a number of issues here, some on the
 client and some on the server.

 This patch fixes two client issues: one that causes the failover time
 to double on each migration (or each time the NFS server is stopped
 and restarted), and one that causes the client to spam the server
 with SYN requests until it accepts the connection (I have a trace
 showing over 100 SYN requests, each followed by a RST,ACK reply, in
 the space of 300 milliseconds).

 I am able to simulate the first failure and have tested that the
 patch fixes it.  I have not managed to simulate the second failure,
 but I think that fix is clearly safe.

 I'm not sure that the patch fits the original definition for -stable,
 but it seems to fit the current practice and I would appreciate it if
 (assuming the patch passes review) it could be submitted for -stable.

Thanks,
NeilBrown



The sunrpc/TCP transport has an exponential back-off for reconnection,
starting at 3 seconds and with a maximum of 300 seconds.  On every
connection attempt the timeout is doubled.
It is only reset when the client deliberately closes the connection.
If the server closes the connection but a subsequent reconnect
succeeds, the timeout remains elevated.

This means that if the server resets the connection several times, as
can happen with server migration in a clustered environment, each
reconnect takes longer than the previous one - unnecessarily so.
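
As a rough illustration of the growth described above, here is a
user-space sketch (not the kernel code).  The names SKETCH_INIT_TO,
SKETCH_MAX_TO and sketch_prepatch_delays() are made up for this
example, and the values are in seconds, whereas the kernel keeps the
timeout in jiffies.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins in seconds; the kernel stores these in jiffies. */
#define SKETCH_INIT_TO 3UL
#define SKETCH_MAX_TO  300UL

/*
 * Fill out[0..n-1] with the delay used for each successive reconnect
 * when the timeout is doubled after every attempt and never reset,
 * i.e. the behaviour before this patch.
 */
static void sketch_prepatch_delays(unsigned long *out, size_t n)
{
	unsigned long timeout = SKETCH_INIT_TO;
	size_t i;

	assert(out != NULL);
	for (i = 0; i < n; i++) {
		out[i] = timeout;		/* delay used for this attempt */
		timeout <<= 1;			/* double for the next attempt */
		if (timeout > SKETCH_MAX_TO)
			timeout = SKETCH_MAX_TO;
	}
}
```

With n = 8 the delays come out as 3, 6, 12, 24, 48, 96, 192 and 300
seconds, so a handful of server resets is enough to reach the maximum.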

This patch resets the timeout on a successful connection so that every
time the server resets the connection we start with a basic 3 second
timeout.

There is also the possibility of the reverse problem.  When the
client closes the connection it sets the timeout to 0 (so that a
reconnect - when required - is instant).  When 0 is doubled it remains
at 0, so if the server refuses the reconnect, the client will try
again instantly and indefinitely.  To avoid this we ensure that after
doubling the timeout it is at least the minimum.
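
The effect of the lower bound can be sketched in user space as
follows.  This is only a model of the arithmetic, not the kernel code:
SIM_INIT_REEST_TO, SIM_MAX_REEST_TO, sim_next_timeout() and
sim_instant_retries() are hypothetical names, and the values are in
seconds rather than jiffies.

```c
#include <assert.h>

/* Hypothetical stand-ins in seconds; the kernel stores these in jiffies. */
#define SIM_INIT_REEST_TO 3UL
#define SIM_MAX_REEST_TO  300UL

/* Back-off step with the floor: double, then clamp to [INIT, MAX]. */
static unsigned long sim_next_timeout(unsigned long t)
{
	t <<= 1;
	if (t < SIM_INIT_REEST_TO)
		t = SIM_INIT_REEST_TO;
	if (t > SIM_MAX_REEST_TO)
		t = SIM_MAX_REEST_TO;
	assert(t >= SIM_INIT_REEST_TO && t <= SIM_MAX_REEST_TO);
	return t;
}

/*
 * Starting from a timeout of 0, count how many of the first n attempts
 * are made with no delay at all.  with_floor selects whether the
 * minimum is enforced after doubling.
 */
static int sim_instant_retries(int n, int with_floor)
{
	unsigned long t = 0;
	int i, instant = 0;

	for (i = 0; i < n; i++) {
		if (t == 0)
			instant++;
		if (with_floor) {
			t = sim_next_timeout(t);
		} else {
			t <<= 1;	/* 0 doubled is still 0 */
			if (t > SIM_MAX_REEST_TO)
				t = SIM_MAX_REEST_TO;
		}
	}
	return instant;
}
```

Without the floor every attempt in this model is instant; with it,
only the first attempt is, and the next one waits the minimum.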

Cc: stable@xxxxxxxxxxxxxxx
Signed-off-by: NeilBrown <neilb@xxxxxxx>
---
 net/sunrpc/xprtsock.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 83c73c4..b032e06 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -1403,6 +1403,7 @@ static void xs_tcp_state_change(struct sock *sk)
 				TCP_RCV_COPY_FRAGHDR | TCP_RCV_COPY_XID;
 
 			xprt_wake_pending_tasks(xprt, -EAGAIN);
+			xprt->reestablish_timeout = 0;
 		}
 		spin_unlock_bh(&xprt->transport_lock);
 		break;
@@ -2090,6 +2091,8 @@ static void xs_connect(struct rpc_task *task)
 				   &transport->connect_worker,
 				   xprt->reestablish_timeout);
 		xprt->reestablish_timeout <<= 1;
+		if (xprt->reestablish_timeout < XS_TCP_INIT_REEST_TO)
+			xprt->reestablish_timeout = XS_TCP_INIT_REEST_TO;
 		if (xprt->reestablish_timeout > XS_TCP_MAX_REEST_TO)
 			xprt->reestablish_timeout = XS_TCP_MAX_REEST_TO;
 	} else {
-- 
1.6.3.3
