Re: [PATCH] SUNRPC: Fail over more quickly on connect errors

Chuck Lever <chuck.lever@xxxxxxxxxx> · Mon, 19 Apr 2010 19:35:40 -0400

On 04/16/2010 06:29 PM, Trond Myklebust wrote:
On Fri, 2010-04-16 at 16:47 -0400, Trond Myklebust wrote:
We should not allow soft tasks to wait for longer than the major timeout
period when waiting for a reconnect to occur.

Signed-off-by: Trond Myklebust<Trond.Myklebust@xxxxxxxxxx>
---
  net/sunrpc/xprt.c |    2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index c71d835..01449a3 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -710,7 +710,7 @@ void xprt_connect(struct rpc_task *task)
  		if (task->tk_rqstp)
  			task->tk_rqstp->rq_bytes_sent = 0;

-		task->tk_timeout = xprt->connect_timeout;
+		task->tk_timeout = min(req->rq_timeout, xprt->connect_timeout);
                                          ^^^ task->tk_rqstp->rq_timeout

Apologies. I thought I had tested that...

		rpc_sleep_on(&xprt->pending, task, xprt_connect_status);

  		if (test_bit(XPRT_CLOSING,&xprt->state))

I tested this series of patches with soft mounts. RPC requests now
fail once the timeout period expires if the client can't reconnect.
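
As a rough user-space sketch of what the one-line change does: the task
sleeps for the shorter of the request's timeout and the transport's
connect timeout. The helper name connect_wait() is hypothetical and is
used only for illustration; the patch itself calls min() inline in
xprt_connect().

```c
/*
 * Illustration only: connect_wait() is a hypothetical helper, not a
 * kernel function. It mirrors the expression
 *     min(task->tk_rqstp->rq_timeout, xprt->connect_timeout)
 * from the corrected patch.
 */
static unsigned long connect_wait(unsigned long rq_timeout,
                                  unsigned long connect_timeout)
{
    /* Never wait longer for a reconnect than the request's own timeout. */
    return rq_timeout < connect_timeout ? rq_timeout : connect_timeout;
}
```

With a request timeout shorter than the connect timeout, the request
timeout wins, so the task wakes up in time to fail instead of sleeping
for the full connect timeout.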

I also observed appropriate exponential back-off behavior as the client 
attempts to reconnect.  I would suggest one more patch to reduce the 
reestablish timeout maximum to 30 seconds.
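
To make the suggestion concrete, here is a minimal sketch of a doubling
back-off clamped to a 30 second ceiling. The function name, the constant
names, and the 3 second starting delay are assumptions chosen for
illustration; this is not the code in net/sunrpc/xprtsock.c.

```c
/* Assumed values for illustration; delays are in seconds. */
#define INIT_REEST_SECS 3U   /* assumed starting delay */
#define MAX_REEST_SECS  30U  /* the 30 second maximum suggested above */

/*
 * Hypothetical helper: return the delay to use before the next
 * reconnect attempt, doubling the current delay and clamping the
 * result to [INIT_REEST_SECS, MAX_REEST_SECS].
 */
static unsigned int next_reestablish_delay(unsigned int current)
{
    unsigned int next = current * 2;

    if (next < INIT_REEST_SECS)
        next = INIT_REEST_SECS;
    if (next > MAX_REEST_SECS)
        next = MAX_REEST_SECS;
    return next;
}
```

Under these assumptions the delays run 3, 6, 12, 24, 30, 30, ... so a
client that has been retrying for a while still retries at least once
every 30 seconds.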

For the series:

Reviewed-by: Chuck Lever <chuck.lever@xxxxxxxxxx>

  and/or

Tested-by: Chuck Lever <chuck.lever@xxxxxxxxxx>

--
chuck[dot]lever[at]oracle[dot]com