On Thu, 21 Oct 2010 08:29:38 +1100 Neil Brown <neilb@xxxxxxx> wrote: > On Wed, 20 Oct 2010 10:29:05 -0400 > Chuck Lever <chuck.lever@xxxxxxxxxx> wrote: > > > > > On Oct 20, 2010, at 3:17 AM, Neil Brown wrote: > > > > > > > > > > > If I don't have any network configured (except loop-back), and try an NFSv3 > > > mount, then it fails quickly: > > > > > > > > > .... > > > mount.nfs: portmap query failed: RPC: Remote system error - Network is unreachable > > > mount.nfs: Network is unreachable > > > > > > > > > If I try the same thing with a NFSv4 mount, it times out before it fails, > > > making a much longer delay. > > > > > > This is because mount.nfs doesn't do a portmap lookup but just leaves > > > everything to the kernel. > > > The kernel does an 'rpc_ping()' which sets RPC_TASK_SOFTCONN. > > > So at least it doesn't retry after the timeout. But given that we have a > > > clear error, we shouldn't timeout at all. > > > > > > Unfortunately I cannot see an easy way to fix this. > > > > > > The place where ENETUNREACH is in xs_tcp_setup_socket. The comment there > > > says "Retry with the same socket after a delay". The "delay" bit is correct, > > > the "retry" isn't. > > > > > > It would seem that we should just add a 'goto out' there if RPC_TASK_SOFTCONN > > > was set. However we cannot see the task at this point - in fact it seems > > > that there could be a queue of tasks waiting on this connection. I guess > > > some could be soft, and some not. ??? > > > > > > So: An suggestions how to get a ENETUNREACH (or ECONNREFUSED or similar) to > > > fail immediately when RPC_TASK_SOFTCONN is set ??? > > > > ECONNREFUSED should already fail immediately in this case. If it's not failing immediately, that's a bug. > > > > I agree that ENETUNREACH seems appropriate for quick failure if RPC_TASK_SOFTCONN is set. (I thought it already worked this way, but maybe I'm mistaken). > > There is certainly code that seems to treat ENETUNREACH differently if > RPC_TASK_SOFTCONN is set, but it doesn't seem to apply in the particular case > I am testing. > e.g. call_bind_status handles ENETUNREACH as a retry if not SOFTCONN and as a > failure in the SOFTCONN case. > I guess NFSv4 doesn't hit this because the port is explicitly set to 2049 so > it never does the rpcbind step. > So maybe we need to handle ENETUNREACH in call_connect_status as well as > call_bind_status ?? > > Maybe something like that ... The placement of rpc_delay seems a little of > to me, but follows call_bind_status, so it could be correct. > I did a bit of testing of the patch that I sent and it isn't quite write - the ENETUNREACH doesn't propagate all the way up to call_connect_status. This patch fixes that. With it, the rpc_ping fails nicely, but when a reconnect is tried on an already-mounted filesystem it doesn't fail but rather retries every 5 seconds. This is what I wanted to happen. However I'm not at all sure that "5 seconds" is correct. I copied it from call_bind_status, but it seems a bit short. Maybe the number in call_bind_status is a bit low??? Here is my current patch - which is more a starting point for discussion than a concrete proposal. Thanks, NeilBrown diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c index fa55490..539885e 100644 --- a/net/sunrpc/clnt.c +++ b/net/sunrpc/clnt.c @@ -1245,6 +1245,12 @@ call_connect_status(struct rpc_task *task) } switch (status) { + case -ENETUNREACH: + case -ECONNRESET: + case -ECONNREFUSED: + if (!RPC_IS_SOFTCONN(task)) + rpc_delay(task, 5*HZ); + /* fall through */ /* if soft mounted, test if we've timed out */ case -ETIMEDOUT: task->tk_action = call_timeout; diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c index 970fb00..27673d9 100644 --- a/net/sunrpc/xprt.c +++ b/net/sunrpc/xprt.c @@ -733,6 +733,10 @@ static void xprt_connect_status(struct rpc_task *task) } switch (task->tk_status) { + case -ENETUNREACH: + case -ECONNREFUSED: + case -ECONNRESET: + break; case -EAGAIN: dprintk("RPC: %5u xprt_connect_status: retrying\n", task->tk_pid); break; diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c index fe9306b..0743994 100644 --- a/net/sunrpc/xprtsock.c +++ b/net/sunrpc/xprtsock.c @@ -1906,7 +1906,8 @@ static void xs_tcp_setup_socket(struct rpc_xprt *xprt, case -ECONNREFUSED: case -ECONNRESET: case -ENETUNREACH: - /* retry with existing socket, after a delay */ + /* allow upper layers to choose between failure and retry */ + goto out; case 0: case -EINPROGRESS: case -EALREADY: -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html