On Mon, 14 Apr 2014 12:57:58 -0400
Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx> wrote:

> 
> On Apr 14, 2014, at 12:25, Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> 
> > On Mon, 17 Mar 2014 14:40:44 -0400
> > Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx> wrote:
> > 
> >> When the server is unavailable due to a networking error, etc, we want
> >> the RPC client to respect the timeout delays when attempting to reconnect.
> >> 
> >> Fixes: 561ec1603171 (SUNRPC: call_connect_status should recheck bind..)
> >> Signed-off-by: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
> >> ---
> >>  net/sunrpc/clnt.c | 8 +++-----
> >>  1 file changed, 3 insertions(+), 5 deletions(-)
> >> 
> >> diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
> >> index 0edada973434..f22d3a115fda 100644
> >> --- a/net/sunrpc/clnt.c
> >> +++ b/net/sunrpc/clnt.c
> >> @@ -1798,10 +1798,6 @@ call_connect_status(struct rpc_task *task)
> >>  	trace_rpc_connect_status(task, status);
> >>  	task->tk_status = 0;
> >>  	switch (status) {
> >> -		/* if soft mounted, test if we've timed out */
> >> -	case -ETIMEDOUT:
> >> -		task->tk_action = call_timeout;
> >> -		return;
> >>  	case -ECONNREFUSED:
> >>  	case -ECONNRESET:
> >>  	case -ECONNABORTED:
> >> @@ -1812,7 +1808,9 @@ call_connect_status(struct rpc_task *task)
> >>  		if (RPC_IS_SOFTCONN(task))
> >>  			break;
> >>  	case -EAGAIN:
> >> -		task->tk_action = call_bind;
> >> +	case -ETIMEDOUT:
> >> +		/* Check if we've timed out before looping back to call_bind */
> >> +		task->tk_action = call_timeout;
> >>  		return;
> >>  	case 0:
> >>  		clnt->cl_stats->netreconn++;
> > 
> > I believe this patch may have broken the v4.0 callback channel
> > establishment code in nfsd. I think what's happening is this:
> > 
> > nfsd tries to create a RPC_TASK_SOFTCONN call to probe the cb channel
> > with a CB_NULL. It queues the connect_worker to the workqueue. That
> > establishes the socket and then gets a callback from the socket layer
> > into xs_tcp_state_change for TCP_ESTABLISHED.
> > 
> > That code does:
> > 
> >     xprt_wake_pending_tasks(xprt, -EAGAIN);
> > 
> > ...that wakes the task up, and sets the tk_status to -EAGAIN, and it
> > then moves on to call_timeout due to this patch. That code then does
> > this:
> > 
> >     if (RPC_IS_SOFTCONN(task)) {
> >             rpc_exit(task, -ETIMEDOUT);
> >             return;
> >     }
> > 
> > ...and the callback ping then fails with an error. Reverting this patch
> > seems to fix it. I see several ways that we could fix this, but I'm not
> > clear on the right way. Maybe we shouldn't be waking up the tasks with
> > -EAGAIN in the TCP_ESTABLISHED case?
> 
> ...or, possibly setup_callback_client should be setting the
> timeparms.to_maxval to a non-zero value so that xprt_adjust_timeout() and
> xprt_reset_majortimeo() behave as expected.
> 
> _________________________________
> Trond Myklebust
> Linux NFS client maintainer, PrimaryData
> trond.myklebust@xxxxxxxxxxxxxxx
> 

Well spotted. That does indeed fix it. I'll spin up a patch and send it
to Bruce.

Thanks!
-- 
Jeff Layton <jlayton@xxxxxxxxxx>
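
For anyone following along, here is a rough user-space model of why a zero
to_maxval could make a soft-connect task fail as soon as it is woken. This
is only a sketch and not the kernel code: the struct and the model_*
functions are hypothetical stand-ins, the clamping rule is an assumption
about how the major timeout window is derived from the timeout parameters,
and the numbers are arbitrary.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the RPC timeout parameters. */
struct model_timeout {
	unsigned long to_initval;	/* initial timeout */
	unsigned long to_maxval;	/* cap on the timeout (0 if never set) */
	unsigned long to_increment;	/* linear backoff step */
	unsigned int  to_retries;	/* retries before a major timeout */
};

/*
 * Assumed rule: the major timeout window is the initial timeout plus the
 * backoff steps, clamped to to_maxval. With to_maxval == 0 the window
 * collapses to zero, so the deadline equals the start time.
 */
static unsigned long model_major_deadline(const struct model_timeout *to,
					  unsigned long start)
{
	unsigned long span = to->to_initval + to->to_increment * to->to_retries;

	if (span > to->to_maxval || span == 0)
		span = to->to_maxval;
	return start + span;
}

/*
 * Models the wake-up path described above: a soft-connect task woken with
 * -EAGAIN is routed to the timeout check, and is failed with -ETIMEDOUT
 * once the major deadline has been reached.
 */
static bool model_softconn_fails(const struct model_timeout *to,
				 unsigned long start, unsigned long now)
{
	return now >= model_major_deadline(to, start);
}
```

With to_initval = 1000 and to_maxval = 0, model_softconn_fails() reports a
failure even when now == start, which matches the callback ping failing
right after the connect completes. With to_maxval = 1000 the task survives
until the window actually expires.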