> >> I would expect the timeouts to have changed due to the NFSv4 trunking detection (which is > >> exactly why it is wrong to rely on the kernel timeouts here anyway), but I would not expect > >> the kernel to never time out at all. > > It appears it started with 3.13 kernels... The above stack is from a 3.14-ish client. > > > > Which patch caused the behaviour to change? 561ec1603171cd9b38dcf6cac53e8710f437a48d is the first bad commit commit 561ec1603171cd9b38dcf6cac53e8710f437a48d Author: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx> Date: Thu Sep 26 15:22:45 2013 -0400 SUNRPC: call_connect_status should recheck bind and connect status on error Currently, we go directly to call_transmit which sends us to call_status on error. If we know that the connect attempt failed, we should rather just jump straight back to call_bind and call_connect. Ditto for EAGAIN, except do not delay. Signed-off-by: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx> If I revert that commit from mainline (which may be a completely bogus thing to do) then mainline works (at least for this specific simple test). (The revert required some wiggling - I'll include it below). To be precise, the test is to try to mount mount server:/path /mnt from a server which has run rpc.nfsd -T -N4 "success" is getting periodic messages: mount.nfs: trying text-based options 'retry=1,vers=4,addr=10.0.10.2,clientaddr=10.0.10.1' mount.nfs: mount(2): Connection refused "failure" is not getting those messages. There is another change though. For the commit above I don't not get "Connection refused", but after 2 minutes I get mount.nfs: mount(2): Connection timed out With mainline, it waits forever. I did a second git bisect for this change and found 2118071d3b0d57a03fad77885f4fdc364798aa87 is the first bad commit commit 2118071d3b0d57a03fad77885f4fdc364798aa87 Author: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx> Date: Tue Dec 31 13:22:59 2013 -0500 SUNRPC: Report connection error values to rpc_tasks on the pending queue Currently we only report EAGAIN, which is not descriptive enough for softconn tasks. Signed-off-by: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx> From this commit, a mount attempt which is getting connections denied will block indefinitely. Hope that is helpful. NeilBrown This is the revert that I mentioned - just for completeness. From 9c1462ff54fcc2adc79c825b867c32c19e30a9a7 Mon Sep 17 00:00:00 2001 From: NeilBrown <neilb@xxxxxxx> Date: Thu, 13 Mar 2014 11:38:54 +1100 Subject: [PATCH] Revert "SUNRPC: call_connect_status should recheck bind and connect status on error" This reverts commit 561ec1603171cd9b38dcf6cac53e8710f437a48d. Conflicts: net/sunrpc/clnt.c diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c index 0edada973434..ba0cd114f0e1 100644 --- a/net/sunrpc/clnt.c +++ b/net/sunrpc/clnt.c @@ -1796,7 +1796,6 @@ call_connect_status(struct rpc_task *task) dprint_status(task); trace_rpc_connect_status(task, status); - task->tk_status = 0; switch (status) { /* if soft mounted, test if we've timed out */ case -ETIMEDOUT: @@ -1805,16 +1804,14 @@ call_connect_status(struct rpc_task *task) case -ECONNREFUSED: case -ECONNRESET: case -ECONNABORTED: - case -ENETUNREACH: case -EHOSTUNREACH: - /* retry with existing socket, after a delay */ - rpc_delay(task, 3*HZ); + case -ENETUNREACH: if (RPC_IS_SOFTCONN(task)) break; - case -EAGAIN: - task->tk_action = call_bind; - return; + /* retry with existing socket, after a delay */ case 0: + case -EAGAIN: + task->tk_status = 0; clnt->cl_stats->netreconn++; task->tk_action = call_transmit; return;
Attachment:
signature.asc
Description: PGP signature