On Tue, 2012-04-24 at 17:56 -0700, Simon Kirby wrote:
> Hello!
>
> I've reported this a few times now, but we still keep hitting the issue.
> NFS clients in the cluster occasionally get stuck with the socket to one
> of the NFS servers in CLOSE_WAIT state, and don't recover.
>
> The NFS servers (knfsd) used to be running 2.6.30 since newer kernels hit
> XFS bugs, but are all now finally running 3.2.10 or 3.2.16.
>
> I've added some debugging printk()s and WARN()s to the kernel to try to
> figure out what is going on. Without knowing much about how it's actually
> expected to work :), these were mostly to figure out why the socket is
> being closed. I added the following on top of Trond's sunrpc
> debugging patch:

<snip>

> [1119196.640017] nfs: RPC call timeout (110), disconnecting
> [1119196.720026] nfs: RPC call timeout (110), disconnecting
> [1119196.892021] nfs: RPC call timeout (110), disconnecting
> [1119198.516022] nfs: RPC call timeout (110), disconnecting
> [1119220.576025] nfs: RPC call timeout (110), disconnecting
> [1119871.048040] nfs: RPC call timeout (110), disconnecting
> [1120266.464021] nfs: RPC call timeout (110), disconnecting
> [1120341.472021] nfs: RPC call timeout (110), disconnecting
> [1122139.008022] nfs: RPC call timeout (110), disconnecting
> [1122166.112043] nfs: RPC call timeout (110), disconnecting
> [1122302.304037] nfs: RPC call timeout (110), disconnecting
> [1122666.735765] ------------[ cut here ]------------
> [1122666.735855] WARNING: at net/sunrpc/xprt.c:640 xprt_force_disconnect+0x4a/0xa0()
> [1122666.735945] Hardware name: PowerEdge 1950
> [1122666.736025] xprt_force_disconnect(): setting XPRT_CLOSE_WAIT
> [1122666.736099] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2 serio_raw evdev
> [1122666.736898] Pid: 18458, comm: httpd Tainted: G        W    3.2.10-hw+ #16
> [1122666.736975] Call Trace:
> [1122666.737043]  <IRQ>  [<ffffffff816b545a>] ? xprt_force_disconnect+0x4a/0xa0
> [1122666.737176]  [<ffffffff8105dd60>] warn_slowpath_common+0x80/0xc0
> [1122666.737252]  [<ffffffff8105de99>] warn_slowpath_fmt+0x69/0x70
> [1122666.737335]  [<ffffffffa0055aff>] ? tcp_packet+0x7cf/0x10f0 [nf_conntrack]
> [1122666.737414]  [<ffffffff81702c41>] ? _raw_spin_lock_bh+0x11/0x40
> [1122666.737490]  [<ffffffff816b545a>] xprt_force_disconnect+0x4a/0xa0
> [1122666.737566]  [<ffffffff816b87e8>] xs_tcp_state_change+0x78/0x320
> [1122666.737643]  [<ffffffff81667f4b>] tcp_fin+0x17b/0x210
> [1122666.737717]  [<ffffffff81668c52>] tcp_data_queue+0x4c2/0xe70
> [1122666.737793]  [<ffffffff8166c389>] ? tcp_validate_incoming+0x109/0x340
> [1122666.737869]  [<ffffffff8166c709>] tcp_rcv_established+0x149/0xb00
> [1122666.737946]  [<ffffffff816751dc>] tcp_v4_do_rcv+0x15c/0x2e0
> [1122666.738021]  [<ffffffff81649924>] ? nf_iterate+0x84/0xb0
> [1122666.738096]  [<ffffffff816759af>] tcp_v4_rcv+0x64f/0x8b0
> [1122666.738170]  [<ffffffff81649a9d>] ? nf_hook_slow+0x6d/0x130
> [1122666.738246]  [<ffffffff816543b0>] ? ip_rcv+0x2f0/0x2f0
> [1122666.738322]  [<ffffffff8165445a>] ip_local_deliver_finish+0xaa/0x1d0
> [1122666.738399]  [<ffffffff8165460d>] ip_local_deliver+0x8d/0xa0
> [1122666.738474]  [<ffffffff81653ef9>] ip_rcv_finish+0x169/0x330
> [1122666.738549]  [<ffffffff8165433d>] ip_rcv+0x27d/0x2f0
> [1122666.738625]  [<ffffffff816256e7>] __netif_receive_skb+0x437/0x490
> [1122666.738701]  [<ffffffff81627568>] netif_receive_skb+0x78/0x80
> [1122666.738778]  [<ffffffff813aa8dc>] ? is_swiotlb_buffer+0x3c/0x50
> [1122666.738854]  [<ffffffff816276a0>] napi_skb_finish+0x50/0x70
> [1122666.738929]  [<ffffffff81627ce5>] napi_gro_receive+0x105/0x140
> [1122666.739008]  [<ffffffffa0029e7b>] bnx2_poll_work+0x60b/0x1550 [bnx2]
> [1122666.739085]  [<ffffffff8105096d>] ? enqueue_task_fair+0x15d/0x540
> [1122666.739163]  [<ffffffff8104825e>] ? cpu_smt_mask+0x1e/0x20
> [1122666.739238]  [<ffffffff810475dd>] ? check_preempt_curr+0x7d/0xa0
> [1122666.739315]  [<ffffffffa002aeec>] bnx2_poll+0x5c/0x230 [bnx2]
> [1122666.739391]  [<ffffffff81627f0b>] net_rx_action+0x11b/0x2e0
> [1122666.739467]  [<ffffffff81019276>] ? read_tsc+0x16/0x40
> [1122666.739542]  [<ffffffff81064fa0>] __do_softirq+0xe0/0x1d0
> [1122666.739619]  [<ffffffff8170c8ec>] call_softirq+0x1c/0x30
> [1122666.739693]  [<ffffffff810142b5>] do_softirq+0x65/0xa0
> [1122666.739767]  [<ffffffff81064cad>] irq_exit+0xad/0xe0
> [1122666.739841]  [<ffffffff81013a94>] do_IRQ+0x64/0xe0
> [1122666.739914]  [<ffffffff81702dee>] common_interrupt+0x6e/0x6e
> [1122666.739988]  <EOI>  [<ffffffff8170a712>] ? system_call_fastpath+0x16/0x1b
> [1122666.740127] ---[ end trace 18327738ff1711ee ]---
> [1122666.740203] RPC: server initated shutdown -- state 8 conn 1 dead 0 zapped 1 sk_shutdown 1
>
> No other messages after this -- we see the socket in CLOSE_WAIT:
>
> Proto Recv-Q Send-Q Local Address        Foreign Address       State       PID/Program name  Timer
> tcp        0      0 10.10.52.11:801      10.10.52.225:2049     CLOSE_WAIT  -                 off (0.00/0/0)
>
> Here's the dmesg after "rpcdebug -m rpc -c all": http://0x.ca/sim/ref/3.2.10/dmesg
>
> Where should I go from here? Is it valid that some processes should get
> stuck in uninterruptible sleep in this case? Is it valid that the server
> should hang up at all?

Can you try applying something like the following patch on top of what you
have applied above? It is clear that something is going wrong in the
autodisconnect logic, but it is not yet 100% clear to me what that might be.

The patch below checks for races in xprt_autoclose(), and also checks
whether xprt_force_disconnect() might be failing to wake up a stuck task
that holds the socket write lock.
Cheers,
  Trond

---
diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index c64c0ef..1904467 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -604,8 +604,9 @@ static void xprt_autoclose(struct work_struct *work)
 	struct rpc_xprt *xprt =
 		container_of(work, struct rpc_xprt, task_cleanup);
 
-	xprt->ops->close(xprt);
 	clear_bit(XPRT_CLOSE_WAIT, &xprt->state);
+	xprt->ops->close(xprt);
+	WARN_ON(test_bit(XPRT_CLOSE_WAIT, &xprt->state));
 	xprt_release_write(xprt, NULL);
 }
 
@@ -638,6 +639,14 @@ void xprt_force_disconnect(struct rpc_xprt *xprt)
 	if (test_and_set_bit(XPRT_LOCKED, &xprt->state) == 0)
 		queue_work(rpciod_workqueue, &xprt->task_cleanup);
 	xprt_wake_pending_tasks(xprt, -EAGAIN);
+	if (xprt->snd_task != NULL) {
+		struct rpc_task *task = xprt->snd_task;
+		if (RPC_IS_QUEUED(task)) {
+			WARN(1, "task %5u is queued on %s\n",
+					task->tk_pid,
+					rpc_qname(task->tk_waitqueue));
+		}
+	}
 	spin_unlock_bh(&xprt->transport_lock);
 }

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@xxxxxxxxxx
www.netapp.com