On Tue, 2012-04-24 at 17:56 -0700, Simon Kirby wrote:
> Hello!
>
> I've reported this a few times now, but we still keep hitting the issue.
> NFS clients in the cluster occasionally get stuck with the socket to one
> of the NFS servers in CLOSE_WAIT state, and don't recover.
>
> The NFS servers (knfsd) used to be running 2.6.30 since newer kernels hit
> XFS bugs, but are all now finally running 3.2.10 or 3.2.16.
>
> I've added some debugging printk()s and WARN()s to the kernel to try to
> figure out what is going on. Without knowing much about how it's actually
> expected to work :), these were mostly to figure out why the socket is
> being closed. I added the following on top of Trond's sunrpc
> debugging patch:

<snip>

> [1119196.640017] nfs: RPC call timeout (110), disconnecting
> [1119196.720026] nfs: RPC call timeout (110), disconnecting
> [1119196.892021] nfs: RPC call timeout (110), disconnecting
> [1119198.516022] nfs: RPC call timeout (110), disconnecting
> [1119220.576025] nfs: RPC call timeout (110), disconnecting
> [1119871.048040] nfs: RPC call timeout (110), disconnecting
> [1120266.464021] nfs: RPC call timeout (110), disconnecting
> [1120341.472021] nfs: RPC call timeout (110), disconnecting
> [1122139.008022] nfs: RPC call timeout (110), disconnecting
> [1122166.112043] nfs: RPC call timeout (110), disconnecting
> [1122302.304037] nfs: RPC call timeout (110), disconnecting
> [1122666.735765] ------------[ cut here ]------------
> [1122666.735855] WARNING: at net/sunrpc/xprt.c:640 xprt_force_disconnect+0x4a/0xa0()
> [1122666.735945] Hardware name: PowerEdge 1950
> [1122666.736025] xprt_force_disconnect(): setting XPRT_CLOSE_WAIT
> [1122666.736099] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2 serio_raw evdev
> [1122666.736898] Pid: 18458, comm: httpd Tainted: G        W    3.2.10-hw+ #16
> [1122666.736975] Call Trace:
> [1122666.737043]  <IRQ>  [<ffffffff816b545a>] ? xprt_force_disconnect+0x4a/0xa0
> [1122666.737176]  [<ffffffff8105dd60>] warn_slowpath_common+0x80/0xc0
> [1122666.737252]  [<ffffffff8105de99>] warn_slowpath_fmt+0x69/0x70
> [1122666.737335]  [<ffffffffa0055aff>] ? tcp_packet+0x7cf/0x10f0 [nf_conntrack]
> [1122666.737414]  [<ffffffff81702c41>] ? _raw_spin_lock_bh+0x11/0x40
> [1122666.737490]  [<ffffffff816b545a>] xprt_force_disconnect+0x4a/0xa0
> [1122666.737566]  [<ffffffff816b87e8>] xs_tcp_state_change+0x78/0x320
> [1122666.737643]  [<ffffffff81667f4b>] tcp_fin+0x17b/0x210
> [1122666.737717]  [<ffffffff81668c52>] tcp_data_queue+0x4c2/0xe70
> [1122666.737793]  [<ffffffff8166c389>] ? tcp_validate_incoming+0x109/0x340
> [1122666.737869]  [<ffffffff8166c709>] tcp_rcv_established+0x149/0xb00
> [1122666.737946]  [<ffffffff816751dc>] tcp_v4_do_rcv+0x15c/0x2e0
> [1122666.738021]  [<ffffffff81649924>] ? nf_iterate+0x84/0xb0
> [1122666.738096]  [<ffffffff816759af>] tcp_v4_rcv+0x64f/0x8b0
> [1122666.738170]  [<ffffffff81649a9d>] ? nf_hook_slow+0x6d/0x130
> [1122666.738246]  [<ffffffff816543b0>] ? ip_rcv+0x2f0/0x2f0
> [1122666.738322]  [<ffffffff8165445a>] ip_local_deliver_finish+0xaa/0x1d0
> [1122666.738399]  [<ffffffff8165460d>] ip_local_deliver+0x8d/0xa0
> [1122666.738474]  [<ffffffff81653ef9>] ip_rcv_finish+0x169/0x330
> [1122666.738549]  [<ffffffff8165433d>] ip_rcv+0x27d/0x2f0
> [1122666.738625]  [<ffffffff816256e7>] __netif_receive_skb+0x437/0x490
> [1122666.738701]  [<ffffffff81627568>] netif_receive_skb+0x78/0x80
> [1122666.738778]  [<ffffffff813aa8dc>] ? is_swiotlb_buffer+0x3c/0x50
> [1122666.738854]  [<ffffffff816276a0>] napi_skb_finish+0x50/0x70
> [1122666.738929]  [<ffffffff81627ce5>] napi_gro_receive+0x105/0x140
> [1122666.739008]  [<ffffffffa0029e7b>] bnx2_poll_work+0x60b/0x1550 [bnx2]
> [1122666.739085]  [<ffffffff8105096d>] ? enqueue_task_fair+0x15d/0x540
> [1122666.739163]  [<ffffffff8104825e>] ? cpu_smt_mask+0x1e/0x20
> [1122666.739238]  [<ffffffff810475dd>] ? check_preempt_curr+0x7d/0xa0
> [1122666.739315]  [<ffffffffa002aeec>] bnx2_poll+0x5c/0x230 [bnx2]
> [1122666.739391]  [<ffffffff81627f0b>] net_rx_action+0x11b/0x2e0
> [1122666.739467]  [<ffffffff81019276>] ? read_tsc+0x16/0x40
> [1122666.739542]  [<ffffffff81064fa0>] __do_softirq+0xe0/0x1d0
> [1122666.739619]  [<ffffffff8170c8ec>] call_softirq+0x1c/0x30
> [1122666.739693]  [<ffffffff810142b5>] do_softirq+0x65/0xa0
> [1122666.739767]  [<ffffffff81064cad>] irq_exit+0xad/0xe0
> [1122666.739841]  [<ffffffff81013a94>] do_IRQ+0x64/0xe0
> [1122666.739914]  [<ffffffff81702dee>] common_interrupt+0x6e/0x6e
> [1122666.739988]  <EOI>  [<ffffffff8170a712>] ? system_call_fastpath+0x16/0x1b
> [1122666.740127] ---[ end trace 18327738ff1711ee ]---
> [1122666.740203] RPC: server initated shutdown -- state 8 conn 1 dead 0 zapped 1 sk_shutdown 1
>
> No other messages after this -- we see the socket in CLOSE_WAIT:
>
> Proto Recv-Q Send-Q Local Address        Foreign Address       State       PID/Program name  Timer
> tcp        0      0 10.10.52.11:801      10.10.52.225:2049     CLOSE_WAIT  -                 off (0.00/0/0)
>
> Here's the dmesg after "rpcdebug -m rpc -c all": http://0x.ca/sim/ref/3.2.10/dmesg
>
> Where should I go from here? Is it valid that some processes should get
> stuck in uninterruptible sleep in this case? Is it valid that the server
> should hang up at all?

Can you try applying something like the following patch on top of what you
have applied above? It is clear that something is going wrong in the
autodisconnect logic, but it is not yet 100% clear to me what that might be.

The patch below checks for races in xprt_autoclose(), and also checks
whether xprt_force_disconnect() might be failing to wake up a stuck task
that holds the socket write lock.
Cheers,
  Trond

---
diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index c64c0ef..1904467 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -604,8 +604,9 @@ static void xprt_autoclose(struct work_struct *work)
 	struct rpc_xprt *xprt =
 		container_of(work, struct rpc_xprt, task_cleanup);
 
-	xprt->ops->close(xprt);
 	clear_bit(XPRT_CLOSE_WAIT, &xprt->state);
+	xprt->ops->close(xprt);
+	WARN_ON(test_bit(XPRT_CLOSE_WAIT, &xprt->state));
 	xprt_release_write(xprt, NULL);
 }
 
@@ -638,6 +639,14 @@ void xprt_force_disconnect(struct rpc_xprt *xprt)
 	if (test_and_set_bit(XPRT_LOCKED, &xprt->state) == 0)
 		queue_work(rpciod_workqueue, &xprt->task_cleanup);
 	xprt_wake_pending_tasks(xprt, -EAGAIN);
+	if (xprt->snd_task != NULL) {
+		struct rpc_task *task = xprt->snd_task;
+		if (RPC_IS_QUEUED(task)) {
+			WARN(1, "task %5u is queued on %s\n",
+					task->tk_pid,
+					rpc_qname(task->tk_waitqueue));
+		}
+	}
 	spin_unlock_bh(&xprt->transport_lock);
 }

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@xxxxxxxxxx
www.netapp.com