On Wed, Aug 27, 2014 at 1:35 PM, Andy Adamson <androsadamson@xxxxxxxxx> wrote:
> We are seeing nfsiod hang for 5 to 20+ minutes.
>
> This thread hung for 5-10 minutes then cleared.
>
> Aug 26 05:10:01 scspr0012063007 kernel: nfsiod S 0000000000000000 0 4931 2 0x00000080
> Aug 26 05:05:01 scspr0012063007 kernel: ffff880037891e30 0000000000000046 ffff8800d130d400 ffffffffa01e1030
> Aug 26 05:05:01 scspr0012063007 kernel: ffff880037891fd8 ffffe8ffff608ac8 ffff880037891dc0 ffffffffa0287e8d
> Aug 26 05:05:01 scspr0012063007 kernel: ffff880104fb5098 ffff880037891fd8 000000000000fbc8 ffff880104fb5098
> Aug 26 05:05:01 scspr0012063007 kernel: Call Trace:
> Aug 26 05:05:01 scspr0012063007 kernel: [<ffffffffa01e1030>] ? rpc_async_release+0x0/0x20 [sunrpc]
> Aug 26 05:05:01 scspr0012063007 kernel: [<ffffffffa0287e8d>] ? nfs_writedata_release+0x6d/0x90 [nfs]
> Aug 26 05:05:01 scspr0012063007 kernel: [<ffffffff8109b7fe>] ? prepare_to_wait+0x4e/0x80
> Aug 26 05:05:01 scspr0012063007 kernel: [<ffffffffa01e1030>] ? rpc_async_release+0x0/0x20 [sunrpc]
> Aug 26 05:05:01 scspr0012063007 kernel: [<ffffffff81094fdc>] worker_thread+0x1fc/0x2a0
> Aug 26 05:05:01 scspr0012063007 kernel: [<ffffffff8109b4d0>] ? autoremove_wake_function+0x0/0x40
> Aug 26 05:05:01 scspr0012063007 kernel: [<ffffffff81094de0>] ? worker_thread+0x0/0x2a0
> Aug 26 05:05:01 scspr0012063007 kernel: [<ffffffff8109b126>] kthread+0x96/0xa0
> Aug 26 05:05:01 scspr0012063007 kernel: [<ffffffff8100c20a>] child_rip+0xa/0x20
> Aug 26 05:05:01 scspr0012063007 kernel: [<ffffffff8109b090>] ? kthread+0x0/0xa0
> Aug 26 05:05:01 scspr0012063007 kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20
>
> In this similar Call Trace, nfsiod hung for 20 minutes, then the client
> was rebooted.
>
> Aug 26 06:00:01 scspr0012063007 kernel: nfsiod S 0000000000000000 0 1701 2 0x00000000
> Aug 26 06:00:01 scspr0012063007 kernel: ffff880037a63e30 0000000000000046 ffff880037a62000 ffff880037a62000
> Aug 26 06:00:01 scspr0012063007 kernel: ffff8800f3421140 0000000000000000 ffff8800f3421140 ffffffffa0316030
> Aug 26 06:00:01 scspr0012063007 kernel: ffff880100e2f098 ffff880037a63fd8 000000000000fbc8 ffff880100e2f098
> Aug 26 06:00:01 scspr0012063007 kernel: Call Trace:
> Aug 26 06:00:01 scspr0012063007 kernel: [<ffffffffa0316030>] ? rpc_async_release+0x0/0x20 [sunrpc]
> Aug 26 06:00:01 scspr0012063007 kernel: [<ffffffff8109b7fe>] ? prepare_to_wait+0x4e/0x80
> Aug 26 06:00:01 scspr0012063007 kernel: [<ffffffffa0316030>] ? rpc_async_release+0x0/0x20 [sunrpc]
> Aug 26 06:00:01 scspr0012063007 kernel: [<ffffffff81094fdc>] worker_thread+0x1fc/0x2a0
> Aug 26 06:00:01 scspr0012063007 kernel: [<ffffffff8109b4d0>] ? autoremove_wake_function+0x0/0x40
> Aug 26 06:00:01 scspr0012063007 kernel: [<ffffffff81094de0>] ? worker_thread+0x0/0x2a0
> Aug 26 06:00:01 scspr0012063007 kernel: [<ffffffff8109b126>] kthread+0x96/0xa0
> Aug 26 06:00:01 scspr0012063007 kernel: [<ffffffff8100c20a>] child_rip+0xa/0x20
> Aug 26 06:00:01 scspr0012063007 kernel: [<ffffffff8109b090>] ? kthread+0x0/0xa0
> Aug 26 06:00:01 scspr0012063007 kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20

Doesn't the "?" beside the stack entries above label them as being
unreliable (i.e. they lie outside the stack frame)? If so, it looks to me
as if both of these threads are just sleeping in the worker_thread()
function, which isn't unusual in itself.

Are there any other hints that might help?

-- 
Trond Myklebust

Linux NFS client maintainer, PrimaryData

trond.myklebust@xxxxxxxxxxxxxxx
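To make the reasoning in the reply concrete, here is a small Python sketch (not part of the original thread) that splits a pasted "Call Trace:" dump into "?"-marked frames and the rest. It assumes the syslog line format quoted above, with each frame unwrapped onto one line, and it takes as given the reading that "?" marks entries the unwinder could not confirm. The names FRAME_RE, classify_frames and TRACE are hypothetical, introduced only for illustration.

```python
import re

# Hypothetical pattern for one frame of an x86 "Call Trace:" dump, e.g.
#   [<ffffffffa0316030>] ? rpc_async_release+0x0/0x20 [sunrpc]
#   [<ffffffff81094fdc>] worker_thread+0x1fc/0x2a0
# The optional "? " after the address is captured as the "unreliable" group.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<sym>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def classify_frames(trace_text):
    """Return (reliable, unreliable) lists of symbol names, in trace order."""
    reliable, unreliable = [], []
    for m in FRAME_RE.finditer(trace_text):
        target = unreliable if m.group("unreliable") else reliable
        target.append(m.group("sym"))
    return reliable, unreliable


# Frames from the 06:00:01 trace quoted above (syslog prefix shortened).
TRACE = """\
kernel: [<ffffffffa0316030>] ? rpc_async_release+0x0/0x20 [sunrpc]
kernel: [<ffffffff8109b7fe>] ? prepare_to_wait+0x4e/0x80
kernel: [<ffffffffa0316030>] ? rpc_async_release+0x0/0x20 [sunrpc]
kernel: [<ffffffff81094fdc>] worker_thread+0x1fc/0x2a0
kernel: [<ffffffff8109b4d0>] ? autoremove_wake_function+0x0/0x40
kernel: [<ffffffff81094de0>] ? worker_thread+0x0/0x2a0
kernel: [<ffffffff8109b126>] kthread+0x96/0xa0
kernel: [<ffffffff8100c20a>] child_rip+0xa/0x20
kernel: [<ffffffff8109b090>] ? kthread+0x0/0xa0
kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20
"""

if __name__ == "__main__":
    reliable, unreliable = classify_frames(TRACE)
    print("reliable:", reliable)
    # reliable: ['worker_thread', 'kthread', 'child_rip']
    print("unreliable:", unreliable)
```

On this trace only worker_thread, kthread and child_rip survive the filter, which is consistent with the suggestion that nfsiod was simply idle in worker_thread(); the rpc_async_release entries appear only among the "?"-marked frames and so do not by themselves show the thread was busy in RPC code.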