possible NFS-related deadlock in hacked 2.6.38.7

This kernel carries some patches that let NFS support multiple
mounts, each bound to a local IP address, though only a single mount
(with 20 reader and 20 writer threads) was used for this test case.
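The btserver workload itself isn't shown. As a rough illustration only, here is a minimal Python sketch of a comparable load on a single mount. It assumes (hypothetically) that writers create and append to files in one shared directory while readers reopen that directory and seek on it, which loosely mirrors the open() (do_last) and lseek() (nfs_llseek_dir) paths in the hung-task traces below. The function names, file layout and default iteration count are illustrative, not taken from the real test.

```python
import os
import threading

# Hypothetical load generator loosely modelled on the report: one mount,
# N writer threads and N reader threads. Paths, names and iteration counts
# are illustrative assumptions, not the real btserver workload.


def writer(test_dir, idx, iterations):
    """Repeatedly open (O_CREAT) and append to this thread's own file."""
    path = os.path.join(test_dir, f"writer-{idx}.dat")
    for i in range(iterations):
        # O_CREAT opens in a shared directory exercise the create path of open()
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
        try:
            os.write(fd, f"{idx}:{i}\n".encode())
        finally:
            os.close(fd)


def reader(test_dir, iterations):
    """Repeatedly open the shared directory, seek to its start and list it."""
    for _ in range(iterations):
        fd = os.open(test_dir, os.O_RDONLY | os.O_DIRECTORY)
        try:
            # lseek() on a directory fd; on an NFS mount this ends up in
            # nfs_llseek_dir, as seen in the traces
            os.lseek(fd, 0, os.SEEK_SET)
            os.listdir(fd)
        finally:
            os.close(fd)


def run_stress(test_dir, readers=20, writers=20, iterations=100):
    """Run all threads to completion; return (files created, I/O errors seen)."""
    errors = []

    def guarded(fn, *args):
        # Record I/O errors instead of letting them vanish inside the thread
        try:
            fn(*args)
        except OSError as exc:
            errors.append(exc)

    threads = [threading.Thread(target=guarded, args=(writer, test_dir, i, iterations))
               for i in range(writers)]
    threads += [threading.Thread(target=guarded, args=(reader, test_dir, iterations))
                for _ in range(readers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    created = sum(1 for name in os.listdir(test_dir) if name.startswith("writer-"))
    return created, errors
```

For example, run_stress("/mnt/nfs-test") against a scratch directory on the NFS mount (the path is a placeholder). If the hang reproduces, the call simply never returns and the hung-task watchdog should start reporting the threads.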

We are doing failover testing with an OpenFiler HA cluster.

We were also running CIFS and iSCSI traffic concurrently with
NFS, so it's possible one of those protocols is the root cause
instead.

We're trying to reproduce this with a kernel that has lockdep
and other debugging options enabled, but I'm curious whether anyone
else has seen a problem like this. I believe we saw a similar
lockup on a 2.6.34 (or maybe 2.6.36) kernel, but that was several
weeks ago; this doesn't seem to be an easy problem to hit.




nfs: server 192.168.100.19 not responding, still trying
nfs: server 192.168.100.19 not responding, still trying
nfs: server 192.168.100.19 not responding, still trying
INFO: task btserver:20572 blocked for more than 180 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
btserver        D 0000000000000000     0 20572   2020 0x00000000
 ffff8802e5365ca8 0000000000000086 0000000000000001 0000000000000000
 ffff8802e573ac40 ffff8802e5365fd8 ffff8802e573af00 ffff8802e573aef8
 0000000000013280 ffff8802e5365fd8 0000000000013280 0000000000013280
Call Trace:
 [<ffffffff814122b0>] __mutex_lock_common+0x212/0x3bc
 [<ffffffff8141246e>] __mutex_lock_slowpath+0x14/0x16
 [<ffffffff81412549>] mutex_lock+0x27/0x3e
 [<ffffffff810f4f43>] do_last+0xb1/0x2bf
 [<ffffffff810f6d47>] do_filp_open+0x2c1/0x655
 [<ffffffff810da8a9>] ? __bit_spin_unlock.clone.1+0x1d/0x38
 [<ffffffff8103fde4>] ? get_parent_ip+0x11/0x42
 [<ffffffff814163ac>] ? sub_preempt_count+0x92/0xa5
 [<ffffffff81100a10>] ? alloc_fd+0x111/0x123
 [<ffffffff810e91c3>] do_sys_open+0x5b/0xed
 [<ffffffff8100bcaf>] ? math_state_restore+0x49/0x4b
 [<ffffffff810e927e>] sys_open+0x1b/0x1d
 [<ffffffff8100aad2>] system_call_fastpath+0x16/0x1b
INFO: task btserver:20583 blocked for more than 180 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
btserver        D 0000000000000002     0 20583   2020 0x00000000
 ffff8802e50dbca8 0000000000000082 0000000000000001 00000000ffffffff
 ffff880305bfb3a0 ffff8802e50dbfd8 ffff880305bfb660 ffff880305bfb658
 0000000000013280 ffff8802e50dbfd8 0000000000013280 0000000000013280
Call Trace:
 [<ffffffff814122b0>] __mutex_lock_common+0x212/0x3bc
 [<ffffffff8141246e>] __mutex_lock_slowpath+0x14/0x16
 [<ffffffff81412549>] mutex_lock+0x27/0x3e
 [<ffffffff810f4f43>] do_last+0xb1/0x2bf
 [<ffffffff810f6d47>] do_filp_open+0x2c1/0x655
 [<ffffffff8103fde4>] ? get_parent_ip+0x11/0x42
 [<ffffffff814163ac>] ? sub_preempt_count+0x92/0xa5
 [<ffffffff81100a10>] ? alloc_fd+0x111/0x123
 [<ffffffff810e91c3>] do_sys_open+0x5b/0xed
 [<ffffffff8100bcaf>] ? math_state_restore+0x49/0x4b
 [<ffffffff810e927e>] sys_open+0x1b/0x1d
 [<ffffffff8100aad2>] system_call_fastpath+0x16/0x1b
INFO: task btserver:20584 blocked for more than 180 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
btserver        D ffff8802e53ac740     0 20584   2020 0x00000000
 ffff8802e5407ca8 0000000000000086 0000000000000001 0000000000000001
 ffff880305bf8760 ffff8802e5407fd8 ffff880305bf8a20 ffff880305bf8a18
 0000000000013280 ffff8802e5407fd8 0000000000013280 0000000000013280
Call Trace:
 [<ffffffff8103fde4>] ? get_parent_ip+0x11/0x42
 [<ffffffff814122b0>] __mutex_lock_common+0x212/0x3bc
 [<ffffffff8141246e>] __mutex_lock_slowpath+0x14/0x16
 [<ffffffff81412549>] mutex_lock+0x27/0x3e
 [<ffffffff810f4f43>] do_last+0xb1/0x2bf
 [<ffffffff810f6d47>] do_filp_open+0x2c1/0x655
 [<ffffffff810da8a9>] ? __bit_spin_unlock.clone.1+0x1d/0x38
 [<ffffffff8103fde4>] ? get_parent_ip+0x11/0x42
 [<ffffffff814163ac>] ? sub_preempt_count+0x92/0xa5
 [<ffffffff81100a10>] ? alloc_fd+0x111/0x123
 [<ffffffff810e91c3>] do_sys_open+0x5b/0xed
 [<ffffffff8100bcaf>] ? math_state_restore+0x49/0x4b
 [<ffffffff810e927e>] sys_open+0x1b/0x1d
 [<ffffffff8100aad2>] system_call_fastpath+0x16/0x1b
INFO: task btserver:20587 blocked for more than 180 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
btserver        D 0000000000000004     0 20587   2020 0x00000000
 ffff8802e5373ca8 0000000000000086 0000000000000001 0000000000000000
 ffff88030474f600 ffff8802e5373fd8 ffff88030474f8c0 ffff88030474f8b8
 0000000000013280 ffff8802e5373fd8 0000000000013280 0000000000013280
Call Trace:
 [<ffffffff814122b0>] __mutex_lock_common+0x212/0x3bc
 [<ffffffff8141246e>] __mutex_lock_slowpath+0x14/0x16
 [<ffffffff81412549>] mutex_lock+0x27/0x3e
 [<ffffffff810f4f43>] do_last+0xb1/0x2bf
 [<ffffffff810f6d47>] do_filp_open+0x2c1/0x655
 [<ffffffff810da8a9>] ? __bit_spin_unlock.clone.1+0x1d/0x38
 [<ffffffff8103fde4>] ? get_parent_ip+0x11/0x42
 [<ffffffff814163ac>] ? sub_preempt_count+0x92/0xa5
 [<ffffffff81100a10>] ? alloc_fd+0x111/0x123
 [<ffffffff810e91c3>] do_sys_open+0x5b/0xed
 [<ffffffff8100bcaf>] ? math_state_restore+0x49/0x4b
 [<ffffffff810e927e>] sys_open+0x1b/0x1d
 [<ffffffff8100aad2>] system_call_fastpath+0x16/0x1b
INFO: task btserver:23670 blocked for more than 180 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
btserver        D 0000000000000007     0 23670   2020 0x00000000
 ffff8802e2351e38 0000000000000082 ffff8803046bd580 ffff880000000000
 ffff8802e211dfe0 ffff8802e2351fd8 ffff8802e211e2a0 ffff8802e211e298
 0000000000013280 ffff8802e2351fd8 0000000000013280 0000000000013280
Call Trace:
 [<ffffffff814122b0>] __mutex_lock_common+0x212/0x3bc
 [<ffffffff8141246e>] __mutex_lock_slowpath+0x14/0x16
 [<ffffffff81412549>] mutex_lock+0x27/0x3e
 [<ffffffffa01b3311>] nfs_llseek_dir+0x51/0x9e [nfs]
 [<ffffffff810ea249>] vfs_llseek+0x2e/0x30
 [<ffffffff810ea36c>] sys_lseek+0x3e/0x5d
 [<ffffffff8100aad2>] system_call_fastpath+0x16/0x1b
INFO: task btserver:20572 blocked for more than 180 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
btserver        D 0000000000000000     0 20572   2020 0x00000000
 ffff8802e5365ca8 0000000000000086 0000000000000001 0000000000000000
 ffff8802e573ac40 ffff8802e5365fd8 ffff8802e573af00 ffff8802e573aef8
 0000000000013280 ffff8802e5365fd8 0000000000013280 0000000000013280
Call Trace:
 [<ffffffff814122b0>] __mutex_lock_common+0x212/0x3bc
 [<ffffffff8141246e>] __mutex_lock_slowpath+0x14/0x16
 [<ffffffff81412549>] mutex_lock+0x27/0x3e
 [<ffffffff810f4f43>] do_last+0xb1/0x2bf
 [<ffffffff810f6d47>] do_filp_open+0x2c1/0x655
 [<ffffffff810da8a9>] ? __bit_spin_unlock.clone.1+0x1d/0x38
 [<ffffffff8103fde4>] ? get_parent_ip+0x11/0x42
 [<ffffffff814163ac>] ? sub_preempt_count+0x92/0xa5
 [<ffffffff81100a10>] ? alloc_fd+0x111/0x123
 [<ffffffff810e91c3>] do_sys_open+0x5b/0xed
 [<ffffffff8100bcaf>] ? math_state_restore+0x49/0x4b
 [<ffffffff810e927e>] sys_open+0x1b/0x1d
 [<ffffffff8100aad2>] system_call_fastpath+0x16/0x1b
INFO: task btserver:20583 blocked for more than 180 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
btserver        D 0000000000000002     0 20583   2020 0x00000000
 ffff8802e50dbca8 0000000000000082 0000000000000001 00000000ffffffff
 ffff880305bfb3a0 ffff8802e50dbfd8 ffff880305bfb660 ffff880305bfb658
 0000000000013280 ffff8802e50dbfd8 0000000000013280 0000000000013280
Call Trace:
 [<ffffffff814122b0>] __mutex_lock_common+0x212/0x3bc
 [<ffffffff8141246e>] __mutex_lock_slowpath+0x14/0x16
 [<ffffffff81412549>] mutex_lock+0x27/0x3e
 [<ffffffff810f4f43>] do_last+0xb1/0x2bf
 [<ffffffff810f6d47>] do_filp_open+0x2c1/0x655
 [<ffffffff8103fde4>] ? get_parent_ip+0x11/0x42
 [<ffffffff814163ac>] ? sub_preempt_count+0x92/0xa5
 [<ffffffff81100a10>] ? alloc_fd+0x111/0x123
 [<ffffffff810e91c3>] do_sys_open+0x5b/0xed
 [<ffffffff8100bcaf>] ? math_state_restore+0x49/0x4b
 [<ffffffff810e927e>] sys_open+0x1b/0x1d
 [<ffffffff8100aad2>] system_call_fastpath+0x16/0x1b
INFO: task btserver:20584 blocked for more than 180 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
btserver        D ffff8802e53ac740     0 20584   2020 0x00000000
 ffff8802e5407ca8 0000000000000086 0000000000000001 0000000000000001
 ffff880305bf8760 ffff8802e5407fd8 ffff880305bf8a20 ffff880305bf8a18
 0000000000013280 ffff8802e5407fd8 0000000000013280 0000000000013280
Call Trace:
 [<ffffffff8103fde4>] ? get_parent_ip+0x11/0x42
 [<ffffffff814122b0>] __mutex_lock_common+0x212/0x3bc
 [<ffffffff8141246e>] __mutex_lock_slowpath+0x14/0x16
 [<ffffffff81412549>] mutex_lock+0x27/0x3e
 [<ffffffff810f4f43>] do_last+0xb1/0x2bf
 [<ffffffff810f6d47>] do_filp_open+0x2c1/0x655
 [<ffffffff810da8a9>] ? __bit_spin_unlock.clone.1+0x1d/0x38
 [<ffffffff8103fde4>] ? get_parent_ip+0x11/0x42
 [<ffffffff814163ac>] ? sub_preempt_count+0x92/0xa5
 [<ffffffff81100a10>] ? alloc_fd+0x111/0x123
 [<ffffffff810e91c3>] do_sys_open+0x5b/0xed
 [<ffffffff8100bcaf>] ? math_state_restore+0x49/0x4b
 [<ffffffff810e927e>] sys_open+0x1b/0x1d
 [<ffffffff8100aad2>] system_call_fastpath+0x16/0x1b
INFO: task btserver:20587 blocked for more than 180 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
btserver        D 0000000000000004     0 20587   2020 0x00000000
 ffff8802e5373ca8 0000000000000086 0000000000000001 0000000000000000
 ffff88030474f600 ffff8802e5373fd8 ffff88030474f8c0 ffff88030474f8b8
 0000000000013280 ffff8802e5373fd8 0000000000013280 0000000000013280
Call Trace:
 [<ffffffff814122b0>] __mutex_lock_common+0x212/0x3bc
 [<ffffffff8141246e>] __mutex_lock_slowpath+0x14/0x16
 [<ffffffff81412549>] mutex_lock+0x27/0x3e
 [<ffffffff810f4f43>] do_last+0xb1/0x2bf
 [<ffffffff810f6d47>] do_filp_open+0x2c1/0x655
 [<ffffffff810da8a9>] ? __bit_spin_unlock.clone.1+0x1d/0x38
 [<ffffffff8103fde4>] ? get_parent_ip+0x11/0x42
 [<ffffffff814163ac>] ? sub_preempt_count+0x92/0xa5
 [<ffffffff81100a10>] ? alloc_fd+0x111/0x123
 [<ffffffff810e91c3>] do_sys_open+0x5b/0xed
 [<ffffffff8100bcaf>] ? math_state_restore+0x49/0x4b
 [<ffffffff810e927e>] sys_open+0x1b/0x1d
 [<ffffffff8100aad2>] system_call_fastpath+0x16/0x1b
INFO: task btserver:23670 blocked for more than 180 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
btserver        D 0000000000000007     0 23670   2020 0x00000000
 ffff8802e2351e38 0000000000000082 ffff8803046bd580 ffff880000000000
 ffff8802e211dfe0 ffff8802e2351fd8 ffff8802e211e2a0 ffff8802e211e298
 0000000000013280 ffff8802e2351fd8 0000000000013280 0000000000013280
Call Trace:
 [<ffffffff814122b0>] __mutex_lock_common+0x212/0x3bc
 [<ffffffff8141246e>] __mutex_lock_slowpath+0x14/0x16
 [<ffffffff81412549>] mutex_lock+0x27/0x3e
 [<ffffffffa01b3311>] nfs_llseek_dir+0x51/0x9e [nfs]
 [<ffffffff810ea249>] vfs_llseek+0x2e/0x30
 [<ffffffff810ea36c>] sys_lseek+0x3e/0x5d
 [<ffffffff8100aad2>] system_call_fastpath+0x16/0x1b
 connection5:0: detected conn error (1020)
 connection3:0: detected conn error (1020)
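Each of the five blocked tasks above is reported twice. A small, hypothetical Python helper like the following can condense such a log into one entry per task: how many hung-task reports it got and the first reliable caller above the mutex_lock frames. Frames marked "?" are addresses the unwinder found on the stack but could not confirm as part of the call chain, so they are skipped. It is a sketch that assumes the exact x86 trace format shown here.

```python
import re
from collections import OrderedDict

# Header line, e.g. "INFO: task btserver:20572 blocked for more than 180 seconds."
TASK_RE = re.compile(r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds")
# Call-trace frame, e.g. " [<ffffffff81412549>] mutex_lock+0x27/0x3e"
# An optional "? " marks an unreliable frame.
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\] (\?\s+)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")

# Frames belonging to the mutex slow path itself
LOCK_FRAMES = {"__mutex_lock_common", "__mutex_lock_slowpath", "mutex_lock"}


def summarize_hung_tasks(log_text):
    """Map (comm, pid) -> (report count, first reliable caller above the mutex frames)."""
    summary = OrderedDict()
    current = None
    seen_lock = False
    for line in log_text.splitlines():
        m = TASK_RE.search(line)
        if m:
            # Start of a new hung-task report; repeated reports bump the count
            current = (m.group(1), int(m.group(2)))
            count, caller = summary.get(current, (0, None))
            summary[current] = (count + 1, caller)
            seen_lock = False
            continue
        f = FRAME_RE.search(line)
        if f is None or current is None:
            continue
        unreliable, func = f.group(1), f.group(2)
        if unreliable:
            continue  # "?" frames are stack leftovers, not the confirmed chain
        if func in LOCK_FRAMES:
            seen_lock = True
        elif seen_lock:
            # First real frame after the lock frames is the lock's caller
            count, caller = summary[current]
            if caller is None:
                summary[current] = (count, func)
            seen_lock = False
    return summary
```

Applied to the log above, it should attribute tasks 20572, 20583, 20584 and 20587 to do_last and task 23670 to nfs_llseek_dir, each with two reports.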


--
Ben Greear <greearb@xxxxxxxxxxxxxxx>
Candela Technologies Inc  http://www.candelatech.com

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

