Hi,
I upgraded the kernel to the latest stable release (4.5.1) with debugging
enabled, without any luck. This is the output from dmesg:
[262448.558983] INFO: task php:13376 blocked for more than 120 seconds.
[262448.559057] Tainted: G W 4.5.1 #1
[262448.559092] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[262448.559246] php D ffff88001c297a18 0 13376 12277 0x00000000
[262448.559519] ffff88001c297a18 ffff881ff248c100 ffff880013e9b400 ffff881fea472000
[262448.559603] ffff88001c297ae8 ffff88001c298000 ffff881c5cac1b30 ffff880013e9b400
[262448.560046] 0000000000020001 0000000545ea7820 ffff88001c297a30 ffffffff814d5690
[262448.560485] Call Trace:
[262448.560541] [<ffffffff814d5690>] schedule+0x30/0x80
[262448.560761] [<ffffffff814d823e>] schedule_timeout+0x21e/0x2a0
[262448.560828] [<ffffffff81217c3d>] ? xfs_bmap_search_extents+0x7d/0x100
[262448.561000] [<ffffffff810902d9>] ? down_trylock+0x29/0x40
[262448.561135] [<ffffffff814d726f>] __down+0x5f/0xa0
[262448.561268] [<ffffffff8124bdd6>] ? _xfs_buf_find+0x156/0x350
[262448.561347] [<ffffffff8109032c>] down+0x3c/0x50
[262448.561390] [<ffffffff8124bbc7>] xfs_buf_lock+0x37/0xf0
[262448.561435] [<ffffffff8124bdd6>] _xfs_buf_find+0x156/0x350
[262448.561557] [<ffffffff8124bff5>] xfs_buf_get_map+0x25/0x280
[262448.561603] [<ffffffff81268f4b>] ? kmem_zone_alloc+0x7b/0x120
[262448.561666] [<ffffffff8124cbe8>] xfs_buf_read_map+0x28/0x180
[262448.561768] [<ffffffff8127830b>] xfs_trans_read_buf_map+0xeb/0x300
[262448.561809] [<ffffffff8123f7da>] xfs_imap_to_bp+0x5a/0xc0
[262448.561881] [<ffffffff8125b7a5>] xfs_iunlink_remove+0x275/0x3a0
[262448.561943] [<ffffffff81268f4b>] ? kmem_zone_alloc+0x7b/0x120
[262448.561988] [<ffffffff8125ec33>] xfs_ifree+0x33/0xd0
[262448.562033] [<ffffffff8125ed85>] xfs_inactive_ifree+0xb5/0x200
[262448.562109] [<ffffffff8125ef58>] xfs_inactive+0x88/0x110
[262448.562296] [<ffffffff81263f31>] xfs_fs_evict_inode+0xc1/0x110
[262448.562344] [<ffffffff811a42fb>] evict+0xbb/0x180
[262448.562405] [<ffffffff811a4bb3>] iput+0x193/0x200
[262448.562483] [<ffffffff811a08d2>] d_delete+0x122/0x160
[262448.562520] [<ffffffff81195b99>] vfs_rmdir+0xf9/0x120
[262448.562559] [<ffffffff81199d17>] do_rmdir+0x1b7/0x1d0
[262448.562607] [<ffffffff81001210>] ? exit_to_usermode_loop+0x90/0xb0
[262448.562665] [<ffffffff8119a921>] SyS_rmdir+0x11/0x20
[262448.562891] [<ffffffff814d8f1b>] entry_SYSCALL_64_fastpath+0x16/0x6e
[262489.707201] NMI watchdog: Watchdog detected hard LOCKUP on cpu 15
[262489.707227] Modules linked in: ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ipt_REJECT nf_reject_ipv4 iptable_mangle netconsole configfs tun xt_multiport ip6table_filter ip6_tables iptable_filter ip_tables x_tables bridge stp llc bonding ext4 crc16 mbcache jbd2 raid1 raid0 raid456 async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq md_mod sg sd_mod hid_generic usbhid hid x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel jitterentropy_rng sha256_ssse3 iTCO_wdt sha256_generic iTCO_vendor_support hmac drbg xhci_pci ahci sb_edac ehci_pci ansi_cprng xhci_hcd ehci_hcd libahci i2c_i801 edac_core lpc_ich mei_me mfd_core libata usbcore igb mei megaraid_sas i2c_algo_bit usb_common ptp aesni_intel pps_core aes_x86_64 ioatdma lrw gf128mul glue_helper ablk_helper i2c_core scsi_mod dca cryptd ipmi_si ipmi_msghandler acpi_power_meter tpm_tis tpm processor button
[262489.708066] CPU: 15 PID: 17535 Comm: kworker/u32:6 Tainted: G W 4.5.1 #1
[262489.708124] Hardware name: Supermicro Super Server/X10DRi-LN4+, BIOS 2.0 12/17/2015
[262489.708187] Workqueue: writeback wb_workfn (flush-9:7)
[262489.708228] 0000000000000000 ffff88207fde5bd0 ffffffff812e00b8 0000000000000000
[262489.708298] 0000000000000000 ffff88207fde5be8 ffffffff810dff1d ffff881ff2270000
[262489.708368] ffff88207fde5c20 ffffffff8110f8f8 0000000000000001 ffff88207fdeaf00
[262489.708438] Call Trace:
[262489.708467] <NMI> [<ffffffff812e00b8>] dump_stack+0x4d/0x65
[262489.708512] [<ffffffff810dff1d>] watchdog_overflow_callback+0xdd/0xf0
[262489.708552] [<ffffffff8110f8f8>] __perf_event_overflow+0x88/0x1d0
[262489.708589] [<ffffffff811103e4>] perf_event_overflow+0x14/0x20
[262489.708627] [<ffffffff8101e320>] intel_pmu_handle_irq+0x1d0/0x4a0
[262489.708666] [<ffffffff81155481>] ? vunmap_page_range+0x1a1/0x310
[262489.708703] [<ffffffff811555fc>] ? unmap_kernel_range_noflush+0xc/0x10
[262489.708748] [<ffffffff8135a543>] ? ghes_copy_tofrom_phys+0x113/0x1e0
[262489.708788] [<ffffffff810359da>] ? native_apic_wait_icr_idle+0x1a/0x30
[262489.708827] [<ffffffff810096e0>] ? arch_irq_work_raise+0x30/0x40
[262489.708865] [<ffffffff810162d8>] perf_event_nmi_handler+0x28/0x50
[262489.708902] [<ffffffff81008121>] nmi_handle+0x61/0x110
[262489.708939] [<ffffffff810082e7>] do_nmi+0x117/0x3e0
[262489.708975] [<ffffffff814dae97>] end_repeat_nmi+0x1a/0x1e
[262489.709013] [<ffffffffa01d05f0>] ? raid5_unplug+0x70/0x130 [raid456]
[262489.709051] [<ffffffffa01d05f0>] ? raid5_unplug+0x70/0x130 [raid456]
[262489.709089] [<ffffffffa01d05f0>] ? raid5_unplug+0x70/0x130 [raid456]
[262489.709125] <<EOE>> [<ffffffff812b9b98>] blk_flush_plug_list+0xa8/0x210
[262489.709169] [<ffffffff814d5de0>] ? bit_wait_timeout+0x70/0x70
[262489.709206] [<ffffffff814d4c04>] io_schedule_timeout+0x54/0x130
[262489.709242] [<ffffffff814d5df6>] bit_wait_io+0x16/0x60
[262489.709277] [<ffffffff814d5b59>] __wait_on_bit_lock+0x49/0xa0
[262489.709314] [<ffffffff81117fd0>] __lock_page+0xb0/0xc0
[262489.709352] [<ffffffff8108bdc0>] ? autoremove_wake_function+0x30/0x30
[262489.709391] [<ffffffff811250f0>] write_cache_pages+0x2f0/0x4d0
[262489.709427] [<ffffffff81122df0>] ? wb_position_ratio+0x1f0/0x1f0
[262489.709465] [<ffffffff8112530e>] generic_writepages+0x3e/0x60
[262489.709502] [<ffffffff81244c18>] xfs_vm_writepages+0x38/0x40
[262489.709539] [<ffffffff81125e29>] do_writepages+0x19/0x30
[262489.709574] [<ffffffff811b5c50>] __writeback_single_inode+0x40/0x310
[262489.709612] [<ffffffff811b6402>] writeback_sb_inodes+0x242/0x520
[262489.709649] [<ffffffff811b676a>] __writeback_inodes_wb+0x8a/0xc0
[262489.709686] [<ffffffff811b6a77>] wb_writeback+0x247/0x2d0
[262489.709721] [<ffffffff811b716f>] wb_workfn+0x20f/0x3c0
[262489.709758] [<ffffffff81067513>] process_one_work+0x143/0x400
[262489.709795] [<ffffffff81067cc1>] worker_thread+0x61/0x490
[262489.709831] [<ffffffff81067c60>] ? max_active_store+0x60/0x60
[262489.709867] [<ffffffff8106c926>] kthread+0xd6/0xf0
[262489.709901] [<ffffffff8106c850>] ? kthread_park+0x50/0x50
[262489.709937] [<ffffffff814d92af>] ret_from_fork+0x3f/0x70
[262489.709972] [<ffffffff8106c850>] ? kthread_park+0x50/0x50
[262491.022971] NMI watchdog: Watchdog detected hard LOCKUP on cpu 0
[262491.023470] Modules linked in: ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ipt_REJECT nf_reject_ipv4 iptable_mangle netconsole configfs tun xt_multiport ip6table_filter ip6_tables iptable_filter ip_tables x_tables bridge stp llc bonding ext4 crc16 mbcache jbd2 raid1 raid0 raid456 async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq md_mod sg sd_mod hid_generic usbhid hid x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel jitterentropy_rng sha256_ssse3 iTCO_wdt sha256_generic iTCO_vendor_support hmac drbg xhci_pci ahci sb_edac ehci_pci ansi_cprng xhci_hcd ehci_hcd libahci i2c_i801 edac_core lpc_ich mei_me mfd_core libata usbcore igb mei megaraid_sas i2c_algo_bit usb_common ptp aesni_intel pps_core aes_x86_64 ioatdma lrw gf128mul glue_helper ablk_helper i2c_core scsi_mod dca cryptd ipmi_si ipmi_msghandler acpi_power_meter tpm_tis tpm processor button
[262491.029705] CPU: 0 PID: 1178 Comm: md7_raid5 Tainted: G W 4.5.1 #1
[262491.029776] Hardware name: Supermicro Super Server/X10DRi-LN4+, BIOS 2.0 12/17/2015
[262491.029849] 0000000000000000 ffff88207fc05bd0 ffffffff812e00b8 0000000000000000
[262491.029988] 0000000000000000 ffff88207fc05be8 ffffffff810dff1d ffff881fff032000
[262491.030124] ffff88207fc05c20 ffffffff8110f8f8 0000000000000001 ffff88207fc0af00
[262491.030260] Call Trace:
[262491.030302] <NMI> [<ffffffff812e00b8>] dump_stack+0x4d/0x65
[262491.030377] [<ffffffff810dff1d>] watchdog_overflow_callback+0xdd/0xf0
[262491.030432] [<ffffffff8110f8f8>] __perf_event_overflow+0x88/0x1d0
[262491.030484] [<ffffffff811103e4>] perf_event_overflow+0x14/0x20
[262491.030536] [<ffffffff8101e320>] intel_pmu_handle_irq+0x1d0/0x4a0
[262491.030589] [<ffffffff81155481>] ? vunmap_page_range+0x1a1/0x310
[262491.030640] [<ffffffff811555fc>] ? unmap_kernel_range_noflush+0xc/0x10
[262491.030693] [<ffffffff8135a543>] ? ghes_copy_tofrom_phys+0x113/0x1e0
[262491.030745] [<ffffffff8135a681>] ? ghes_read_estatus+0x71/0x140
[262491.030797] [<ffffffff810162d8>] perf_event_nmi_handler+0x28/0x50
[262491.030849] [<ffffffff81008121>] nmi_handle+0x61/0x110
[262491.030898] [<ffffffff810083d1>] do_nmi+0x201/0x3e0
[262491.030949] [<ffffffff814dae97>] end_repeat_nmi+0x1a/0x1e
[262491.030998] [<ffffffff81090d23>] ? queued_spin_lock_slowpath+0x153/0x170
[262491.031050] [<ffffffff81090d23>] ? queued_spin_lock_slowpath+0x153/0x170
[262491.031102] [<ffffffff81090d23>] ? queued_spin_lock_slowpath+0x153/0x170
[262491.031153] <<EOE>> [<ffffffff814d8c6c>] _raw_spin_lock_irq+0x1c/0x20
[262491.031225] [<ffffffffa01db6b1>] raid5d+0x91/0x720 [raid456]
[262491.031276] [<ffffffff810a4a8a>] ? try_to_del_timer_sync+0x4a/0x60
[262491.031328] [<ffffffff810a4ae3>] ? del_timer_sync+0x43/0x50
[262491.031377] [<ffffffff814d816e>] ? schedule_timeout+0x14e/0x2a0
[262491.031428] [<ffffffff810a4830>] ? trace_event_raw_event_tick_stop+0x100/0x100
[262491.031502] [<ffffffffa017874b>] md_thread+0x12b/0x130 [md_mod]
[262491.031555] [<ffffffff8108bd90>] ? wait_woken+0x80/0x80
[262491.031605] [<ffffffffa0178620>] ? find_pers+0x70/0x70 [md_mod]
[262491.031656] [<ffffffff8106c926>] kthread+0xd6/0xf0
[262491.031704] [<ffffffff8106c850>] ? kthread_park+0x50/0x50
[262491.031753] [<ffffffff814d92af>] ret_from_fork+0x3f/0x70
[262491.031802] [<ffffffff8106c850>] ? kthread_park+0x50/0x50
The server is hosting plain VPSes. A few of them run rtorrent, which is
quite disk-intensive, but from what I can see iowait is quite low.
There's absolutely nothing logged before the lockups: everything is
running fine and then suddenly it just crashes. I'm beginning to think we
might have a hardware problem, but I'm having a hard time finding the
actual issue.
Any ideas?
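In case it helps with triage, here is a minimal Python sketch (not taken
from this thread or from any kernel tooling) that tallies which CPUs
reported a hard LOCKUP and which module-tagged frames appear in a pasted
log. The name summarize_lockups and both regular expressions are
assumptions about the log layout shown above, and only simple mail-client
line wrapping is tolerated.

```python
import re
from collections import Counter

# Matches the watchdog line even when "on cpu N" was wrapped onto the
# next line by the mail client (assumed layout, see the logs above).
LOCKUP_RE = re.compile(r"hard LOCKUP\s+on cpu\s+(\d+)")

# Matches a stack frame such as "raid5d+0x91/0x720 [raid456]"; the
# trailing "[module]" tag is optional and only present for module code.
FRAME_RE = re.compile(
    r"(\w[\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+)\])?"
)


def summarize_lockups(log_text):
    """Return (cpus, module_frames) found in a pasted kernel log."""
    cpus = [int(n) for n in LOCKUP_RE.findall(log_text)]
    module_frames = Counter(
        (module, func)
        for func, module in FRAME_RE.findall(log_text)
        if module  # skip frames that belong to the core kernel
    )
    return cpus, module_frames


if __name__ == "__main__":
    sample = (
        "[262491.022971] NMI watchdog: Watchdog detected hard LOCKUP on cpu 0\n"
        "[262491.031225] [<ffffffffa01db6b1>] raid5d+0x91/0x720 [raid456]\n"
        "[262491.031502] [<ffffffffa017874b>] md_thread+0x12b/0x130 [md_mod]\n"
    )
    cpus, frames = summarize_lockups(sample)
    print("CPUs reporting a hard lockup:", cpus)
    for (module, func), count in frames.most_common():
        print(f"{module}:{func} seen {count} time(s)")
```

This only counts occurrences. It does not tell a frame marked "?" apart
from a reliable one, so the result is a starting point rather than a
diagnosis.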
Best regards
On 13-04-2016 at 19:00, Shaohua Li wrote:
It looks like there is a deadlock trying to hold the device_lock or hash_lock.
Was anything abnormal printed before the NMI watchdog? What is running on the
machine? This looks like an old kernel; is it possible for you to try the
latest kernel and report back?
Thanks,
Shaohua
On Tue, Apr 12, 2016 at 09:54:08PM +0000, Daniel Walker wrote:
I'm having some issues on a brand new Supermicro server that we have running
in production alongside a few other machines which are identical to this
server.
The output from the netconsole attached to the server is here:
Apr 12 21:34:45 [75704.964946] NMI watchdog: Watchdog detected hard LOCKUP
on cpu 6
Apr 12 21:34:45
Apr 12 21:34:45 [75704.964973] Modules linked in:
Apr 12 21:34:45 ipt_REJECT
Apr 12 21:34:45 nf_reject_ipv4
Apr 12 21:34:45 iptable_mangle
Apr 12 21:34:45 tun
Apr 12 21:34:45 netconsole
Apr 12 21:34:45 configfs
Apr 12 21:34:45 xt_multiport
Apr 12 21:34:45 ip6table_filter
Apr 12 21:34:45 ip6_tables
Apr 12 21:34:45 iptable_filter
Apr 12 21:34:45 ip_tables
Apr 12 21:34:45 x_tables
Apr 12 21:34:45 bridge
Apr 12 21:34:45 stp
Apr 12 21:34:45 llc
Apr 12 21:34:45 bonding
Apr 12 21:34:45 ext4
Apr 12 21:34:45 crc16
Apr 12 21:34:45 mbcache
Apr 12 21:34:45 jbd2
Apr 12 21:34:45 raid1
Apr 12 21:34:45 raid0
Apr 12 21:34:45 raid456
Apr 12 21:34:45 async_raid6_recov
Apr 12 21:34:45 async_memcpy
Apr 12 21:34:45 async_pq
Apr 12 21:34:45 async_xor
Apr 12 21:34:45 xor
Apr 12 21:34:45 async_tx
Apr 12 21:34:45 raid6_pq
Apr 12 21:34:45 md_mod
Apr 12 21:34:45 sr_mod
Apr 12 21:34:45 cdrom
Apr 12 21:34:45 usb_storage
Apr 12 21:34:45 hid_generic
Apr 12 21:34:45 usbhid
Apr 12 21:34:45 hid
Apr 12 21:34:45 sg
Apr 12 21:34:45 sd_mod
Apr 12 21:34:45 x86_pkg_temp_thermal
Apr 12 21:34:45 coretemp
Apr 12 21:34:45 crct10dif_pclmul
Apr 12 21:34:45 crc32_pclmul
Apr 12 21:34:45 crc32c_intel
Apr 12 21:34:45 jitterentropy_rng
Apr 12 21:34:45 sha256_ssse3
Apr 12 21:34:45 sha256_generic
Apr 12 21:34:45 hmac
Apr 12 21:34:45 iTCO_wdt
Apr 12 21:34:45 iTCO_vendor_support
Apr 12 21:34:45 drbg
Apr 12 21:34:45 ansi_cprng
Apr 12 21:34:45 aesni_intel
Apr 12 21:34:45 aes_x86_64
Apr 12 21:34:45 lrw
Apr 12 21:34:45 gf128mul
Apr 12 21:34:45 glue_helper
Apr 12 21:34:45 ablk_helper
Apr 12 21:34:45 cryptd
Apr 12 21:34:45 ahci
Apr 12 21:34:45 libahci
Apr 12 21:34:45 sb_edac
Apr 12 21:34:45 libata
Apr 12 21:34:45 igb
Apr 12 21:34:45 megaraid_sas
Apr 12 21:34:45 xhci_pci
Apr 12 21:34:45 ehci_pci
Apr 12 21:34:45 i2c_algo_bit
Apr 12 21:34:45 xhci_hcd
Apr 12 21:34:45 ehci_hcd
Apr 12 21:34:45 edac_core
Apr 12 21:34:45 ptp
Apr 12 21:34:45 mei_me
Apr 12 21:34:45 lpc_ich
Apr 12 21:34:45 i2c_i801
Apr 12 21:34:45 usbcore
Apr 12 21:34:45 pps_core
Apr 12 21:34:45 mfd_core
Apr 12 21:34:45 mei
Apr 12 21:34:45 usb_common
Apr 12 21:34:45 i2c_core
Apr 12 21:34:45 ioatdma
Apr 12 21:34:45 scsi_mod
Apr 12 21:34:45 dca
Apr 12 21:34:45 ipmi_si
Apr 12 21:34:45 ipmi_msghandler
Apr 12 21:34:45 acpi_power_meter
Apr 12 21:34:45 tpm_tis
Apr 12 21:34:45 tpm
Apr 12 21:34:45 processor
Apr 12 21:34:45 button
Apr 12 21:34:45
Apr 12 21:34:45 [75704.965874] CPU: 6 PID: 25339 Comm: main Not tainted
4.4.1 #2
Apr 12 21:34:45 [75704.965916] Hardware name: Supermicro Super
Server/X10DRi-LN4+, BIOS 2.0 12/17/2015
Apr 12 21:34:45 [75704.965979] 0000000000000000
Apr 12 21:34:45 ffffffff812abdf3
Apr 12 21:34:45 0000000000000000
Apr 12 21:34:45 ffffffff810cf5f5
Apr 12 21:34:45
Apr 12 21:34:45 [75704.966054] ffff881ff2870000
Apr 12 21:34:45 ffffffff810fcea2
Apr 12 21:34:45 0000000000000001
Apr 12 21:34:45 ffff881fffcc5e58
Apr 12 21:34:45
Apr 12 21:34:45 [75704.966134] ffff881fffccaf00
Apr 12 21:34:45 ffff881fffccb100
Apr 12 21:34:45 ffff881ff2870000
Apr 12 21:34:45 ffffffff8101bc63
Apr 12 21:34:45
Apr 12 21:34:45 [75704.966211] Call Trace:
Apr 12 21:34:45 [75704.966246] <NMI>
Apr 12 21:34:45 [<ffffffff812abdf3>] ? dump_stack+0x40/0x5d
Apr 12 21:34:45 [75704.966297] [<ffffffff810cf5f5>] ?
watchdog_overflow_callback+0xb5/0xd0
Apr 12 21:34:45 [75704.966339] [<ffffffff810fcea2>] ?
__perf_event_overflow+0x82/0x1c0
Apr 12 21:34:45 [75704.966384] [<ffffffff8101bc63>] ?
intel_pmu_handle_irq+0x1c3/0x3e0
Apr 12 21:34:45 [75704.966431] [<ffffffff8113b5cb>] ?
vunmap_page_range+0x1bb/0x320
Apr 12 21:34:45 [75704.966474] [<ffffffff813213e0>] ?
ghes_copy_tofrom_phys+0x110/0x1d0
Apr 12 21:34:45 [75704.966519] [<ffffffff81014f53>] ?
perf_event_nmi_handler+0x23/0x40
Apr 12 21:34:45 [75704.966560] [<ffffffff81007b85>] ?
nmi_handle+0x65/0x100
Apr 12 21:34:45 [75704.966597] [<ffffffff81007dfe>] ? do_nmi+0x1de/0x360
Apr 12 21:34:45 [75704.970603] [<ffffffff8148f957>] ?
end_repeat_nmi+0x1a/0x1e
Apr 12 21:34:45 [75704.970644] [<ffffffff810862ca>] ?
queued_spin_lock_slowpath+0xea/0x150
Apr 12 21:34:45 [75704.970685] [<ffffffff810862ca>] ?
queued_spin_lock_slowpath+0xea/0x150
Apr 12 21:34:45 [75704.970728] [<ffffffff810862ca>] ?
queued_spin_lock_slowpath+0xea/0x150
Apr 12 21:34:45 [75704.970768] <<EOE>>
Apr 12 21:34:45 [<ffffffffa01b413b>] ? make_request+0x60b/0xbd0 [raid456]
Apr 12 21:34:45 [75704.970838] [<ffffffff810815c0>] ? wait_woken+0x80/0x80
Apr 12 21:34:45 [75704.970878] [<ffffffff81151ec4>] ?
kmem_cache_alloc+0xf4/0x120
Apr 12 21:34:45 [75704.970922] [<ffffffffa017632d>] ?
md_make_request+0xdd/0x220 [md_mod]
Apr 12 21:34:45 [75704.970969] [<ffffffff81219fde>] ?
xfs_map_buffer.isra.12+0x2e/0x60
Apr 12 21:34:45 [75704.971012] [<ffffffff8128691d>] ?
generic_make_request+0xed/0x1d0
Apr 12 21:34:45 [75704.971052] [<ffffffff81286a5a>] ?
submit_bio+0x5a/0x140
Apr 12 21:34:45 [75704.971098] [<ffffffff81113379>] ?
release_pages+0xc9/0x270
Apr 12 21:34:45 [75704.971145] [<ffffffff811a2c01>] ?
do_mpage_readpage+0x2d1/0x640
Apr 12 21:34:45 [75704.971187] [<ffffffff811a304d>] ?
mpage_readpages+0xdd/0x130
Apr 12 21:34:45 [75704.971226] [<ffffffff8121b510>] ?
__xfs_get_blocks+0x750/0x750
Apr 12 21:34:45 [75704.971267] [<ffffffff8121b510>] ?
__xfs_get_blocks+0x750/0x750
Apr 12 21:34:45 [75704.971313] [<ffffffff8114ad45>] ?
alloc_pages_current+0x85/0x110
Apr 12 21:34:45 [75704.971354] [<ffffffff81111d25>] ?
__do_page_cache_readahead+0x165/0x1f0
Apr 12 21:34:45 [75704.971399] [<ffffffff81105902>] ?
pagecache_get_page+0x22/0x1a0
Apr 12 21:34:45 [75704.971441] [<ffffffff8110768c>] ?
filemap_fault+0x37c/0x400
Apr 12 21:34:45 [75704.971481] [<ffffffff8122474b>] ?
xfs_filemap_fault+0x3b/0x80
Apr 12 21:34:45 [75704.971526] [<ffffffff8112d2da>] ? __do_fault+0x3a/0xc0
Apr 12 21:34:45 [75704.971564] [<ffffffff81130883>] ?
handle_mm_fault+0x1063/0x1650
Apr 12 21:34:45 [75704.971614] [<ffffffff8103bdae>] ?
__do_page_fault+0x11e/0x370
Apr 12 21:34:45 [75704.971653] [<ffffffff811aa4ff>] ?
SyS_epoll_wait+0x8f/0xd0
Apr 12 21:34:45 [75704.971694] [<ffffffff8148f64f>] ? page_fault+0x1f/0x30
Apr 12 21:34:45 [75705.493640] NMI watchdog: Watchdog detected hard LOCKUP
on cpu 12
Apr 12 21:34:45
Apr 12 21:34:45 [75705.493668] Modules linked in:
Apr 12 21:34:45 ipt_REJECT
Apr 12 21:34:45 nf_reject_ipv4
Apr 12 21:34:45 iptable_mangle
Apr 12 21:34:45 tun
Apr 12 21:34:45 netconsole
Apr 12 21:34:45 configfs
Apr 12 21:34:45 xt_multiport
Apr 12 21:34:45 ip6table_filter
Apr 12 21:34:45 ip6_tables
Apr 12 21:34:45 iptable_filter
Apr 12 21:34:45 ip_tables
Apr 12 21:34:45 x_tables
Apr 12 21:34:45 bridge
Apr 12 21:34:45 stp
Apr 12 21:34:45 llc
Apr 12 21:34:45 bonding
Apr 12 21:34:45 ext4
Apr 12 21:34:45 crc16
Apr 12 21:34:45 mbcache
Apr 12 21:34:45 jbd2
Apr 12 21:34:45 raid1
Apr 12 21:34:45 raid0
Apr 12 21:34:45 raid456
Apr 12 21:34:45 async_raid6_recov
Apr 12 21:34:45 async_memcpy
Apr 12 21:34:45 async_pq
Apr 12 21:34:45 async_xor
Apr 12 21:34:45 xor
Apr 12 21:34:45 async_tx
Apr 12 21:34:45 raid6_pq
Apr 12 21:34:45 md_mod
Apr 12 21:34:45 sr_mod
Apr 12 21:34:45 cdrom
Apr 12 21:34:45 usb_storage
Apr 12 21:34:45 hid_generic
Apr 12 21:34:45 usbhid
Apr 12 21:34:45 hid
Apr 12 21:34:45 sg
Apr 12 21:34:45 sd_mod
Apr 12 21:34:45 x86_pkg_temp_thermal
Apr 12 21:34:45 coretemp
Apr 12 21:34:45 crct10dif_pclmul
Apr 12 21:34:45 crc32_pclmul
Apr 12 21:34:45 crc32c_intel
Apr 12 21:34:45 jitterentropy_rng
Apr 12 21:34:45 sha256_ssse3
Apr 12 21:34:45 sha256_generic
Apr 12 21:34:45 hmac
Apr 12 21:34:45 iTCO_wdt
Apr 12 21:34:45 iTCO_vendor_support
Apr 12 21:34:45 drbg
Apr 12 21:34:45 ansi_cprng
Apr 12 21:34:45 aesni_intel
Apr 12 21:34:45 aes_x86_64
Apr 12 21:34:45 lrw
Apr 12 21:34:45 gf128mul
Apr 12 21:34:45 glue_helper
Apr 12 21:34:45 ablk_helper
Apr 12 21:34:45 cryptd
Apr 12 21:34:45 ahci
Apr 12 21:34:45 libahci
Apr 12 21:34:45 sb_edac
Apr 12 21:34:45 libata
Apr 12 21:34:45 igb
Apr 12 21:34:45 megaraid_sas
Apr 12 21:34:45 xhci_pci
Apr 12 21:34:45 ehci_pci
Apr 12 21:34:45 i2c_algo_bit
Apr 12 21:34:45 xhci_hcd
Apr 12 21:34:45 ehci_hcd
Apr 12 21:34:45 edac_core
Apr 12 21:34:45 ptp
Apr 12 21:34:45 mei_me
Apr 12 21:34:45 lpc_ich
Apr 12 21:34:45 i2c_i801
Apr 12 21:34:45 usbcore
Apr 12 21:34:45 pps_core
Apr 12 21:34:45 mfd_core
Apr 12 21:34:45 mei
Apr 12 21:34:45 usb_common
Apr 12 21:34:45 i2c_core
Apr 12 21:34:45 ioatdma
Apr 12 21:34:45 scsi_mod
Apr 12 21:34:45 dca
Apr 12 21:34:45 ipmi_si
Apr 12 21:34:45 ipmi_msghandler
Apr 12 21:34:45 acpi_power_meter
Apr 12 21:34:45 tpm_tis
Apr 12 21:34:45 tpm
Apr 12 21:34:45 processor
Apr 12 21:34:45 button
Apr 12 21:34:45
Apr 12 21:34:45 [75705.494688] CPU: 12 PID: 32350 Comm: main Not tainted
4.4.1 #2
Apr 12 21:34:45 [75705.494728] Hardware name: Supermicro Super
Server/X10DRi-LN4+, BIOS 2.0 12/17/2015
Apr 12 21:34:45 [75705.494790] 0000000000000000
Apr 12 21:34:45 ffffffff812abdf3
Apr 12 21:34:45 0000000000000000
Apr 12 21:34:45 ffffffff810cf5f5
Apr 12 21:34:45
Apr 12 21:34:45 [75705.494886] ffff883ff29a0000
Apr 12 21:34:45 ffffffff810fcea2
Apr 12 21:34:45 0000000000000001
Apr 12 21:34:45 ffff88407fc85e58
Apr 12 21:34:45
Apr 12 21:34:45 [75705.494976] ffff88407fc8af00
Apr 12 21:34:45 ffff88407fc8b100
Apr 12 21:34:45 ffff883ff29a0000
Apr 12 21:34:45 ffffffff8101bc63
Apr 12 21:34:45
Apr 12 21:34:45 [75705.495064] Call Trace:
Apr 12 21:34:45 [75705.495094] <NMI>
Apr 12 21:34:45 [<ffffffff812abdf3>] ? dump_stack+0x40/0x5d
Apr 12 21:34:45 [75705.495150] [<ffffffff810cf5f5>] ?
watchdog_overflow_callback+0xb5/0xd0
Apr 12 21:34:45 [75705.495193] [<ffffffff810fcea2>] ?
__perf_event_overflow+0x82/0x1c0
Apr 12 21:34:45 [75705.495237] [<ffffffff8101bc63>] ?
intel_pmu_handle_irq+0x1c3/0x3e0
Apr 12 21:34:45 [75705.495284] [<ffffffff8113b5cb>] ?
vunmap_page_range+0x1bb/0x320
Apr 12 21:34:45 [75705.495330] [<ffffffff813213e0>] ?
ghes_copy_tofrom_phys+0x110/0x1d0
Apr 12 21:34:45 [75705.495373] [<ffffffff81014f53>] ?
perf_event_nmi_handler+0x23/0x40
Apr 12 21:34:45 [75705.495418] [<ffffffff81007b85>] ?
nmi_handle+0x65/0x100
Apr 12 21:34:45 [75705.495458] [<ffffffff81007d2e>] ? do_nmi+0x10e/0x360
Apr 12 21:34:45 [75705.495497] [<ffffffff8148f957>] ?
end_repeat_nmi+0x1a/0x1e
Apr 12 21:34:45 [75705.495540] [<ffffffff810862ca>] ?
queued_spin_lock_slowpath+0xea/0x150
Apr 12 21:34:45 [75705.495581] [<ffffffff810862ca>] ?
queued_spin_lock_slowpath+0xea/0x150
Apr 12 21:34:45 [75705.495621] [<ffffffff810862ca>] ?
queued_spin_lock_slowpath+0xea/0x150
Apr 12 21:34:45 [75705.495661] <<EOE>>
Apr 12 21:34:45 [<ffffffffa01b413b>] ? make_request+0x60b/0xbd0 [raid456]
Apr 12 21:34:45 [75705.495733] [<ffffffff81282d87>] ?
blk_rq_init+0x87/0xa0
Apr 12 21:34:45 [75705.495771] [<ffffffff81283e3c>] ?
get_request+0x29c/0x6e0
Apr 12 21:34:45 [75705.495812] [<ffffffff810815c0>] ? wait_woken+0x80/0x80
Apr 12 21:34:45 [75705.495853] [<ffffffffa017632d>] ?
md_make_request+0xdd/0x220 [md_mod]
Apr 12 21:34:45 [75705.495898] [<ffffffff8128829e>] ?
blk_queue_bio+0x15e/0x350
Apr 12 21:34:45 [75705.495937] [<ffffffff8128691d>] ?
generic_make_request+0xed/0x1d0
Apr 12 21:34:45 [75705.495978] [<ffffffff81286a5a>] ?
submit_bio+0x5a/0x140
Apr 12 21:34:45 [75705.496018] [<ffffffff811a215e>] ?
mpage_bio_submit+0x1e/0x30
Apr 12 21:34:45 [75705.496057] [<ffffffff811a3076>] ?
mpage_readpages+0x106/0x130
Apr 12 21:34:45 [75705.496102] [<ffffffff8121b510>] ?
__xfs_get_blocks+0x750/0x750
Apr 12 21:34:45 [75705.496144] [<ffffffff8121b510>] ?
__xfs_get_blocks+0x750/0x750
Apr 12 21:34:45 [75705.496185] [<ffffffff8114ad45>] ?
alloc_pages_current+0x85/0x110
Apr 12 21:34:45 [75705.496227] [<ffffffff81111d25>] ?
__do_page_cache_readahead+0x165/0x1f0
Apr 12 21:34:45 [75705.496268] [<ffffffff811344f5>] ? vma_link+0x75/0xb0
Apr 12 21:34:45 [75705.496307] [<ffffffff811120eb>] ?
force_page_cache_readahead+0x9b/0xe0
Apr 12 21:34:45 [75705.496352] [<ffffffff8113f876>] ?
madvise_willneed+0x76/0x140
Apr 12 21:34:45 [75705.496395] [<ffffffff811301ce>] ?
handle_mm_fault+0x9ae/0x1650
Apr 12 21:34:45 [75705.496437] [<ffffffff81133dcb>] ? find_vma+0x5b/0x70
Apr 12 21:34:45 [75705.496476] [<ffffffff8113fc52>] ?
SyS_madvise+0x312/0x6f0
Apr 12 21:34:45 [75705.496515] [<ffffffff8148d9db>] ?
entry_SYSCALL_64_fastpath+0x16/0x6e
Apr 12 21:34:47 [75707.118049] NMI watchdog: Watchdog detected hard LOCKUP
on cpu 15
Apr 12 21:34:47
Apr 12 21:34:47 [75707.118078] Modules linked in:
Apr 12 21:34:47 ipt_REJECT
Apr 12 21:34:47 nf_reject_ipv4
Apr 12 21:34:47 iptable_mangle
Apr 12 21:34:47 tun
Apr 12 21:34:47 netconsole
Apr 12 21:34:47 configfs
Apr 12 21:34:47 xt_multiport
Apr 12 21:34:47 ip6table_filter
Apr 12 21:34:47 ip6_tables
Apr 12 21:34:47 iptable_filter
Apr 12 21:34:47 ip_tables
Apr 12 21:34:47 x_tables
Apr 12 21:34:47 bridge
Apr 12 21:34:47 stp
Apr 12 21:34:47 llc
Apr 12 21:34:47 bonding
Apr 12 21:34:47 ext4
Apr 12 21:34:47 crc16
Apr 12 21:34:47 mbcache
Apr 12 21:34:47 jbd2
Apr 12 21:34:47 raid1
Apr 12 21:34:47 raid0
Apr 12 21:34:47 raid456
Apr 12 21:34:47 async_raid6_recov
Apr 12 21:34:47 async_memcpy
Apr 12 21:34:47 async_pq
Apr 12 21:34:47 async_xor
Apr 12 21:34:47 xor
Apr 12 21:34:47 async_tx
Apr 12 21:34:47 raid6_pq
Apr 12 21:34:47 md_mod
Apr 12 21:34:47 sr_mod
Apr 12 21:34:47 cdrom
Apr 12 21:34:47 usb_storage
Apr 12 21:34:47 hid_generic
Apr 12 21:34:47 usbhid
Apr 12 21:34:47 hid
Apr 12 21:34:47 sg
Apr 12 21:34:47 sd_mod
Apr 12 21:34:47 x86_pkg_temp_thermal
Apr 12 21:34:47 coretemp
Apr 12 21:34:47 crct10dif_pclmul
Apr 12 21:34:47 crc32_pclmul
Apr 12 21:34:47 crc32c_intel
Apr 12 21:34:47 jitterentropy_rng
Apr 12 21:34:47 sha256_ssse3
Apr 12 21:34:47 sha256_generic
Apr 12 21:34:47 hmac
Apr 12 21:34:47 iTCO_wdt
Apr 12 21:34:47 iTCO_vendor_support
Apr 12 21:34:47 drbg
Apr 12 21:34:47 ansi_cprng
Apr 12 21:34:47 aesni_intel
Apr 12 21:34:47 aes_x86_64
Apr 12 21:34:47 lrw
Apr 12 21:34:47 gf128mul
Apr 12 21:34:47 glue_helper
Apr 12 21:34:47 ablk_helper
Apr 12 21:34:47 cryptd
Apr 12 21:34:47 ahci
Apr 12 21:34:47 libahci
Apr 12 21:34:47 sb_edac
Apr 12 21:34:47 libata
Apr 12 21:34:47 igb
Apr 12 21:34:47 megaraid_sas
Apr 12 21:34:47 xhci_pci
Apr 12 21:34:47 ehci_pci
Apr 12 21:34:47 i2c_algo_bit
Apr 12 21:34:47 xhci_hcd
Apr 12 21:34:47 ehci_hcd
Apr 12 21:34:47 edac_core
Apr 12 21:34:47 ptp
Apr 12 21:34:47 mei_me
Apr 12 21:34:47 lpc_ich
Apr 12 21:34:47 i2c_i801
Apr 12 21:34:47 usbcore
Apr 12 21:34:47 pps_core
Apr 12 21:34:47 mfd_core
Apr 12 21:34:47 mei
Apr 12 21:34:47 usb_common
Apr 12 21:34:47 i2c_core
Apr 12 21:34:47 ioatdma
Apr 12 21:34:47 scsi_mod
Apr 12 21:34:47 dca
Apr 12 21:34:47 ipmi_si
Apr 12 21:34:47 ipmi_msghandler
Apr 12 21:34:47 acpi_power_meter
Apr 12 21:34:47 tpm_tis
Apr 12 21:34:47 tpm
Apr 12 21:34:47 processor
Apr 12 21:34:47 button
Apr 12 21:34:47
Apr 12 21:34:47 [75707.119088] CPU: 15 PID: 31940 Comm: main Not tainted
4.4.1 #2
Apr 12 21:34:47 [75707.119134] Hardware name: Supermicro Super
Server/X10DRi-LN4+, BIOS 2.0 12/17/2015
Apr 12 21:34:47 [75707.119196] 0000000000000000
Apr 12 21:34:47 ffffffff812abdf3
Apr 12 21:34:47 0000000000000000
Apr 12 21:34:47 ffffffff810cf5f5
Apr 12 21:34:47
Apr 12 21:34:47 [75707.119277] ffff883ff2a20000
Apr 12 21:34:47 ffffffff810fcea2
Apr 12 21:34:47 0000000000000001
Apr 12 21:34:47 ffff88407fce5e58
Apr 12 21:34:47
Apr 12 21:34:47 [75707.119360] ffff88407fceaf00
Apr 12 21:34:47 ffff88407fceb100
Apr 12 21:34:47 ffff883ff2a20000
Apr 12 21:34:47 ffffffff8101bc63
Apr 12 21:34:47
Apr 12 21:34:47 [75707.119439] Call Trace:
Apr 12 21:34:47 [75707.119471] <NMI>
Apr 12 21:34:47 [<ffffffff812abdf3>] ? dump_stack+0x40/0x5d
Apr 12 21:34:47 [75707.119527] [<ffffffff810cf5f5>] ?
watchdog_overflow_callback+0xb5/0xd0
Apr 12 21:34:47 [75707.119571] [<ffffffff810fcea2>] ?
__perf_event_overflow+0x82/0x1c0
Apr 12 21:34:47 [75707.119614] [<ffffffff8101bc63>] ?
intel_pmu_handle_irq+0x1c3/0x3e0
Apr 12 21:34:47 [75707.119657] [<ffffffff8113b5cb>] ?
vunmap_page_range+0x1bb/0x320
Apr 12 21:34:47 [75707.119703] [<ffffffff813213e0>] ?
ghes_copy_tofrom_phys+0x110/0x1d0
Apr 12 21:34:47 [75707.119758] [<ffffffff81014f53>] ?
perf_event_nmi_handler+0x23/0x40
Apr 12 21:34:47 [75707.119800] [<ffffffff81007b85>] ?
nmi_handle+0x65/0x100
Apr 12 21:34:47 [75707.119838] [<ffffffff81007d2e>] ? do_nmi+0x10e/0x360
Apr 12 21:34:47 [75707.119878] [<ffffffff8148f957>] ?
end_repeat_nmi+0x1a/0x1e
Apr 12 21:34:47 [75707.119920] [<ffffffff810862ca>] ?
queued_spin_lock_slowpath+0xea/0x150
Apr 12 21:34:47 [75707.119962] [<ffffffff810862ca>] ?
queued_spin_lock_slowpath+0xea/0x150
Apr 12 21:34:47 [75707.120002] [<ffffffff810862ca>] ?
queued_spin_lock_slowpath+0xea/0x150
Apr 12 21:34:47 [75707.120042] <<EOE>>
Apr 12 21:34:47 [<ffffffffa01b413b>] ? make_request+0x60b/0xbd0 [raid456]
Apr 12 21:34:47 [75707.120113] [<ffffffff810815c0>] ? wait_woken+0x80/0x80
Apr 12 21:34:47 [75707.120152] [<ffffffffa017632d>] ?
md_make_request+0xdd/0x220 [md_mod]
Apr 12 21:34:47 [75707.120195] [<ffffffff8128691d>] ?
generic_make_request+0xed/0x1d0
Apr 12 21:34:47 [75707.120236] [<ffffffff81286a5a>] ?
submit_bio+0x5a/0x140
Apr 12 21:34:47 [75707.120277] [<ffffffff8112afaf>] ?
workingset_refault+0x4f/0xa0
Apr 12 21:34:47 [75707.120320] [<ffffffff811a215e>] ?
mpage_bio_submit+0x1e/0x30
Apr 12 21:34:47 [75707.120359] [<ffffffff811a3076>] ?
mpage_readpages+0x106/0x130
Apr 12 21:34:47 [75707.120401] [<ffffffff8121b510>] ?
__xfs_get_blocks+0x750/0x750
Apr 12 21:34:47 [75707.120439] [<ffffffff8121b510>] ?
__xfs_get_blocks+0x750/0x750
Apr 12 21:34:47 [75707.120481] [<ffffffff8114ad45>] ?
alloc_pages_current+0x85/0x110
Apr 12 21:34:47 [75707.120523] [<ffffffff81111d25>] ?
__do_page_cache_readahead+0x165/0x1f0
Apr 12 21:34:47 [75707.120564] [<ffffffff811344f5>] ? vma_link+0x75/0xb0
Apr 12 21:34:47 [75707.120602] [<ffffffff811120c7>] ?
force_page_cache_readahead+0x77/0xe0
Apr 12 21:34:47 [75707.120644] [<ffffffff8113f876>] ?
madvise_willneed+0x76/0x140
Apr 12 21:34:47 [75707.120683] [<ffffffff811301ce>] ?
handle_mm_fault+0x9ae/0x1650
Apr 12 21:34:47 [75707.120722] [<ffffffff81133dcb>] ? find_vma+0x5b/0x70
Apr 12 21:34:47 [75707.120760] [<ffffffff8113fc52>] ?
SyS_madvise+0x312/0x6f0
Apr 12 21:34:47 [75707.120799] [<ffffffff8148d9db>] ?
entry_SYSCALL_64_fastpath+0x16/0x6e
Once this starts, a couple of minutes go by and the machine locks up
completely.
I have been unable to locate the problem here. Can anyone point me in
the right direction?
Best regards
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html