Hello Lon, all,

|Linux-cluster doesn't generate traps/notifications at this point, so I'd
|guess the HP agent :)
|-- Lon

Yep, we found it: the HP Health agent (cmahostd) stopped sending SNMP
messages during the cluster hang:

Dec 10 07:22:24 dm73sr02 kernel: INFO: task cmahostd:31542 blocked for more than 120 seconds.
Dec 10 07:22:24 dm73sr02 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 10 07:22:24 dm73sr02 kernel: cmahostd D ffffffff801508e3 0 31542 1 31576 31540 (NOTLB)
Dec 10 07:22:24 dm73sr02 kernel: ffff810c3b889cf8 0000000000000086 0000000000000018 ffffffff884414f8
Dec 10 07:22:24 dm73sr02 kernel: 0000000000000292 000000000000000a ffff810c3f54a820 ffff810c4e1b6040
Dec 10 07:22:24 dm73sr02 kernel: 00007122b167f658 0000000000bb9ecb ffff810c3f54aa08 0000000888442e5f
Call Trace:
 [<ffffffff884414f8>] :dlm:request_lock+0x93/0xa0
 [<ffffffff8846cee3>] :gfs2:just_schedule+0x0/0xe
 [<ffffffff8846ceec>] :gfs2:just_schedule+0x9/0xe
 [<ffffffff80063a16>] __wait_on_bit+0x40/0x6e
 [<ffffffff8846cee3>] :gfs2:just_schedule+0x0/0xe
 [<ffffffff80063ab0>] out_of_line_wait_on_bit+0x6c/0x78
 [<ffffffff800a0b44>] wake_bit_function+0x0/0x23
 [<ffffffff8846cede>] :gfs2:gfs2_glock_wait+0x2b/0x30
 [<ffffffff8847b2ba>] :gfs2:gfs2_getattr+0x85/0xc4
 [<ffffffff8847b2b2>] :gfs2:gfs2_getattr+0x7d/0xc4
 [<ffffffff8000e390>] vfs_getattr+0x2d/0xa9
 [<ffffffff800288ec>] vfs_stat_fd+0x32/0x4a
 [<ffffffff8000e4db>] free_pages_and_swap_cache+0x67/0x7e
 [<ffffffff80083f43>] sys32_stat64+0x11/0x29
 [<ffffffff8006153d>] sysenter_tracesys+0x48/0x83
 [<ffffffff8006149d>] sysenter_do_call+0x1e/0x76

Regards,
James Hofmeister
Hewlett Packard Linux Solutions Engineer

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
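For anyone who wants to spot this pattern in syslog automatically: in the trace above the agent's 32-bit stat() (sys32_stat64 -> vfs_getattr -> gfs2_getattr) ends up in gfs2_glock_wait, with dlm frames near the top, i.e. the agent sat in D state waiting on a GFS2 lock. Below is a minimal Python sketch that pulls the hung task and its gfs2/dlm frames out of a log excerpt. The regexes are assumptions based only on the RHEL 5-era message format shown in this post, and the names parse_hung_task, cluster_frames and SAMPLE are hypothetical helpers for illustration, not part of any HP or Red Hat tool.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Assumed message formats, copied from the log excerpt in this post;
# other kernel versions word the hung-task report differently.
HUNG_RE = re.compile(
    r"(?P<host>\S+) kernel: INFO: task (?P<task>\S+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)
# One call-trace entry, e.g. "[<ffffffff884414f8>] :dlm:request_lock+0x93/0xa0".
# The ":module:" prefix only appears for symbols in loadable modules.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\] (?::(?P<module>\w+):)?"
    r"(?P<func>\w+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


@dataclass
class Frame:
    addr: int
    module: Optional[str]  # None for core-kernel symbols such as vfs_getattr
    func: str
    offset: int
    size: int


@dataclass
class HungTask:
    host: str
    task: str
    pid: int
    timeout_secs: int
    frames: List[Frame] = field(default_factory=list)


def parse_hung_task(log: str) -> Optional[HungTask]:
    """Return the first hung-task report in `log` (hypothetical helper).

    Frames are collected from "Call Trace:" to the end of `log`, so pass
    one report at a time.
    """
    m = HUNG_RE.search(log)
    if m is None:
        return None
    report = HungTask(m["host"], m["task"], int(m["pid"]), int(m["secs"]))
    trace_start = log.find("Call Trace:", m.end())
    if trace_start != -1:
        for f in FRAME_RE.finditer(log, trace_start):
            report.frames.append(
                Frame(int(f["addr"], 16), f["module"], f["func"],
                      int(f["off"], 16), int(f["size"], 16))
            )
    return report


def cluster_frames(report: HungTask, modules=("gfs2", "dlm")) -> List[str]:
    """'module:function' names for frames in the cluster modules of interest."""
    return [f"{f.module}:{f.func}" for f in report.frames if f.module in modules]


# Trimmed copy of the trace from this post, used as sample input.
SAMPLE = """Dec 10 07:22:24 dm73sr02 kernel: INFO: task cmahostd:31542 blocked for more than 120 seconds.
Call Trace:
 [<ffffffff884414f8>] :dlm:request_lock+0x93/0xa0
 [<ffffffff8846cede>] :gfs2:gfs2_glock_wait+0x2b/0x30
 [<ffffffff8847b2ba>] :gfs2:gfs2_getattr+0x85/0xc4
 [<ffffffff8000e390>] vfs_getattr+0x2d/0xa9
 [<ffffffff80083f43>] sys32_stat64+0x11/0x29
"""

if __name__ == "__main__":
    r = parse_hung_task(SAMPLE)
    print(r.task, r.pid, r.timeout_secs)  # cmahostd 31542 120
    print(cluster_frames(r))  # ['dlm:request_lock', 'gfs2:gfs2_glock_wait', 'gfs2:gfs2_getattr']
```

The sketch only reads text; it does not query the kernel or the cluster. Keep in mind that kernels of this vintage may print stale addresses found on the stack, so a frame like free_pages_and_swap_cache in the full trace is not necessarily a real caller.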