I am in the process of upgrading one of our clusters from RHEL 6.1 to
6.2. It's an 8-node cluster.
I am going one node at a time: stop all cluster resources, cman,
rgmanager et al., run yum update, reboot, then move on to the next
node. The first node went fine.
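
In shell terms, the per-node sequence above would look roughly like the
sketch below. The run helper and the DRY_RUN variable are made up for
illustration, and the exact list of services to stop (clvmd, gfs2 and
so on) depends on the cluster, so treat this as an assumption rather
than the exact commands used. By default it only prints what it would
do.

```shell
#!/bin/sh
# Sketch of the per-node upgrade sequence (RHEL 6 init scripts assumed).
# DRY_RUN=1 (the default here) only prints the commands.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, else execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

upgrade_node() {
    run service rgmanager stop   # resource group manager first
    run service cman stop        # then the cluster manager
    run yum -y update            # pull in the 6.2 packages
    run reboot
}

upgrade_node
```

Setting DRY_RUN=0 would execute the commands for real, one node at a
time.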
On the second one, rgmanager started but doesn't seem to connect to the
other nodes. I found this in dmesg:
INFO: task rgmanager:2901 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
rgmanager D 0000000000000000 0 2901 2900 0x00000080
ffff880667299d48 0000000000000082 0000000000000000 ffff8806656aa318
ffff88066729c378 0000000000000001 ffff880665bb31b0 00007fffc6c6fa20
ffff88066635a678 ffff880667299fd8 000000000000f4e8 ffff88066635a678
Call Trace:
[<ffffffff814ee6fe>] __mutex_lock_slowpath+0x13e/0x180
[<ffffffff814ee59b>] mutex_lock+0x2b/0x50
[<ffffffffa02c192c>] dlm_new_lockspace+0x3c/0xa30 [dlm]
[<ffffffff8115f74c>] ? __kmalloc+0x20c/0x220
[<ffffffffa02ca94d>] device_write+0x30d/0x7d0 [dlm]
[<ffffffff8105ea30>] ? default_wake_function+0x0/0x20
[<ffffffff8120c646>] ? security_file_permission+0x16/0x20
[<ffffffff81176918>] vfs_write+0xb8/0x1a0
[<ffffffff810d4932>] ? audit_syscall_entry+0x272/0x2a0
[<ffffffff81177321>] sys_write+0x51/0x90
[<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
I tried rebooting, but the shutdown stalled on stopping rgmanager. I
fenced the node, with the same outcome.
Any hints?
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster