rgmanager is jammed

Nicolas Ross <rossnick-lists@xxxxxxxxxxx> · Fri, 25 May 2012 12:20:43 -0400

I am in the process of upgrading one of our clusters from RHEL 6.1 to 
6.2. It's an 8-node cluster.

I started with one node: stop all cluster resources, cman, rgmanager et 
al., run yum update, reboot, then move to the next. The first one went fine.

On the second one, rgmanager started but doesn't seem to connect to the 
other nodes. I found this in dmesg:

INFO: task rgmanager:2901 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
rgmanager     D 0000000000000000     0  2901   2900 0x00000080
 ffff880667299d48 0000000000000082 0000000000000000 ffff8806656aa318
 ffff88066729c378 0000000000000001 ffff880665bb31b0 00007fffc6c6fa20
 ffff88066635a678 ffff880667299fd8 000000000000f4e8 ffff88066635a678
Call Trace:
 [<ffffffff814ee6fe>] __mutex_lock_slowpath+0x13e/0x180
 [<ffffffff814ee59b>] mutex_lock+0x2b/0x50
 [<ffffffffa02c192c>] dlm_new_lockspace+0x3c/0xa30 [dlm]
 [<ffffffff8115f74c>] ? __kmalloc+0x20c/0x220
 [<ffffffffa02ca94d>] device_write+0x30d/0x7d0 [dlm]
 [<ffffffff8105ea30>] ? default_wake_function+0x0/0x20
 [<ffffffff8120c646>] ? security_file_permission+0x16/0x20
 [<ffffffff81176918>] vfs_write+0xb8/0x1a0
 [<ffffffff810d4932>] ? audit_syscall_entry+0x272/0x2a0
 [<ffffffff81177321>] sys_write+0x51/0x90
 [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b

I tried rebooting, but the shutdown stalled while stopping rgmanager. I 
fenced the node, with the same outcome.

Any hints?

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster