On 05/25/2012 06:20 PM, Nicolas Ross wrote:
> I am in the process of upgrading one of our clusters from RHEL 6.1 to
> 6.2. It's an 8-node cluster.
>
> I started with one node: stop all cluster resources, cman, rgmanager et
> al., yum update, reboot, move to the next. The first one did OK.
>
> On the second one, rgmanager started, but doesn't seem to connect to
> the other nodes. I found this in dmesg:
>
> INFO: task rgmanager:2901 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> rgmanager D 0000000000000000 0 2901 2900 0x00000080
> ffff880667299d48 0000000000000082 0000000000000000 ffff8806656aa318
> ffff88066729c378 0000000000000001 ffff880665bb31b0 00007fffc6c6fa20
> ffff88066635a678 ffff880667299fd8 000000000000f4e8 ffff88066635a678
> Call Trace:
> [<ffffffff814ee6fe>] __mutex_lock_slowpath+0x13e/0x180
> [<ffffffff814ee59b>] mutex_lock+0x2b/0x50
> [<ffffffffa02c192c>] dlm_new_lockspace+0x3c/0xa30 [dlm]
> [<ffffffff8115f74c>] ? __kmalloc+0x20c/0x220
> [<ffffffffa02ca94d>] device_write+0x30d/0x7d0 [dlm]
> [<ffffffff8105ea30>] ? default_wake_function+0x0/0x20
> [<ffffffff8120c646>] ? security_file_permission+0x16/0x20
> [<ffffffff81176918>] vfs_write+0xb8/0x1a0
> [<ffffffff810d4932>] ? audit_syscall_entry+0x272/0x2a0
> [<ffffffff81177321>] sys_write+0x51/0x90
> [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
>
> Tried rebooting, but the shutdown stalled on stopping rgmanager. Fenced
> the node, same outcome.
>
> Any hints?

This looks like a kernel DLM problem: rgmanager is stuck waiting on a
mutex inside dlm_new_lockspace.

I can see you found a workaround, but that should not be necessary, since
upgrades between releases should work.

Can you please file a ticket with GSS and escalate it?

It might be a good idea to grab sosreports before those logs are rotated
away.

Thanks
Fabio

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster