Hello,

I'm trying to migrate an older CentOS 5 / RHCS 2 cluster to the newer RHCS 3. Eager to experiment, I decided to run my tests on Fedora 14 before CentOS 6 is released. Everything seemed to work fine at first. After a few hours of cluster uptime, however, I ran into a strange situation: rgmanager appears to be blocked. The process is still there, but:

1. It no longer produces any output. It runs in a "screen" session with the "-fd" parameters and is normally very verbose: I see a lot of debug messages, including output from the agent scripts. It has been more than a week since it blocked, and it hasn't output a single line of debug since.

2. Resources from node 1 were automatically relocated to node 2 when node 1 blocked, but node 2 blocked in a similar manner a few hours later.

3. The resources are still active on node 2, and "clustat" looks like this on both nodes:

    Service states unavailable: Temporary failure; try again
    Cluster Status for ****** @ Mon Nov 15 14:14:22 2010
    Member Status: Quorate

     Member Name                ID   Status
     ------ ----                ---- ------
     storage1.******               1 Online, Local
     storage2.******               2 Online

I've already tried several simple things:

* looking at the process tree for hung resource agents: no luck, it's just clurgmgrd and its child threads;
* looking at the open files of clurgmgrd in /proc/NNN/fd: nothing unusual;
* tracing (with strace) the main clurgmgrd thread and its children.

At this point I'm out of ideas, so any suggestion would be welcome. I can provide further information or logs about the running system and processes.

Thanks,
Radu Rendec

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
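The per-process checks mentioned in the post, which inspect the clurgmgrd threads and list its open files under /proc/NNN/fd, can be scripted so they are easy to repeat. What follows is a minimal sketch that assumes a Linux /proc filesystem and Python 3. `thread_states` and `open_fds` are hypothetical helpers and are not part of rgmanager. You have to supply the clurgmgrd PID yourself, for example from `pidof clurgmgrd`. Reading another process's fd directory normally requires root.

```python
import os
import sys
from pathlib import Path
from typing import Dict, Tuple


def thread_states(pid: int, proc: str = "/proc") -> Dict[int, Tuple[str, str]]:
    """Map each thread ID of `pid` to (state letter, wchan), read from /proc."""
    states: Dict[int, Tuple[str, str]] = {}
    for task in (Path(proc) / str(pid) / "task").iterdir():
        try:
            stat = (task / "stat").read_text()
        except OSError:
            continue  # thread exited while we were iterating
        # Field 2 (comm) is wrapped in parentheses and may contain spaces,
        # so the state letter is taken from just after the last ')'.
        state = stat[stat.rindex(")") + 2]
        try:
            # wchan names the kernel function a sleeping thread is waiting in
            # ("0" or empty when the thread is running or the kernel hides it).
            wchan = (task / "wchan").read_text().strip() or "-"
        except OSError:
            wchan = "?"
        states[int(task.name)] = (state, wchan)
    return states


def open_fds(pid: int, proc: str = "/proc") -> Dict[int, str]:
    """Map each open file descriptor of `pid` to the target of its /proc symlink."""
    fds: Dict[int, str] = {}
    for entry in (Path(proc) / str(pid) / "fd").iterdir():
        try:
            fds[int(entry.name)] = os.readlink(entry)
        except OSError:
            continue  # descriptor was closed between listing and readlink
    return fds


if __name__ == "__main__":
    # Hypothetical usage: pass the clurgmgrd PID; defaults to this script's own PID.
    target = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    for tid, (state, wchan) in sorted(thread_states(target).items()):
        print(f"thread {tid}: state={state} wchan={wchan}")
    for fd, path in sorted(open_fds(target).items()):
        print(f"fd {fd} -> {path}")
```

Run it several times, a minute or so apart. That shows whether the same threads stay in the same state and wchan between runs.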