On Sat, 9 Dec 2006, Robert Peterson wrote:
Barry Brimer wrote:
Everything was working fine for several months on this cluster. The
cluster software is the latest provided by Red Hat for RHEL4, on the
latest kernel. I am using fence_ilo, and the working node fenced the
problem node. Both nodes run the same versions - the latest Red Hat
provides for RHEL 4.
I've since discovered that another GFS cluster (non-production) had a
similar issue, and rebooting both nodes solved it. With the original
(production) cluster, I am trying to figure out how to get the problem
node back into the cluster without having to unmount the GFS volume
from the remaining working node.
Thank you so much for your input; it is greatly appreciated.
If you have any more suggestions, particularly on how to get my problem
node back into the cluster without unmounting the GFS volume from the
working node, please let me know.
Thanks,
Barry
Hi Barry,
Hm. Could it be that your other nodes are still running the old cman in
memory? This might happen if you update the kernel and cman code with
up2date to the latest, but haven't rebooted or loaded the new cluster
kernel modules yet on the remaining nodes. That would also explain why
a reboot solved the problem in the other cluster you wrote about.

Perhaps you should do "uname -a" on all nodes and make sure they're all
running the same kernel. If the working node(s) and the rebooted node
are both running the same kernel, then they will also be running the
same cluster modules, i.e., cman, in which case your problem might be a
new bug.
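
For example, something like this on each node should confirm what is
actually running (cman_tool ships with the cluster suite; the exact
output varies by version):

  uname -r                     # kernel actually booted
  lsmod | grep -i cman         # is the cman kernel module loaded?
  cman_tool status             # node/version info from the running cman

If the kernel versions differ between nodes, that's the first thing to
straighten out.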
Even if they're not running the same kernel, the cman modules have a
compatible protocol, unless the U1 version of cman is still running on
the working node(s). If the working node turns out to be running the
old U1 cman, even though the new RPMs are installed, you may want to
reboot it in order to pick up the new kernel and cluster modules.
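
One quick way to spot that situation (package names assumed from the
stock RHEL4 cluster channel; adjust if yours differ):

  rpm -q cman cman-kernel      # versions installed on disk
  uname -r                     # kernel actually booted
  cat /proc/cluster/status     # version info from the cman in memory

If what's installed on disk is newer than what /proc/cluster/status
reports, the old module is still loaded, and a reboot is the cleanest
way to pick up the new one. After that, the node should rejoin through
the normal startup order (ccsd, cman, fenced, clvmd, then the gfs
mounts) without the working node having to unmount anything.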
Bob,
Thank you for your responses; they are greatly appreciated. These
systems never ran 4U1, as they were installed with the then-current
release in June. I ended up having to schedule a maintenance window, as
these are production systems. As expected, once both systems were
rebooted, the cluster regained quorum and all services came back
without issue. I was hoping there might have been a way to regain
cluster services without taking down the service that runs on the GFS
volume, but I couldn't find one. Thanks again for your help.
Barry
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster