Barry Brimer wrote:
Everything had been working fine for several months on this cluster. The
cluster software is the latest Red Hat provides for RHEL4, with the
latest kernel. I am using fence_ilo, and the working node fenced the
problem node.
Same versions on both nodes: the latest Red Hat packages for RHEL 4.
I've since discovered that another GFS cluster (non-production) had a
similar issue, and rebooting both nodes solved it. With the original
(production) cluster, I am trying to figure out how to get the problem
node back into the cluster without having to unmount the GFS volume
from the remaining working node.
Thank you so much for your input, it is greatly appreciated.
If you have any more suggestions, particularly on how to get my
problem node back into the cluster without unmounting the GFS volume
from the working node, please let me know.
Thanks,
Barry
Hi Barry,
Hm. Can it be that your other nodes are still running the old cman in
memory? This can happen if you updated the kernel code and cman code to
the latest with up2date, but haven't yet rebooted or loaded the new
cluster kernel modules on the remaining nodes. That would also explain
why a reboot solved the problem in the other cluster you wrote about.

Perhaps you should run "uname -a" on all nodes and make sure they're
all running the same kernel. If the working node(s) and the rebooted
node are both running the same kernel, then they will also be running
the same cluster modules, i.e., cman, in which case your problem might
be a new bug.
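The comparison itself is easy to script. A minimal sketch, assuming you
gather the "uname -r" output from each node (the hostnames and the ssh
transport in the usage comment are placeholders, not something from this
thread):

```shell
#!/bin/sh
# Sketch only: compare kernel release strings collected from each node,
# e.g. via "ssh <node> uname -r".  Prints "match" when every node
# reports the same release, "mismatch" otherwise.
check_kernels() {
    distinct=$(printf '%s\n' "$@" | sort -u | wc -l)
    if [ "$distinct" -eq 1 ]; then
        echo "match"
    else
        echo "mismatch"
    fi
}

# Hypothetical usage (node names are placeholders):
#   check_kernels "$(ssh node1 uname -r)" "$(ssh node2 uname -r)"
```

If this prints "mismatch", the nodes are on different kernels and hence
almost certainly on different cluster modules.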
Even if they're not running the same kernel, the cman modules have a
compatible protocol, unless the U1 version of cman is still running on
the working node(s). If the working node is found to be running the old
U1 cman, even though the new RPMs are installed, you may want to reboot
it in order to pick up the new kernel and cluster modules.
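On getting the fenced node back in without touching the working node:
after the problem node reboots, its cluster services need to come up in
order so it can rejoin and remount GFS. A sketch, assuming the stock
RHEL4 Cluster Suite init scripts (ccsd, cman, fenced, clvmd, gfs; skip
clvmd if you are not using clustered LVM), run on the rebooted node
only. The function just prints the ordered steps so they can be
reviewed before being run by hand:

```shell
#!/bin/sh
# Sketch, assuming the stock RHEL4 Cluster Suite init scripts.  Prints
# the service start order for a node rejoining the cluster; the working
# node's GFS mount is left alone throughout.
rejoin_steps() {
    for svc in ccsd cman fenced clvmd gfs; do
        printf 'service %s start\n' "$svc"
    done
}

rejoin_steps
```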
Regards,
Bob Peterson
Red Hat Cluster Suite
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster