cluster suite crashing

I am again attempting a 2-node cluster (two_node=1). This time we have power fencing, and I created the cluster config from scratch.

Unplug network cables on Node A. Node B still plugged in. (Expected B to fence A.)
Node B does not attempt fencing and claims to have lost quorum (???).
Plug Node A back in.
Node A fences Node B

On reboot, Node B reboots itself right after fencing Node A.
[repeat]

clurgmgrd[3630]: <crit> *Watchdog: Daemon died, rebooting

*Various messages appear directly before this in the log. Most of the time it was a service script failing a stop operation. Correcting the script did not resolve the issue:

[/var/log/messages on Node B]
clurgmgrd[5669]: <notice> Resource Group Manager Starting
[selinux warnings]
clurgmgrd[5667]: <crit> Watchdog: Daemon died, rebooting...
kernel: md: stopping all md devices.
fenced[4617]: fence "[Node A]" success
[reboot]

[some pertinent lines from cluster.conf - they are identical on each node]
   <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="12"/>
   <cman expected_votes="1" two_node="1"/>
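For anyone who wants to rule out a config mismatch, here is a minimal sketch of a sanity check for the <cman> line above. It only encodes the rule that two_node="1" goes together with expected_votes="1" and exactly two cluster nodes. The function name check_two_node, the SAMPLE document, and the node names nodeA/nodeB are placeholders made up for illustration; they are not part of any cluster tool and not my real config.

```python
import xml.etree.ElementTree as ET


def check_two_node(conf_xml):
    """Return a list of problems found in the <cman> settings of a cluster.conf string."""
    root = ET.fromstring(conf_xml)
    cman = root.find("cman")  # <cman> is a direct child of <cluster>
    if cman is None:
        return ["no <cman> element found"]

    problems = []
    two_node = cman.get("two_node", "0")
    expected_votes = cman.get("expected_votes")

    # In two-node mode, expected_votes must be 1 so that a single
    # surviving node keeps quorum.
    if two_node == "1" and expected_votes != "1":
        problems.append("two_node=1 requires expected_votes=1")

    # Two-node mode only makes sense with exactly two nodes defined.
    nodes = root.findall("clusternodes/clusternode")
    if two_node == "1" and len(nodes) != 2:
        problems.append("two_node=1 but %d clusternode entries found" % len(nodes))

    return problems


# Hypothetical sample; only the <cman> and <fence_daemon> lines come from the post.
SAMPLE = """<cluster name="example" config_version="1">
  <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="12"/>
  <cman expected_votes="1" two_node="1"/>
  <clusternodes>
    <clusternode name="nodeA" nodeid="1" votes="1"/>
    <clusternode name="nodeB" nodeid="2" votes="1"/>
  </clusternodes>
</cluster>"""

if __name__ == "__main__":
    for problem in check_two_node(SAMPLE):
        print(problem)
```

Running the same check against the contents of /etc/cluster/cluster.conf on each node would at least confirm that both nodes agree on these settings. It says nothing about why the daemon dies.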

Meanwhile, Node A comes up and fences B when it gets a chance.

I'm really at a loss as to what to do. We are running the RHEL 5 RPMs from RHN. Googling the error message yields some results on crashes in RGManager which were allegedly fixed in version 4. I have seen some other squirrelly behavior from RGManager at various points, but reboots seemed to fix those, so I figured proper fencing might render them moot.

Any advice is welcome.

Thanks,
Chris

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
