Thanks to you and Carlos. I understand a bit better now what you are referring to; however, I don't believe that is the issue. The reason we went to the crossover cable was to avoid exactly this: we had a switch die once, and both nodes then thought they were master and tried to fence each other.

In my situation, I can find no reason for the missed heartbeat. The interfaces have not gone down. We ran a test where I started a ping between the 2 nodes that wrote out to a file until a "heartbeat" was missed and a reboot occurred. There was not a single missed ping between the 2 nodes prior to the event. Also, in a split brain, both machines should consider the other one "gone" and try to become master. In this case, only 1 of the nodes at a time sees a "missed heartbeat" and then attempts to fence the other.

We have even replaced all the hardware, including cables, to rule that out. This appears to be some sort of software bug. Again, we have another 2-node cluster where this doesn't occur, but it is running a different kernel and GFS module.

On Wed, 2010-02-24 at 03:34 -0600, ESGLinux wrote:
> Hi Doug,
>
> the split brain is what is happening to you ;-)
>
> From wikipedia: http://en.wikipedia.org/wiki/High-availability_cluster
> "HA clusters usually use a heartbeat private network connection which
> is used to monitor the health and status of each node in the cluster.
> One subtle, but serious condition every clustering software must be
> able to handle is split-brain. Split-brain occurs when all of the
> private links go down simultaneously, but the cluster nodes are still
> running. If that happens, each node in the cluster may mistakenly
> decide that every other node has gone down and attempt to start
> services that other nodes are still running. Having duplicate
> instances of services may cause data corruption on the shared
> storage."
>
> The quorum disk is a good choice to avoid it, as they have told you.
>
> Good luck,
>
> Greetings,
>
> ESG
>
> 2010/2/23 Doug Tucker <tuckerd@xxxxxxxxxxxx>
>
> > > Hi Doug, maybe you can avoid this kind of problem by using a
> > > quorum disk partition. A two-node cluster is split-brain prone,
> > > and with a quorum disk partition you can avoid split-brain
> > > situations, which are probably causing this behavior.
> > >
> > > As for using a crossover (or straight) cable, I don't know of
> > > any issue with it, but try to check whether it's using
> > > full-duplex mode. Half-duplex mode on crossover-linked
> > > machines will probably cause heartbeat problems.
> > >
> > > cya..
> >
> > Can you give me a little info about "split-brain" issues? I don't
> > understand what you mean by that, and what I'm solving with a
> > quorum disk. It has always worked fine; it just started happening
> > after the install of this newer kernel/GFS module. The other
> > 2-node cluster is still rock solid. Also, the network interfaces on
> > the crossover are full duplex. Thanks for writing back, you're the
> > first person who offered anything.
>
> --
> Linux-cluster mailing list
> Linux-cluster@xxxxxxxxxx
> https://www.redhat.com/mailman/listinfo/linux-cluster
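One way to back up the "not a single missed ping" claim is to scan the logged ping output for gaps in the `icmp_seq` numbers. The following is a minimal Python sketch, assuming the log holds iputils-style reply lines such as `64 bytes from 10.0.0.2: icmp_seq=12 ttl=64 time=0.180 ms`. The function name `missing_seqs` and the sample address are hypothetical, and the sketch does not handle the sequence counter wrapping on very long runs.

```python
import re
from typing import Iterable, List

# Only count actual reply lines ("... bytes from ...: icmp_seq=N ..."),
# so "no answer yet for icmp_seq=N" lines (ping -O) are not treated as replies.
REPLY_SEQ_RE = re.compile(r"bytes from .*?icmp_seq=(\d+)")


def missing_seqs(lines: Iterable[str]) -> List[int]:
    """Return sequence numbers with no reply between the first and last reply seen.

    Hypothetical helper: pings lost after the last logged reply cannot be
    detected this way, because there is no later reply to bound the gap.
    """
    seen = set()
    for line in lines:
        match = REPLY_SEQ_RE.search(line)
        if match:
            seen.add(int(match.group(1)))
    if not seen:
        return []
    # Every number inside the observed range that never got a reply was lost.
    return [seq for seq in range(min(seen), max(seen) + 1) if seq not in seen]


if __name__ == "__main__":
    sample = [
        "PING 10.0.0.2 (10.0.0.2) 56(84) bytes of data.",
        "64 bytes from 10.0.0.2: icmp_seq=1 ttl=64 time=0.150 ms",
        "64 bytes from 10.0.0.2: icmp_seq=2 ttl=64 time=0.142 ms",
        "64 bytes from 10.0.0.2: icmp_seq=4 ttl=64 time=0.139 ms",
    ]
    print(missing_seqs(sample))  # [3]
```

An empty result for the whole log up to the fence event would support the point that the crossover link itself was not dropping packets.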
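To illustrate why a quorum disk is suggested for a two-node cluster, here is a simplified majority-vote model in Python. It is an illustrative sketch, not CMAN's actual code: it assumes one vote per node, quorum as a strict majority of expected votes (`expected_votes // 2 + 1`), and one extra vote for the quorum disk held by whichever side can still reach shared storage. The `two_node` flag loosely mirrors the two-node special case in which a lone node stays quorate. The names `Partition` and `has_quorum` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    """One side of a heartbeat split (hypothetical model, not a CMAN structure)."""
    node_votes: int    # votes from nodes this side can still see
    holds_qdisk: bool  # True if this side owns the quorum-disk vote


def has_quorum(p: Partition, expected_votes: int, qdisk_votes: int = 0,
               two_node: bool = False) -> bool:
    """Simplified rule: quorate if votes >= expected_votes // 2 + 1."""
    votes = p.node_votes + (qdisk_votes if p.holds_qdisk else 0)
    if two_node:
        # Two-node special case without a tiebreaker:
        # a single surviving node counts as quorate.
        return votes >= 1
    return votes >= expected_votes // 2 + 1


if __name__ == "__main__":
    # Heartbeat lost: each node sees only itself; the left one still owns the qdisk.
    left, right = Partition(1, holds_qdisk=True), Partition(1, holds_qdisk=False)

    # Two-node mode, no quorum disk: both sides believe they are quorate.
    print(has_quorum(left, 2, two_node=True), has_quorum(right, 2, two_node=True))
    # True True

    # Quorum disk worth 1 vote, expected_votes = 3 (quorum = 2): only one side wins.
    print(has_quorum(left, 3, qdisk_votes=1), has_quorum(right, 3, qdisk_votes=1))
    # True False
```

In this model, with neither the two-node special case nor a quorum disk, each side holds 1 of 2 expected votes and neither reaches the quorum of 2. That is why a two-node cluster needs one mechanism or the other, and why only the quorum disk gives a single, deterministic winner.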