On Wed, Feb 24, 2010 at 08:54:38AM -0600, Doug Tucker wrote:
> Thanks to you and Carlos. I understand a bit better now what you are
> referring to; however, I don't believe that is the issue. The reason we
> went to the crossover cable was to avoid this issue, as we had a switch
> die once, and both nodes then thought they were master and tried to fence
> the other. In my situation, there is no reason for the missed heartbeat
> that I can find. The interfaces have not gone down. We ran a test
> where I started a ping between the 2 nodes that wrote out to a file until
> a "heartbeat" was missed and a reboot occurred. There was not a single
> missed ping between the 2 nodes prior to the event. Also, in a split
> brain, both machines should recognize the other one as "gone" and try to
> become master. In this case, only 1 of the nodes at a time is seeing a
> "missed heartbeat" and then attempting to fence the other. We have
> replaced all hardware, including cables, to ensure it wasn't that. This
> appears to be some software bug of sorts. Again, we have another 2 node
> cluster that this doesn't occur on, but it is running a different kernel
> and gfs module.
>

Doug,

Have you checked whether there are any known bugs in the NIC driver module
you are using? It may also be worth looking at the kernel changelog to see
whether there have been any changes to these modules.

cya

-- 
Best Regards

Carlos Eduardo Maiolino
Support engineer
Red Hat - Global Support Services
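
As a side note on the ping test described above: if the ping output was saved to a file, one way to double-check it for dropped replies is to look for gaps in the icmp_seq numbers. The following is only a minimal sketch, assuming the log holds standard Linux ping reply lines containing `icmp_seq=N`; the function name `find_missing_sequences` is made up for illustration, and the sketch does not handle icmp_seq wrapping around on very long runs.

```python
import re

# Assumption: each reply line looks like standard Linux ping output, e.g.
# "64 bytes from 10.0.0.2: icmp_seq=7 ttl=64 time=0.21 ms"
SEQ_RE = re.compile(r"icmp_seq=(\d+)")


def find_missing_sequences(lines):
    """Return icmp_seq numbers missing between the first and last reply seen.

    Hypothetical helper for illustration; lines without icmp_seq are ignored.
    """
    seen = set()
    for line in lines:
        match = SEQ_RE.search(line)
        if match:
            seen.add(int(match.group(1)))
    if not seen:
        return []
    # Every number between the lowest and highest reply should be present.
    return [n for n in range(min(seen), max(seen) + 1) if n not in seen]
```

It could be run over a saved log with something like `find_missing_sequences(open("ping.log"))`, where `ping.log` is a placeholder file name. A clean result only shows that ICMP got through; it does not by itself rule out loss of the cluster's own heartbeat traffic.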