Re: all nodes halt when one loses connection

Thanks for your answers, 

I have used separate networks for the management and service networks, with two switches, and now it works fine.

Thanks again, 

ESG

2009/5/28 Kaerka Phillips <kbphillips80@xxxxxxxxx>
One thing we did not try, but which might have worked, would be to bond two network interfaces together, use VLAN tagging on top of the bond interface to create a VLAN across to the other node, and then point the cluster at the VLAN interfaces, which should stay up even with the loss of one network interface or one switch.
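
A rough sketch of what that could look like with RHEL 5 style ifcfg files (the interface names, the VLAN ID 10 and the 10.0.0.x addresses are only illustrative, not taken from an actual setup):

    # /etc/sysconfig/network-scripts/ifcfg-bond0 -- bond of eth0 + eth1
    # (on RHEL 5 you also need "alias bond0 bonding" in /etc/modprobe.conf)
    DEVICE=bond0
    BONDING_OPTS="mode=active-backup miimon=100"
    ONBOOT=yes
    BOOTPROTO=none

    # /etc/sysconfig/network-scripts/ifcfg-eth0 (ifcfg-eth1 identical apart from DEVICE)
    DEVICE=eth0
    MASTER=bond0
    SLAVE=yes
    ONBOOT=yes
    BOOTPROTO=none

    # /etc/sysconfig/network-scripts/ifcfg-bond0.10 -- tagged VLAN carrying the cluster traffic
    DEVICE=bond0.10
    VLAN=yes
    IPADDR=10.0.0.1
    NETMASK=255.255.255.0
    ONBOOT=yes

The cluster node names would then resolve to the bond0.10 addresses, so heartbeat traffic survives the loss of either physical link or either switch.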


On Wed, May 27, 2009 at 7:48 PM, Kaerka Phillips <kbphillips80@xxxxxxxxx> wrote:
It sounds like they're fencing each other.  We got around this issue on a two-node cluster by including the alternate node's internal IP address in the /etc/hosts file of both hosts and using a cross-over cable for the service network, with the private IP addresses assigned to that network.  If you're trying to get them to monitor each other via the public network, in theory this could be done with a backup fencing method, but we weren't able to get this to work, since the heartbeat functions only happen on the network that the node names are defined to use.
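
For illustration only (the host names and private addresses below are made up, not from this cluster), the idea is that /etc/hosts on both nodes maps the cluster node names to the cross-over network:

    # /etc/hosts on both nodes -- cluster node names resolve to the
    # private addresses on the cross-over (service) network
    10.10.10.1   node1.cluster.local   node1
    10.10.10.2   node2.cluster.local   node2

With the node names in cluster.conf matching these entries, the heartbeat traffic stays on the cross-over link instead of the public network.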


On Mon, May 25, 2009 at 5:28 AM, ESGLinux <esggrupos@xxxxxxxxx> wrote:
Hi, 

I think this is not my problem, because fencing works fine. The nodes get fenced immediately, but I think they fence when they shouldn't.

Greetings, 

ESG

2009/5/22 jorge sanchez <xsanch@xxxxxxxxx>

Hi,

Also try disabling ACPI if it is running; see the following:

http://www.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5/html/Cluster_Administration/s1-acpi-CA.html
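
For reference, a minimal way to do that on RHEL 5 is to turn off the acpid service and reboot; the linked document also describes the BIOS instant-off setting and the acpi=off kernel option as alternatives:

    chkconfig acpid off   # do not start acpid at boot
    service acpid stop    # stop it on the running system
    # reboot the node so a fence operation results in an immediate power-off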


Regards,

Jorge Sanchez


On Thu, May 21, 2009 at 5:34 PM, ESGLinux <esggrupos@xxxxxxxxx> wrote:


2009/5/21 Jonathan Brassow <jbrassow@xxxxxxxxxx>

On May 21, 2009, at 9:57 AM, ESGLinux wrote:

Hello,

these are the logs I get:

In node1:

May 21 11:33:44 NODE1 fenced[3840]: NODE2 not a cluster member after 5 sec post_fail_delay
May 21 11:33:44 NODE1 fenced[3840]: fencing node "NODE2"
May 21 11:33:44 NODE1 shutdown[5448]: shutting down for system halt

in node2:

May 21 11:33:45 NODE2 fenced[3843]: NODE1 not a cluster member after 5 sec post_fail_delay
May 21 11:33:45 NODE2 fenced[3843]: fencing node "NODE1"
May 21 11:33:45 NODE2 shutdown[5923]: shutting down for system halt


What I don't know is why they lose the connection with the cluster; they are still connected (I only unplugged a cable from the service network).

That may be something worth chasing down, as it appears that your cluster communication is on a network you don't expect?

How can I be sure which network the nodes are using for communication? I think they use the network I have configured for that....
 

Also, are the nodes simply "shutting down", or are they being forcibly rebooted?  If it is a casual shutdown, then it would appear that both nodes are trying to shut down simultaneously.

They simply shut down. They do not reboot.

This is what I get every time I unplug the network cable from eth0 of either of the two nodes (they communicate through eth1...).

Greetings,

ESG



 


--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
