Digimer,
I have applied the changes, but it looks like the nodes go into a fence loop: when node 1 is running cman and I reboot node 2, node 1 gets fenced, and the two nodes keep fencing each other in a loop.
1) acpid is off on both nodes:
krplporcl001 ~]# service acpid status
acpid is stopped
krplporcl002 ~]# service acpid status
acpid is stopped
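To be sure it also stays off across reboots (not just stopped for the current session), the runlevels can be checked with chkconfig; every runlevel should show off:

chkconfig --list acpid
chkconfig acpid off    # only needed if any runlevel still shows "on"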
2) Changes in cluster.conf:

<clusternode name="krplporcl001" nodeid="1">
    <fence>
        <method name="1">
            <device lanplus="" name="inspuripmi" delay="15" action=""/>
        </method>
    </fence>
</clusternode>
<clusternode name="krplporcl002" nodeid="2">
    <fence>
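For comparison, here is a minimal sketch of what I understand the full two-node fencing section should look like, with the delay only on krplporcl001 so that node wins any fence race. The fencedevice names, IPMI addresses and credentials below are placeholders, not my real values:

<clusternodes>
    <clusternode name="krplporcl001" nodeid="1">
        <fence>
            <method name="1">
                <device name="ipmi_node1" lanplus="1" delay="15" action="reboot"/>
            </method>
        </fence>
    </clusternode>
    <clusternode name="krplporcl002" nodeid="2">
        <fence>
            <method name="1">
                <device name="ipmi_node2" lanplus="1" action="reboot"/>
            </method>
        </fence>
    </clusternode>
</clusternodes>
<fencedevices>
    <fencedevice name="ipmi_node1" agent="fence_ipmilan" ipaddr="10.0.0.1" login="admin" passwd="secret"/>
    <fencedevice name="ipmi_node2" agent="fence_ipmilan" ipaddr="10.0.0.2" login="admin" passwd="secret"/>
</fencedevices>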
3) Bonding uses mode=1 only.
On krplporcl001:
DEVICE=bond0
IPADDR=192.168.10.10
NETMASK=255.255.255.0
NETWORK=192.168.10.0
BROADCAST=192.168.10.255
BOOTPROTO=none
TYPE=Ethernet
ONBOOT=yes
BONDING_OPTS='miimon=100 mode=1'
On krplporcl002:
DEVICE=bond0
IPADDR=192.168.10.11
NETMASK=255.255.255.0
NETWORK=192.168.10.0
BROADCAST=192.168.10.255
BOOTPROTO=none
TYPE=Ethernet
ONBOOT=yes
BONDING_OPTS='miimon=100 mode=1'
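To confirm the bond is really running in active-backup mode with a healthy slave, the kernel's bonding status file can be checked on each node; mode 1 reports itself as fault-tolerance (active-backup):

krplporcl001 ~]# cat /proc/net/bonding/bond0
(look for "Bonding Mode: fault-tolerance (active-backup)" and "MII Status: up" for each slave interface)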
4) I have put one switch between the nodes, as Sivaji suggested.
As soon as node 2 reboots, both nodes try to fence each other. The logs on krplporcl001 are as follows:
Sep 10 11:47:53 krplporcl001 fenced[5977]: fencing node krplporcl002
The logs on krplporcl002 are as follows:
Sep 10 11:46:48 krplporcl002 fenced[2950]: fencing node krplporcl001
I am not sure why the network is breaking or why the two nodes cannot communicate with each other. Where else should I look for logs or other diagnostics?
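For reference, these are the checks I plan to run next (the IPMI address and credentials below are placeholders, not my real values):

tail -f /var/log/messages | grep -i fence    # watch fencing activity live on each node
cman_tool status                             # cluster membership and quorum state
cman_tool nodes                              # per-node membership view
fence_ipmilan -a 10.0.0.2 -l admin -p secret -P -o status    # confirm the peer's IPMI BMC answers out of band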
On Wed, Sep 10, 2014 at 11:28 AM, Amjad Syed <amjadcsu@xxxxxxxxx> wrote:

On Tue, Sep 9, 2014 at 11:53 AM, Digimer <lists@xxxxxxxxxx> wrote:

On 09/09/14 03:14 AM, Amjad Syed wrote:
<device lanplus="" name="inspuripmi" action=""/>
Something is breaking the network during the shutdown; a fence is being called and each node is killing the other, causing a dual fence. So you have a set of problems, I think.
First, disable acpid on both nodes.
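On RHEL 6, something like this on each node should do it:

service acpid stop
chkconfig acpid off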
Second, change the quoted line (only) to:
<device lanplus="" name="inspuripmi" delay="15" action=""/>
If I am right, this will mean that 192.168.10.10 stays up and fences .11: the delay makes any attempt to fence .10 wait 15 seconds, so .10 wins the fence race.
Third, what bonding mode are you using? I would only use mode=1.
Fourth, please set the node names to match 'uname -n' on both nodes. Be sure the names resolve to the IPs you want (via /etc/hosts, ideally).
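For example, assuming cluster traffic should run over the bond0 addresses you posted, /etc/hosts on both nodes would contain something like:

192.168.10.10   krplporcl001
192.168.10.11   krplporcl002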
Fifth, as Sivaji suggested, please put switch(es) between the nodes.
If it still tries to fence when a node shuts down (watch /var/log/messages and look for 'fencing node ...'), please paste your logs from both nodes.
--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without access to education?
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster