Re: Physical shutdown of one node causes both nodes to crash in active/passive configuration of 2 node RHEL cluster

Digimer,

I have applied the changes, but it looks like the cluster now goes into a fence loop. When node 1 is running cman and I reboot node 2, node 2 fences node 1, and the two nodes keep fencing each other in a loop.

1) On both nodes acpid is off

 krplporcl001 ~]# service acpid status
  acpid is stopped

 krplporcl002 ~]# service acpid status
acpid is stopped

2) Changes in cluster.conf:

<clusternode name="krplporcl001" nodeid="1">
    <fence>
        <method name="1">
            <device lanplus="" name="inspuripmi" delay="15" action="">
        </method>
    </fence>
</clusternode>
<clusternode name="krplporcl002" nodeid="2">
    <fence>


3) Bonding uses mode = 1 only

on krplporcl001 : 

DEVICE=bond0
IPADDR=192.168.10.10
NETMASK=255.255.255.0
NETWORK=192.168.10.0
BROADCAST=192.168.10.255
BOOTPROTO=none
Type=Ethernet
ONBOOT=yes
BONDING_OPTS='miimon=100 mode=1'

on krplporcl002

DEVICE=bond0
IPADDR=192.168.10.11
NETMASK=255.255.255.0
NETWORK=192.168.10.0
BROADCAST=192.168.10.255
BOOTPROTO=none
Type=Ethernet
ONBOOT=yes
BONDING_OPTS='miimon=100 mode=1'
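
To double-check the bonding mode on both nodes without reading the files by eye, something like the following could be used. This is only a minimal sketch: the function names are made up for illustration, and SAMPLE is a shortened copy of the settings pasted above, not the real file on disk.

```python
import shlex


def parse_ifcfg(text):
    """Parse KEY=value lines of an ifcfg-style file into a dict."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # shlex.split removes the shell-style quotes around the value.
        parts = shlex.split(value)
        settings[key.strip()] = parts[0] if parts else ""
    return settings


def bonding_options(settings):
    """Split BONDING_OPTS (e.g. 'miimon=100 mode=1') into a dict."""
    opts = {}
    for item in settings.get("BONDING_OPTS", "").split():
        name, _, val = item.partition("=")
        opts[name] = val
    return opts


def is_active_backup(opts):
    """Bonding mode 1 is also known by the name 'active-backup'."""
    return opts.get("mode") in ("1", "active-backup")


# Shortened copy of the bond0 settings shown above (illustration only).
SAMPLE = """DEVICE=bond0
IPADDR=192.168.10.10
BOOTPROTO=none
BONDING_OPTS='miimon=100 mode=1'
"""

if __name__ == "__main__":
    opts = bonding_options(parse_ifcfg(SAMPLE))
    print(opts, "active-backup:", is_active_backup(opts))
```

On a real node the text would be read from the ifcfg-bond0 file instead of SAMPLE.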
4) I have put one switch between the nodes, as Sivaji suggested.

As soon as 
The logs on krplporcl001 are as follows:
Sep 10 11:47:53 krplporcl001 fenced[5977]: fencing node krplporcl002

The logs on krplporcl002 are as follows :

Sep 10 11:46:48 krplporcl002 fenced[2950]: fencing node krplporcl001

I am not sure why the network is breaking or why the two nodes cannot communicate with each other.

Are there any other places I should look for logs?



On Wed, Sep 10, 2014 at 11:28 AM, Amjad Syed <amjadcsu@xxxxxxxxx> wrote:


On Tue, Sep 9, 2014 at 11:53 AM, Digimer <lists@xxxxxxxxxx> wrote:
On 09/09/14 03:14 AM, Amjad Syed wrote:
<device lanplus = "" name="inspuripmi"  action ="">

Something is breaking the network during the shutdown, a fence is being called, and each node is killing the other, causing a dual fence. So you have a set of problems, I think.

First, disable acpid on both nodes.

Second, change the quoted line (only) to:

<device lanplus = "" name="inspuripmi" delay="15" action ="">
If I am right, this will mean that 192.168.10.10 will stay up and fence 192.168.10.11.
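
To illustrate why the delay should break the tie, here is a toy model of the race, not a description of how fenced is implemented. It assumes both nodes lose contact at the same moment, that a fence call against a node waits for that node's delay and then takes a fixed time to power it off, and that a node which has been powered off can no longer finish its own fence call. The function name and the fence_time value are hypothetical.

```python
def dual_fence_outcome(delays, fence_time=5.0):
    """Return the nodes that end up powered off in a two-node fence race.

    delays maps each node name to the delay (seconds) configured on the
    fence device used to kill *that* node. fence_time is an assumed,
    made-up duration for the power-off call itself.
    """
    (node_a, delay_a), (node_b, delay_b) = delays.items()
    dies_a = delay_a + fence_time  # moment node_a would be powered off
    dies_b = delay_b + fence_time  # moment node_b would be powered off
    if dies_a == dies_b:
        # Neither node dies early enough to stop the other: dual fence.
        return [node_a, node_b]
    # The node that is killed first never completes its own fence call.
    return [node_a] if dies_a < dies_b else [node_b]


if __name__ == "__main__":
    print(dual_fence_outcome({"krplporcl001": 0, "krplporcl002": 0}))
    print(dual_fence_outcome({"krplporcl001": 15, "krplporcl002": 0}))
```

In this model, with no delay on either node both are powered off, while a delay of 15 on krplporcl001 leaves it standing and only krplporcl002 is fenced.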

Third, what bonding mode are you using? I would only use mode=1.

Fourth, please set the node names to match 'uname -n' on both nodes. Be sure the names translate to the IPs you want (via /etc/hosts, ideally).
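
A quick way to check both points could look like the sketch below. The function names are invented for illustration, and the name-to-IP mapping in EXPECTED is an assumption taken from the bond0 addresses in this thread; adjust it to whatever addresses the cluster names are supposed to resolve to.

```python
import os
import socket


def local_node_name():
    """Return the same value that `uname -n` prints."""
    return os.uname().nodename


def check_names(expected_ips, local_name=None, resolve=socket.gethostbyname):
    """Return a list of problems found; an empty list means all checks passed.

    expected_ips maps each cluster node name to the IP it should resolve to.
    resolve can be swapped out, e.g. for testing without real name lookups.
    """
    problems = []
    local = local_name or local_node_name()
    if local not in expected_ips:
        problems.append("local name %r is not a cluster node name" % local)
    for name, wanted in expected_ips.items():
        try:
            got = resolve(name)
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            problems.append("%s does not resolve: %s" % (name, exc))
            continue
        if got != wanted:
            problems.append("%s resolves to %s, expected %s" % (name, got, wanted))
    return problems


# Assumed mapping, based on the bond0 addresses quoted in this thread.
EXPECTED = {
    "krplporcl001": "192.168.10.10",
    "krplporcl002": "192.168.10.11",
}

if __name__ == "__main__":
    for problem in check_names(EXPECTED):
        print(problem)
```

Run on each node in turn; any line of output points at a name that should be fixed in cluster.conf or /etc/hosts.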

Fifth, as Sivaji suggested, please put switch(es) between the nodes.

If it still tries to fence when a node shuts down (watch /var/log/messages and look for 'fencing node ...'), please paste your logs from both nodes.
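
If the logs are long, a small script can pull out the fence calls and flag the case where each node tried to fence the other. This is a rough sketch that assumes the syslog line format seen in this thread ("Mon DD HH:MM:SS host fenced[pid]: fencing node target"); the function names are made up for illustration.

```python
import re

# Matches lines such as:
# "Sep 10 11:47:53 krplporcl001 fenced[5977]: fencing node krplporcl002"
FENCE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+(?P<host>\S+)\s+"
    r"fenced\[\d+\]:\s+fencing node\s+(?P<target>\S+)"
)


def fence_events(lines):
    """Return (timestamp, host, target) for every 'fencing node' line."""
    events = []
    for line in lines:
        match = FENCE_RE.match(line)
        if match:
            events.append(
                (match.group("stamp"), match.group("host"), match.group("target"))
            )
    return events


def mutual_fences(events):
    """Return sorted node pairs where each node logged fencing the other."""
    seen = {(host, target) for _, host, target in events}
    pairs = {tuple(sorted(pair)) for pair in seen if (pair[1], pair[0]) in seen}
    return sorted(pairs)


if __name__ == "__main__":
    sample = [
        "Sep 10 11:47:53 krplporcl001 fenced[5977]: fencing node krplporcl002",
        "Sep 10 11:46:48 krplporcl002 fenced[2950]: fencing node krplporcl001",
    ]
    print(mutual_fences(fence_events(sample)))
```

For real use, feed it the lines of /var/log/messages collected from both nodes.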

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without access to education?


-- 
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
