Fencing of node

After creating a simple two-node cluster, one node is being fenced continually. I'm running pacemaker (1.1.10-29) with two nodes and the following corosync.conf:

totem {
    version: 2
    secauth: off
    cluster_name: rh7cluster
    transport: udpu
}

nodelist {
    node {
        ring0_addr: rh7cn1.devlab.sinenomine.net
        nodeid: 1
    }
    node {
        ring0_addr: rh7cn2.devlab.sinenomine.net
        nodeid: 2
    }
}

quorum {
    provider: corosync_votequorum
    two_node: 1
}

logging {
    to_syslog: yes
}
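
For what it's worth, after starting the cluster I check that both nodes actually joined the corosync membership with the following (corosync 2.x / pcs 0.9 commands; adjust for your versions):

# corosync-cmapctl | grep members
# pcs status corosync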

Starting the cluster shows:

Oct  2 15:17:47 rh7cn1 kernel: dlm: connect from non cluster node

in the logs of both nodes. Both nodes then try to bring up resources (dlm, clvmd, and a cluster filesystem).
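
From what I've read, that dlm message usually means a connection arrived from an address that isn't part of the cluster membership, so I've been double-checking that each hostname resolves to the same address corosync binds ring0 to:

# getent hosts rh7cn1.devlab.sinenomine.net
# getent hosts rh7cn2.devlab.sinenomine.net
# corosync-cfgtool -s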

Just prior to a node being fenced, both nodes show the following:

# pcs resource show
 Clone Set: dlm-clone [dlm]
     Started: [ rh7cn1.devlab.sinenomine.net rh7cn2.devlab.sinenomine.net ]
 Clone Set: clvmd-clone [clvmd]
     clvmd	(ocf::heartbeat:clvm):	FAILED 
     Started: [ rh7cn2.devlab.sinenomine.net ]
 Clone Set: clusterfs-clone [clusterfs]
     Started: [ rh7cn2.devlab.sinenomine.net ]
     Stopped: [ rh7cn1.devlab.sinenomine.net ]

Shortly afterwards there is a clvmd timeout message in one of the logs, and then that node gets fenced. I have added the high-availability firewalld service to both nodes.
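
For reference, this is how I opened the cluster ports (the high-availability firewalld service should cover the corosync, pcsd, and dlm ports):

# firewall-cmd --permanent --add-service=high-availability
# firewall-cmd --reload
# firewall-cmd --list-services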

Running crm_simulate -SL -VV shows:

 warning: unpack_rsc_op: 	Processing failed op start for clvmd:1 on rh7cn1.devlab.sinenomine.net: unknown error (1)

Current cluster status:
Online: [ rh7cn1.devlab.sinenomine.net rh7cn2.devlab.sinenomine.net ]

 ZVMPOWER	(stonith:fence_zvm):	Started rh7cn2.devlab.sinenomine.net 
 Clone Set: dlm-clone [dlm]
     Started: [ rh7cn1.devlab.sinenomine.net rh7cn2.devlab.sinenomine.net ]
 Clone Set: clvmd-clone [clvmd]
     clvmd	(ocf::heartbeat:clvm):	FAILED rh7cn1.devlab.sinenomine.net 
     Started: [ rh7cn2.devlab.sinenomine.net ]
 Clone Set: clusterfs-clone [clusterfs]
     Started: [ rh7cn2.devlab.sinenomine.net ]
     Stopped: [ rh7cn1.devlab.sinenomine.net ]

 warning: common_apply_stickiness: 	Forcing clvmd-clone away from rh7cn1.devlab.sinenomine.net after 1000000 failures (max=1000000)
 warning: common_apply_stickiness: 	Forcing clvmd-clone away from rh7cn1.devlab.sinenomine.net after 1000000 failures (max=1000000)
Transition Summary:
 * Stop    clvmd:1	(rh7cn1.devlab.sinenomine.net)

Executing cluster transition:
 * Pseudo action:   clvmd-clone_stop_0
 * Resource action: clvmd           stop on rh7cn1.devlab.sinenomine.net
 * Pseudo action:   clvmd-clone_stopped_0
 * Pseudo action:   all_stopped

Revised cluster status:
 warning: unpack_rsc_op: 	Processing failed op start for clvmd:1 on rh7cn1.devlab.sinenomine.net: unknown error (1)
Online: [ rh7cn1.devlab.sinenomine.net rh7cn2.devlab.sinenomine.net ]

 ZVMPOWER	(stonith:fence_zvm):	Started rh7cn2.devlab.sinenomine.net 
 Clone Set: dlm-clone [dlm]
     Started: [ rh7cn1.devlab.sinenomine.net rh7cn2.devlab.sinenomine.net ]
 Clone Set: clvmd-clone [clvmd]
     Started: [ rh7cn2.devlab.sinenomine.net ]
     Stopped: [ rh7cn1.devlab.sinenomine.net ]
 Clone Set: clusterfs-clone [clusterfs]
     Started: [ rh7cn2.devlab.sinenomine.net ]
     Stopped: [ rh7cn1.devlab.sinenomine.net ]
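
Since the failcount has hit INFINITY (1000000), clvmd won't be allowed back on rh7cn1 until the failures are cleared, so between tests I've been resetting it (pcs 0.9 syntax, if I have that right):

# pcs resource failcount show clvmd
# pcs resource cleanup clvmd-clone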

With RHEL 6 I would have used a qdisk, but that has been replaced by corosync_votequorum.
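
As far as I can tell, votequorum's two_node option (which implies wait_for_all) is what now lets a single surviving node keep quorum, with fencing doing the arbitration a qdisk used to. I've been checking the runtime quorum state with:

# corosync-quorumtool -s
# corosync-cmapctl | grep -i quorum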

This is my first RHEL 7 HA cluster, so I'm at the beginning of my learning curve. Any pointers as to what I should look at or what I need to read?

Neale
