Here is my cluster.conf:

<?xml version="1.0"?>
<cluster config_version="33" name="GFSpfsCluster">
  <logging debug="on"/>
  <clusternodes>
    <clusternode name="pfs03.ns.gfs2.us" nodeid="1" votes="1">
      <fence>
        <method name="single">
          <device name="pfs03.ns.us.ctidata.net_vmware"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="pfs04.ns.gfs2.us" nodeid="2" votes="1">
      <fence>
        <method name="single">
          <device name="pfs04.ns.us.ctidata.net_vmware"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="pfs05.ns.gfs2.us" nodeid="3" votes="1">
      <fence>
        <method name="single">
          <device name="pfs05.ns.us.ctidata.net_vmware"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice agent="fence_vmware" ipaddr="10.50.6.20" login="administrator" name="pfs03.ns.us.ctidata.net_vmware" passwd="secret" port="pfs03.ns.us.ctidata.net"/>
    <fencedevice agent="fence_vmware" ipaddr="10.50.6.20" login="administrator" name="pfs04.ns.us.ctidata.net_vmware" passwd="secret" port="pfs04.ns.us.ctidata.net"/>
    <fencedevice agent="fence_vmware" ipaddr="10.50.6.20" login="administrator" name="pfs05.ns.us.ctidata.net_vmware" passwd="secret" port="pfs05.ns.us.ctidata.net"/>
  </fencedevices>
  <rm>
    <resources>
      <script file="/etc/init.d/httpd" name="httpd"/>
    </resources>
    <failoverdomains>
      <failoverdomain name="pfs03_only" nofailback="0" ordered="0" restricted="1">
        <failoverdomainnode name="pfs03.ns.gfs2.us" priority="1"/>
      </failoverdomain>
      <failoverdomain name="pfs04_only" nofailback="0" ordered="0" restricted="1">
        <failoverdomainnode name="pfs04.ns.gfs2.us" priority="1"/>
      </failoverdomain>
      <failoverdomain name="pfs05_only" nofailback="0" ordered="0" restricted="1">
        <failoverdomainnode name="pfs05.ns.gfs2.us" priority="1"/>
      </failoverdomain>
    </failoverdomains>
    <service autostart="1" domain="pfs03_only" exclusive="0" name="pfs03_apache" recovery="restart">
      <script ref="httpd"/>
    </service>
    <service autostart="1" domain="pfs04_only" exclusive="0" name="pfs04_apache" recovery="restart">
      <script ref="httpd"/>
    </service>
    <service autostart="1" domain="pfs05_only" exclusive="0" name="pfs05_apache" recovery="restart">
    </service>
  </rm>
  <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
  <cman/>
</cluster>

uname -n returns pfs05.ns.us.ctidata.net.

As I am sure you will notice, cluster.conf has the node name set to pfs05.ns.gfs2.us while the hostname is pfs05.ns.us.ctidata.net. This was working previously, is still working on the other two nodes, and is intentional: the cluster uses a private VLAN set up specifically for cluster communications. The network is set up as follows (a sketch of the resulting name-to-address mapping follows the list):

eth0 = 10.50.10.32/24 - the production traffic interface
eth1 = 10.50.20.32/24 - the interface used for iSCSI connections to our SAN
eth2 = 10.50.6.32/24  - the interface set up for FreeIPA-authenticated ssh access in from our mgmt VLAN
eth3 = 10.50.1.32/24  - a legacy interface used during the transition from the old environment to the new one
eth4 = 10.50.3.70/27  - the interface pfs05.ns.gfs2.us resolves to, used for cluster communications
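Since cman/corosync picks the interface to bind to by resolving each <clusternode> name, the gfs2.us names have to map to the 10.50.3.64/27 cluster VLAN on every node, whether via DNS or /etc/hosts. A minimal /etc/hosts sketch of that mapping - only the pfs05 address appears in this thread; the pfs03 and pfs04 addresses are illustrative placeholders:

    # Cluster VLAN (10.50.3.64/27) name mapping.
    # 10.50.3.70 is from this thread; the other two addresses are
    # placeholders for illustration only.
    10.50.3.68   pfs03.ns.gfs2.us
    10.50.3.69   pfs04.ns.gfs2.us
    10.50.3.70   pfs05.ns.gfs2.us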
David

On 08/01/2011 08:56 PM, Digimer wrote:
> On 08/01/2011 09:50 PM, David wrote:
>> I have RHCS installed on CentOS 6 x86_64. One node of a 3-node
>> cluster won't start after I moved the nodes to a new VLAN. When I
>> start cman this is what I get:
>>
>> Starting cluster:
>>    Checking Network Manager...                     [ OK ]
>>    Global setup...                                 [ OK ]
>>    Loading kernel modules...                       [ OK ]
>>    Mounting configfs...                            [ OK ]
>>    Starting cman...
>> Aug 02 01:45:17 corosync [MAIN ] Corosync Cluster Engine ('1.2.3'): started and ready to provide service.
>> Aug 02 01:45:17 corosync [MAIN ] Corosync built-in features: nss rdma
>> Aug 02 01:45:17 corosync [MAIN ] Successfully read config from /etc/cluster/cluster.conf
>> Aug 02 01:45:17 corosync [MAIN ] Successfully parsed cman config
>> Aug 02 01:45:17 corosync [TOTEM ] Token Timeout (10000 ms) retransmit timeout (2380 ms)
>> Aug 02 01:45:17 corosync [TOTEM ] token hold (1894 ms) retransmits before loss (4 retrans)
>> Aug 02 01:45:17 corosync [TOTEM ] join (60 ms) send_join (0 ms) consensus (12000 ms) merge (200 ms)
>> Aug 02 01:45:17 corosync [TOTEM ] downcheck (1000 ms) fail to recv const (2500 msgs)
>> Aug 02 01:45:17 corosync [TOTEM ] seqno unchanged const (30 rotations) Maximum network MTU 1402
>> Aug 02 01:45:17 corosync [TOTEM ] window size per rotation (50 messages) maximum messages per rotation (17 messages)
>> Aug 02 01:45:17 corosync [TOTEM ] missed count const (5 messages)
>> Aug 02 01:45:17 corosync [TOTEM ] send threads (0 threads)
>> Aug 02 01:45:17 corosync [TOTEM ] RRP token expired timeout (2380 ms)
>> Aug 02 01:45:17 corosync [TOTEM ] RRP token problem counter (2000 ms)
>> Aug 02 01:45:17 corosync [TOTEM ] RRP threshold (10 problem count)
>> Aug 02 01:45:17 corosync [TOTEM ] RRP mode set to none.
>> Aug 02 01:45:17 corosync [TOTEM ] heartbeat_failures_allowed (0)
>> Aug 02 01:45:17 corosync [TOTEM ] max_network_delay (50 ms)
>> Aug 02 01:45:17 corosync [TOTEM ] HeartBeat is Disabled. To enable set heartbeat_failures_allowed > 0
>> Aug 02 01:45:17 corosync [TOTEM ] Initializing transport (UDP/IP).
>> Aug 02 01:45:17 corosync [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
>> Aug 02 01:45:17 corosync [IPC ] you are using ipc api v2
>> Aug 02 01:45:18 corosync [TOTEM ] Receive multicast socket recv buffer size (262142 bytes).
>> Aug 02 01:45:18 corosync [TOTEM ] Transmit multicast socket send buffer size (262142 bytes).
>> corosync: totemsrp.c:3091: memb_ring_id_create_or_load: Assertion `res == sizeof (unsigned long long)' failed.
>> Aug 02 01:45:18 corosync [TOTEM ] The network interface [10.50.3.70] is now up.
>> corosync died with signal: 6
>> Check cluster logs for details
>>
>> Any idea what the issue could be?
>>
>> Thanks
>> David
>
> What is your cluster.conf file (please obscure passwords only), what
> does `uname -n` return, and what is your network configuration
> (interface names and IPs)?
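For anyone who lands on this thread with the same abort: the assertion at totemsrp.c:3091 lives in memb_ring_id_create_or_load(), which reads an 8-byte ring id back from a per-interface file (ringid_<bound-ip>) in corosync's state directory, /var/lib/corosync on corosync 1.x. A short read there - typically a zero-length or truncated ringid file left behind by an earlier failed start, a full filesystem, or a missing state directory - trips the assert, and corosync aborts with signal 6. A quick check along these lines, assuming the default paths (the file name follows the newly bound 10.50.3.70 address), is worth trying:

    # Assumes corosync 1.x defaults; adjust paths for your build.
    ls -l /var/lib/corosync/    # look for a 0-byte ringid_10.50.3.70
    df -h /var/lib              # a full filesystem leaves the file truncated
    # If the ringid file exists but is shorter than 8 bytes, remove it;
    # corosync recreates it on the next start:
    rm -f /var/lib/corosync/ringid_10.50.3.70
    service cman start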
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster