Sounds a little split-brainish... Have you tried the clean_start=1 option?
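For reference, here is a rough sketch of where that option would go, assuming the standard RHEL 5 cluster.conf layout. clean_start is an attribute of the fence_daemon element, and setting it to 1 tells fenced to skip startup fencing. The post_join_delay value below is an assumption chosen to match the "6 sec" in your logs. Remember to bump config_version whenever you change the file. Note that this only stops the nodes from trying to fence each other at startup. It does not fix whatever is keeping them from seeing each other.

```xml
<cluster name="GFSCluster" config_version="6">
  <cman expected_votes="1" two_node="1"/>
  <!-- clean_start="1": fenced skips startup fencing.
       post_join_delay="6" is an assumed value matching the 6 sec seen in the logs. -->
  <fence_daemon clean_start="1" post_join_delay="6"/>
  <!-- clusternodes, fencedevices and rm sections unchanged from the original file -->
</cluster>
```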
On Jul 7, 2009 11:54 PM, "Abed-nego G. Escobal, Jr." <abednegoyulo@xxxxxxxxx> wrote:
After upgrading from 5.2 to 5.3, our cluster, GFSCluster, seems to have stopped behaving as a cluster. GFSCluster is a two-node cluster using iSCSI, cman, CLVM, and GFS, and it was working fine on 5.2. Here is the configuration on both nodes (passwords removed):
<?xml version="1.0"?>
<cluster name="GFSCluster" config_version="5">
  <cman expected_votes="1" two_node="1"/>
  <clusternodes>
    <clusternode name="node01.company.com" votes="1" nodeid="1">
      <fence>
        <method name="single">
          <device name="node01_ipmi"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node02.company.com" votes="1" nodeid="2">
      <fence>
        <method name="single">
          <device name="node02_ipmi"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice name="node01_ipmi" agent="fence_ipmilan" ipaddr="10.1.0.5" login="root" passwd="********"/>
    <fencedevice name="node02_ipmi" agent="fence_ipmilan" ipaddr="10.1.0.7" login="root" passwd="********"/>
  </fencedevices>
  <rm>
    <failoverdomains/>
    <resources/>
  </rm>
</cluster>
When I start the cman service, both nodes hang at the "Starting fencing" step:
Starting cluster:
Loading modules... done
Mounting configfs... done
Starting ccsd... done
Starting cman... done
Starting daemons... done
Starting fencing...
After 5 minutes the step finishes with "done", but clustat reports:
==== As root on web01.company.com ====
Cluster Status for GFSCluster @ Wed Jul 8 01:00:24 2009
Member Status: Quorate
Member Name                ID   Status
------ ----                ---- ------
node01.company.com            1 Online, Local
node02.company.com            2 Offline
==== As root on web02.company.com ====
Cluster Status for GFSCluster @ Wed Jul 8 01:00:26 2009
Member Status: Quorate
Member Name                ID   Status
------ ----                ---- ------
node01.company.com            1 Offline
node02.company.com            2 Online, Local
Each node is quorate in its own separate cluster.
In web01's logs I found these messages repeating:
Jul 8 00:55:27 web01 fenced[21872]: node02.company.com not a cluster member after 6 sec post_join_delay
Jul 8 00:55:27 web01 fenced[21872]: fencing node "node02.company.com"
Jul 8 00:55:52 web01 fenced[21872]: agent "fence_ipmilan" reports: Rebooting machine @ IPMI:10.1.0.7...ipmilan: Failed to connect after 30 seconds Failed
web02's logs show the same messages repeating:
Jul 8 00:55:27 web02 fenced[6363]: node01.company.com not a cluster member after 6 sec post_join_delay
Jul 8 00:55:27 web02 fenced[6363]: fencing node "node01.company.com"
Jul 8 00:55:53 web02 fenced[6363]: agent "fence_ipmilan" reports: Rebooting machine @ IPMI:10.1.0.5...ipmilan: Failed to connect after 30 seconds Failed
Is there a known clustering bug in 5.3?
Are there any workarounds?
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster