cluster down after network restart

I have a two-node cluster.

Here is /etc/cluster/cluster.conf:

<?xml version="1.0"?>
<cluster alias="saza" config_version="38" name="saza">
        <fence_daemon post_fail_delay="0" post_join_delay="3"/>
        <clusternodes>
                <clusternode name="node1.network.com" nodeid="1" votes="1">
                        <fence/>
                </clusternode>
                <clusternode name="node2.network.com" nodeid="2" votes="1">
                        <fence/>
                </clusternode>
        </clusternodes>
        <cman expected_votes="1" two_node="1"/>
        <fencedevices/>
        <rm>
                <failoverdomains>
                        <failoverdomain name="cenall" ordered="1" restricted="1">
                                <failoverdomainnode name="node1.network.com" priority="1"/>
                                <failoverdomainnode name="node2.network.com" priority="1"/>
                        </failoverdomain>
                </failoverdomains>
                <resources>
                        <script file="/etc/init.d/httpd" name="httpd"/>
                        <fs device="/dev/sda3" force_fsck="0" force_unmount="0" fsid="6443" fstype="ext3" mountpoint="/var/www/html" name="httpd-content" options="" self_fence="0"/>
                        <ip address="10.28.99.81" monitor_link="1"/>
                </resources>
                <service autostart="1" domain="cenall" name="webserver">
                        <script ref="httpd"/>
                        <fs ref="httpd-content"/>
                        <ip ref="10.28.99.81"/>
                </service>
        </rm>
</cluster>
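
One thing worth noting about the config above: <fencedevices/> is empty and both <fence/> blocks are bare, so fenced has no agent it can actually run, which seems consistent with the fence failures below. For comparison, a minimal manual-fencing layout (a sketch only; the fence_manual agent and the "human" device name follow the standard Red Hat example and are not in my config, and a real power or network fence device would be preferable):

<!-- illustrative sketch: substitute a real fence agent where available -->
<clusternode name="node1.network.com" nodeid="1" votes="1">
        <fence>
                <method name="1">
                        <device name="human" nodename="node1.network.com"/>
                </method>
        </fence>
</clusternode>
<clusternode name="node2.network.com" nodeid="2" votes="1">
        <fence>
                <method name="1">
                        <device name="human" nodename="node2.network.com"/>
                </method>
        </fence>
</clusternode>
...
<fencedevices>
        <fencedevice agent="fence_manual" name="human"/>
</fencedevices>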
 
When I restart the network on node2:

service network stop
service network start
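
(For what it's worth, my understanding is that the cluster stack should be stopped before the network is restarted, roughly in this order, assuming the standard RHCS init scripts; clvmd and gfs only if they are in use:)

# stop the stack top-down before touching the network
service rgmanager stop
service gfs stop        # only if GFS mounts are in use
service clvmd stop      # only if clustered LVM is in use
service cman stop
service network restart
# then bring it back up bottom-up
service cman start
service clvmd start
service gfs start
service rgmanager start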
 
Afterwards, cman will not start on node2. It fails at:

start fencing ... failed
 
The message log on node2 shows:
 

openais[22433]: [SYNC ] Not using a virtual synchrony filter.
Jan 12 02:05:02 clus2 groupd[22442]: found uncontrolled kernel object rgmanager in /sys/kernel/dlm
Jan 12 02:05:02 clus2 openais[22433]: [TOTEM] Creating commit token because I am the rep.
Jan 12 02:05:02 clus2 groupd[22442]: local node must be reset to clear 1 uncontrolled instances of gfs and/or dlm
Jan 12 02:05:02 clus2 openais[22433]: [TOTEM] Saving state aru 0 high seq received 0
Jan 12 02:05:02 clus2 fence_node[22467]: Fence of "node2.network.com" was unsuccessful
Jan 12 02:05:02 clus2 openais[22433]: [TOTEM] entering COMMIT state.
Jan 12 02:05:02 clus2 gfs_controld[22460]: groupd_dispatch error -1 errno 104
Jan 12 02:05:02 clus2 dlm_controld[22454]: groupd is down, exiting
Jan 12 02:05:02 clus2 fenced[22448]: groupd is down, exiting
Jan 12 02:05:03 clus2 openais[22433]: [TOTEM] entering RECOVERY state.
Jan 12 02:05:03 clus2 gfs_controld[22460]: groupd connection died
Jan 12 02:05:03 clus2 kernel: dlm: closing connection to node 2
Jan 12 02:05:03 clus2 openais[22433]: [TOTEM] position [0] member 192.168.100.2:
Jan 12 02:05:03 clus2 gfs_controld[22460]: cluster is down, exiting
Jan 12 02:05:03 clus2 kernel: dlm: closing connection to node 1
Jan 12 02:05:03 clus2 openais[22433]: [TOTEM] previous ring seq 0 rep 192.168.100.2
Jan 12 02:05:03 clus2 openais[22433]: [TOTEM] aru 0 high delivered 0 received flag 0
Jan 12 02:05:03 clus2 openais[22433]: [TOTEM] Did not need to originate any messages in recovery.
Jan 12 02:05:03 clus2 openais[22433]: [TOTEM] Storing new sequence id for ring 4
Jan 12 02:05:03 clus2 openais[22433]: [TOTEM] Sending initial ORF token
Jan 12 02:05:03 clus2 openais[22433]: [CLM  ] CLM CONFIGURATION CHANGE
Jan 12 02:05:03 clus2 openais[22433]: [CLM  ] New Configuration:
Jan 12 02:05:03 clus2 openais[22433]: [CLM  ] Members Left:
Jan 12 02:05:03 clus2 openais[22433]: [CLM  ] Members Joined:
Jan 12 02:05:03 clus2 openais[22433]: [SYNC ] This node is within the primary component and will provide service.
Jan 12 02:05:03 clus2 openais[22433]: [CLM  ] CLM CONFIGURATION CHANGE
Jan 12 02:05:03 clus2 openais[22433]: [CLM  ] New Configuration:
Jan 12 02:05:03 clus2 openais[22433]: [CLM  ]   r(0) ip(192.168.100.2) 
Jan 12 02:05:03 clus2 openais[22433]: [CLM  ] Members Left:
Jan 12 02:05:03 clus2 openais[22433]: [CLM  ] Members Joined:
Jan 12 02:05:03 clus2 openais[22433]: [CLM  ]   r(0) ip(192.168.100.2) 
Jan 12 02:05:03 clus2 openais[22433]: [SYNC ] This node is within the primary component and will provide service.
Jan 12 02:05:03 clus2 openais[22433]: [TOTEM] entering OPERATIONAL state.
Jan 12 02:05:03 clus2 openais[22433]: [CMAN ] quorum regained, resuming activity
Jan 12 02:05:03 clus2 openais[22433]: [CLM  ] got nodejoin message 192.168.100.2
Jan 12 02:05:03 clus2 openais[22433]: [TOTEM] entering GATHER state from 11.
Jan 12 02:05:03 clus2 openais[22433]: [TOTEM] Saving state aru 9 high seq received 9
Jan 12 02:05:03 clus2 openais[22433]: [TOTEM] entering COMMIT state.
Jan 12 02:05:03 clus2 openais[22433]: [TOTEM] entering RECOVERY state.
Jan 12 02:05:03 clus2 openais[22433]: [TOTEM] position [0] member 192.168.100.1:
Jan 12 02:05:04 clus2 openais[22433]: [TOTEM] previous ring seq 132 rep 192.168.100.1
Jan 12 02:05:04 clus2 openais[22433]: [TOTEM] aru c high delivered c received flag 0
Jan 12 02:05:04 clus2 openais[22433]: [TOTEM] position [1] member 192.168.100.2:
Jan 12 02:05:04 clus2 openais[22433]: [TOTEM] previous ring seq 4 rep 192.168.100.2
Jan 12 02:05:04 clus2 openais[22433]: [TOTEM] aru 9 high delivered 9 received flag 0
Jan 12 02:05:04 clus2 openais[22433]: [TOTEM] Did not need to originate any messages in recovery.
Jan 12 02:05:04 clus2 openais[22433]: [TOTEM] Storing new sequence id for ring 88
Jan 12 02:05:04 clus2 openais[22433]: [CLM  ] CLM CONFIGURATION CHANGE
Jan 12 02:05:04 clus2 openais[22433]: [CLM  ] New Configuration:
Jan 12 02:05:04 clus2 openais[22433]: [CLM  ]   r(0) ip(192.168.100.2) 
Jan 12 02:05:04 clus2 openais[22433]: [CLM  ] Members Left:
Jan 12 02:05:04 clus2 openais[22433]: [CLM  ] Members Joined:
Jan 12 02:05:04 clus2 openais[22433]: [SYNC ] This node is within the primary component and will provide service.
Jan 12 02:05:04 clus2 openais[22433]: [CLM  ] CLM CONFIGURATION CHANGE
Jan 12 02:05:04 clus2 openais[22433]: [CLM  ] New Configuration:
Jan 12 02:05:04 clus2 openais[22433]: [CLM  ]   r(0) ip(192.168.100.1) 
Jan 12 02:05:04 clus2 openais[22433]: [CLM  ]   r(0) ip(192.168.100.2) 
Jan 12 02:05:04 clus2 openais[22433]: [CLM  ] Members Left:
Jan 12 02:05:04 clus2 openais[22433]: [CLM  ] Members Joined:
Jan 12 02:05:04 clus2 openais[22433]: [CLM  ]   r(0) ip(192.168.100.1) 
Jan 12 02:05:04 clus2 openais[22433]: [SYNC ] This node is within the primary component and will provide service.
Jan 12 02:05:04 clus2 openais[22433]: [TOTEM] entering OPERATIONAL state.
Jan 12 02:05:04 clus2 openais[22433]: [CLM  ] got nodejoin message 192.168.100.1
Jan 12 02:05:04 clus2 openais[22433]: [CLM  ] got nodejoin message 192.168.100.2
Jan 12 02:05:04 clus2 openais[22433]: [CPG  ] got joinlist message from node 1
Jan 12 02:05:04 clus2 openais[22433]: [CMAN ] cman killed by node 2 for reason 2
Jan 12 02:05:32 clus2 ccsd[22427]: Unable to connect to cluster infrastructure after 30 seconds.
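
To dig into the groupd complaint about the uncontrolled kernel object, these commands (from the cman package, as far as I know) should show what state is left over:

ls /sys/kernel/dlm      # leftover lockspaces groupd refuses to adopt
group_tool ls           # current fence/dlm/gfs group membership
group_tool dump         # groupd's internal debug log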
 
 
On node1, by contrast, cman restarts fine, but its message log shows:

fence node2.network.com failed.

How can I get node2 to rejoin the cluster?
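
From the "local node must be reset to clear 1 uncontrolled instances of gfs and/or dlm" line, my reading is that node2 holds stale dlm/gfs state that only a reset clears. The sequence I plan to try (a sketch using the standard tools; fence_ack_manual applies only if manual fencing is configured):

# on node2: reboot to clear the uncontrolled dlm/gfs state
reboot

# on node1: if fenced is stuck retrying a manual fence, acknowledge it
fence_ack_manual -n node2.network.com

# back on node2 after the reboot
service cman start
service rgmanager start

# verify membership from either node
cman_tool nodes
clustat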

