Re: Fence Issue on BL 460C G6

I believe your problem is being caused by "nofailback" being set to "1":

                       <failoverdomain name="Failover" nofailback="1" ordered="0" restricted="0">

Set it to zero and I believe your problem will be resolved.
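
For reference, the failover domain section would then read as follows (only the nofailback attribute changes; remember to increment config_version before propagating the updated cluster.conf):

                       <failoverdomain name="Failover" nofailback="0" ordered="0" restricted="0">
                               <failoverdomainnode name="rhel-cluster-node2.mgmt.local" priority="1"/>
                               <failoverdomainnode name="rhel-cluster-node1.mgmt.local" priority="1"/>
                       </failoverdomain>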

On Wed, Oct 27, 2010 at 10:43 PM, Wahyu Darmawan <wahyu@xxxxxxxxxxxxxx> wrote:
Hi Ben,
Here is my cluster.conf. I need your help, please.


<?xml version="1.0"?>
<cluster alias="PORTAL_WORLD" config_version="32" name="PORTAL_WORLD">
       <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
       <clusternodes>
               <clusternode name="rhel-cluster-node1.mgmt.local" nodeid="1" votes="1">
                       <fence>
                               <method name="1">
                                       <device name="NODE1-ILO"/>
                               </method>
                       </fence>
               </clusternode>
               <clusternode name="rhel-cluster-node2.mgmt.local" nodeid="2" votes="1">
                       <fence>
                               <method name="1">
                                       <device name="NODE2-ILO"/>
                               </method>
                       </fence>
               </clusternode>
       </clusternodes>
       <quorumd device="/dev/sdf1" interval="3" label="quorum_disk1" tko="23" votes="2">
               <heuristic interval="2" program="ping 10.4.0.1 -c1 -t1" score="1"/>
       </quorumd>
       <cman expected_votes="1" two_node="1"/>
       <fencedevices>
               <fencedevice agent="fence_ilo" hostname="ilo-node2" login="Administrator" name="NODE2-ILO" passwd="password"/>
               <fencedevice agent="fence_ilo" hostname="ilo-node1" login="Administrator" name="NODE1-ILO" passwd="password"/>
       </fencedevices>
       <rm>
               <failoverdomains>
                       <failoverdomain name="Failover" nofailback="1" ordered="0" restricted="0">
                               <failoverdomainnode name="rhel-cluster-node2.mgmt.local" priority="1"/>
                               <failoverdomainnode name="rhel-cluster-node1.mgmt.local" priority="1"/>
                       </failoverdomain>
               </failoverdomains>
               <resources>
                       <ip address="10.4.1.103" monitor_link="1"/>
               </resources>
               <service autostart="1" domain="Failover" exclusive="0" name="IP_Virtual" recovery="relocate">
                       <ip ref="10.4.1.103"/>
               </service>
       </rm>
</cluster>

Many thanks,
Wahyu

-----Original Message-----
From: linux-cluster-bounces@xxxxxxxxxx [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Ben Turner
Sent: Thursday, October 28, 2010 12:18 AM
To: linux clustering
Subject: Re: Fence Issue on BL 460C G6

My guess is that there is a problem with fencing. Are you running fence_ilo with an HP blade? IIRC the iLOs on the blades have a different CLI, so I don't think fence_ilo will work with them. What do you see in the messages file during these events? If you see failed fence messages, you may want to look into using fence_ipmilan:

http://sources.redhat.com/cluster/wiki/IPMI_FencingConfig
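
If fence_ipmilan does turn out to be the right agent for your blades, the fencedevice entries in cluster.conf would look roughly like this (a sketch only: the hostnames and credentials here are illustrative, and the ipaddr/lanplus attributes should be checked against the fence_ipmilan man page and your iLO's IPMI-over-LAN settings):

       <fencedevices>
               <!-- ipaddr points at the iLO management interface; lanplus="1" may be needed depending on the iLO firmware -->
               <fencedevice agent="fence_ipmilan" ipaddr="ilo-node1" lanplus="1" login="Administrator" name="NODE1-ILO" passwd="password"/>
               <fencedevice agent="fence_ipmilan" ipaddr="ilo-node2" lanplus="1" login="Administrator" name="NODE2-ILO" passwd="password"/>
       </fencedevices>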

If you post a snippet of your messages file from this event along with your cluster.conf, I will have a better idea of what is going on.

-b



----- "Wahyu Darmawan" <wahyu@xxxxxxxxxxxxxx> wrote:

> Hi all,
>
>
>
> For fencing I'm using HP iLO, and the server is a BL460c G6. The problem
> is that the resource only starts moving to the passive node once the
> failed node is powered back on, which seems very strange to me. For
> example, I shut down node1 and physically removed it from the blade
> chassis while monitoring the clustat output; clustat kept showing the
> resource on node1 even though node1 was powered down and removed from
> the c7000 blade chassis. But when I plugged the failed node1 back into
> the c7000 chassis and it powered on, clustat showed the resource
> starting to move from the failed node to the passive node.
> I power down the blade server with the power button on its front and
> then remove it from the chassis. If we hit a hardware problem on the
> active node and the active node goes down, how will the resource move
> to the passive node? In addition, when I reboot or shut down the
> machine from the CLI, the resource moves to the passive node
> successfully. Furthermore, when I shut down the active node with the
> "shutdown -hy 0" command, the active node automatically restarts after
> shutting down.
>
> Please help me.
>
>
>
> Many Thanks,
> --
> Linux-cluster mailing list
> Linux-cluster@xxxxxxxxxx
> https://www.redhat.com/mailman/listinfo/linux-cluster




--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
