Hi,

If you have no fencing devices, you *must* use fence_manual. CMAN operations, AFAIK, will not work without some form of fencing in place. Fencing is critical to cluster services - in fact, I'm surprised CMAN even starts given that you have no fence devices in /etc/cluster/cluster.conf. No matter, though.

fence_manual is a bit of a cop-out. It too will not provide proper fencing in the event of split brain - and fencing is required to be automatic to ensure data corruption does not occur on the shared storage that the nodes write to.

Seeing as you're using Xen for your cluster nodes, check the man pages on fence_xvm and fence_xvmd. I have no idea what the difference is between the two (as I use physical hosts), but those seem to be the fencing agents for Xen virtualised cluster nodes. Someone else on the list might have a better understanding of these agents, so they might be able to clue you (and me!) in a bit better.

I'd also suggest that in your situation you configure fence_manual as a backup fencing agent, once fence_xvm(d) is in place as the primary fence agent.
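To give you an idea of the shape of it, the relevant cluster.conf sections would look something like the sketch below. Treat it as a rough, untested outline - I haven't used fence_xvm myself, so verify the attribute names (particularly "domain") against its man page, and the device names "xenfence" and "manual" are just labels I made up:

    <clusternode name="node1" nodeid="1" votes="1">
            <fence>
                    <!-- Primary method: fence the Xen guest -->
                    <method name="1">
                            <device name="xenfence" domain="node1"/>
                    </method>
                    <!-- Backup method: manual fencing, acknowledged
                         by an administrator with fence_ack_manual -->
                    <method name="2">
                            <device name="manual" nodename="node1"/>
                    </method>
            </fence>
    </clusternode>

    <fencedevices>
            <fencedevice agent="fence_xvm" name="xenfence"/>
            <fencedevice agent="fence_manual" name="manual"/>
    </fencedevices>

fenced tries the <method> blocks in order, so the manual fence is only attempted if the fence_xvm call fails (node2 gets the same treatment with its own <fence> block). Note that the <fencedevices> section must actually declare the agents - in your current cluster.conf it's empty.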
Realistically, though, running two Xen cluster nodes on a single physical dom0 host is itself a single point of failure, and doing so mostly negates the need for Red Hat Cluster Suite in the stack at all (if you are trying to use it to provide high availability). But if you're just doing a bit of testing on RHCS, fence_manual might get you by. I wouldn't recommend staying on it, though - I find RHCS + fence_manual tends to cause more interruption to users than not having it implemented at all.

If you want documentation on how to implement fencing, check the Cluster Project FAQ section on fencing (http://sources.redhat.com/cluster/faq.html#fence_what) or the "Configuring and Managing a Red Hat Cluster" document for RHEL 5.2 in the Red Hat Cluster Suite documentation (http://www.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5.2/html/Cluster_Administration/index.html).

Regards,

Stewart

On Tue Feb 17 16:48 , ESGLinux sent:

>Hi,
>
>first, thank you very much for your answer.
>
>You are right, I have no fencing devices at all, but for one reason: I haven't got any!!!
>
>I'm just testing with 2 Xen virtual machines running on the same host and mounting an iSCSI disk on another host to simulate shared storage.
>
>On the other hand, I think I don't understand the concept of fencing.
>
>I tried to configure fencing devices with luci, but when I try I don't know what to select from the combo of fencing devices (perhaps manual fencing, although it's not recommended for production).
>
>So, as I think this is a newbie and perhaps a silly question:
>
>can you give me any good reference about fencing to learn from, or an example configuration with fence devices to see how it must be done?
>
>thanks again,
>
>ESG
>
>2009/2/17 spods@xxxxxxxxxxxx <spods@xxxxxxxxxxxx>
>
>>A couple of things.
>>
>>You don't have any fencing devices defined in cluster.conf at all. No power
>>fencing, no I/O fencing, not even manual fencing.
>>
>>You need to define how each node of the cluster is to be fenced (forcibly removed
>>from the cluster) for proper failover operations to occur.
>>
>>Secondly, if the only connection shared between the two nodes is the network cord
>>you just disconnected, then of course nothing will happen - each node has just
>>lost the only common connection between each other to control the faulty node
>>(i.e. through fencing).
>>
>>There need to be more connections between the nodes of a cluster than just a
>>network card. This can be achieved with a second NIC, I/O fencing, or centralised
>>or individual power controls (I/O switches or IPMI).
>>
>>That way, in the event that the network connection is the single point of failure
>>between the two nodes, a node can at least be fenced if it's behaving improperly.
>>
>>Once the faulty node is fenced, the remaining nodes should at that point continue
>>providing cluster services.
>>
>>Regards,
>>
>>Stewart
>>
>>On Mon Feb 16 16:29 , ESGLinux sent:
>>
>>>Hello All,
>>>
>>>I have a cluster with two nodes running one service (mysql). The two nodes use
>>>an iSCSI disk with GFS on it.
>>>I haven't configured fencing at all.
>>>
>>>I have tested different failure situations and these are my results:
>>>
>>>If I halt node1, the service relocates to node2 - OK
>>>If I kill the process on node1, the service relocates to node2 - OK
>>>
>>>but
>>>
>>>if I unplug the wire of the ethernet device or do ifdown eth0 on node1, the whole
>>>cluster fails. The service doesn't relocate.
>>>
>>>On node2 I get the messages:
>>>
>>>Feb 15 13:29:34 localhost fenced[3405]: fencing node "192.168.1.188"
>>>Feb 15 13:29:34 localhost fenced[3405]: fence "192.168.1.188" failed
>>>Feb 15 13:29:39 localhost fenced[3405]: fencing node "192.168.1.188"
>>>Feb 15 13:29:39 localhost fenced[3405]: fence "192.168.1.188" failed
>>>
>>>again and again. node2 never runs the service, and if I try to reboot node1,
>>>the computer hangs waiting for the services to stop.
>>>
>>>In this situation all I can do is switch off the power of node1 and reboot
>>>node2. This situation is not acceptable at all.
>>>
>>>I think the problem is just with fencing, but I don't know how to apply it to this
>>>situation (I have RTFM from the redhat site, but I haven't seen how to apply it
>>>:-( )
>>>
>>>this is my cluster.conf file:
>>>
>>><cluster alias="MICLUSTER" config_version="62" name="MICLUSTER">
>>>        <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
>>>        <clusternodes>
>>>                <clusternode name="node1" nodeid="1" votes="1">
>>>                        <fence/>
>>>                </clusternode>
>>>                <clusternode name="node2" nodeid="2" votes="1">
>>>                        <fence/>
>>>                </clusternode>
>>>        </clusternodes>
>>>        <cman expected_votes="1" two_node="1"/>
>>>        <fencedevices/>
>>>        <rm>
>>>                <failoverdomains>
>>>                        <failoverdomain name="DOMINIOFAIL" nofailback="0" ordered="0" restricted="1">
>>>                                <failoverdomainnode name="node1" priority="1"/>
>>>                                <failoverdomainnode name="node2" priority="1"/>
>>>                        </failoverdomain>
>>>                </failoverdomains>
>>>                <resources/>
>>>                <service domain="DOMINIOFAIL" exclusive="0" name="BBDD" revovery="restart">
>>>                        <mysql config_file="/etc/my.cnf" listen_address="" mysql_options="" name="mydb" shutdown_wait="3"/>
>>>                        <ip address="192.168.1.183" monitor_link="1"/>
>>>                </service>
>>>        </rm>
>>></cluster>
>>>
>>>Any idea? references?
>>>
>>>Thanks in advance
>>>
>>>Greetings
>>>
>>>ESG

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster