Hello,

Thanks for the info. I am now using manual fencing, but I get the following messages whenever I do a failover:

Mar 12 17:25:50 node2 clurgmgrd[6088]: <info> State change: node1 DOWN
Mar 12 17:25:52 node2 clurgmgrd[6088]: <notice> Starting stopped service iscsi_ip
Mar 12 17:25:52 node2 clurgmgrd: [6088]: <info> Adding IPv4 address 172.40.2.119 to eth2
Mar 12 17:25:52 node2 clurgmgrd[6088]: <notice> Starting stopped service iscsi_lun
Mar 12 17:25:53 node2 clurgmgrd[6088]: <notice> Service iscsi_lun started
Mar 12 17:25:54 node2 clurgmgrd[6088]: <notice> Service iscsi_ip started
Mar 12 17:26:24 node2 kernel: CMAN: removing node node1 from the cluster : Missed too many heartbeats
Mar 12 17:26:24 node2 fenced[6040]: node1 not a cluster member after 0 sec post_fail_delay
Mar 12 17:26:24 node2 fenced[6040]: fencing node "node1"
Mar 12 17:26:24 node2 fence_manual: Node node1 needs to be reset before recovery can procede. Waiting for node1 to rejoin the cluster or for manual acknowledgement that it has been reset (i.e. fence_ack_manual -n node1)

To simulate the failover to node2, I simply power down node1. Unless I run fence_ack_manual -n node1, the cluster stays in the fencing state and does not move forward. How can I fix this?

Also, during shutdown I get the following error messages and the system waits there indefinitely:

Starting Killall:
CMAN: sendmsg failed: -101
WARNING: dlm_emergency_shutdown
SM: 00000003 sm_stop: SG still joined

How can I fix this error?
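In case it helps anyone reading the fencing log above: a minimal Python sketch, assuming the fence_manual syslog wording is exactly as shown in that log, that lists the nodes fenced is still waiting on and prints the matching fence_ack_manual command. The function name and the sample list are hypothetical and only for illustration; this does not fix the behaviour, it only makes the pending acknowledgement easier to spot.

```python
import re

# Assumption: fence_manual logs "Node <name> needs to be reset", as in the
# syslog excerpt quoted in this message. Other versions may word it differently.
WAITING = re.compile(r"fence_manual: Node (\S+) needs to be reset")


def nodes_awaiting_ack(log_lines):
    """Return node names fence_manual is waiting on, in first-seen order."""
    seen = []
    for line in log_lines:
        match = WAITING.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


# Sample lines copied from the log above (the last one is shortened).
SAMPLE = [
    'Mar 12 17:26:24 node2 fenced[6040]: fencing node "node1"',
    "Mar 12 17:26:24 node2 fence_manual: Node node1 needs to be reset "
    "before recovery can procede.",
]

if __name__ == "__main__":
    for node in nodes_awaiting_ack(SAMPLE):
        # Only run this after the node really has been powered off or reset.
        print(f"fence_ack_manual -n {node}")
```

To check a real log, pass the lines of the syslog file (for example, an open file object) to nodes_awaiting_ack instead of SAMPLE.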

Thanks,
Sai Logan

-----Original Message-----
From: linux-cluster-bounces@xxxxxxxxxx [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of linux-cluster-request@xxxxxxxxxx
Sent: Saturday, March 10, 2007 9:00 AM
To: linux-cluster@xxxxxxxxxx
Subject: Linux-cluster Digest, Vol 35, Issue 13

Today's Topics:

   1. Re: cluster not doing failover (Jonathan E Brassow)

----------------------------------------------------------------------

Message: 1
Date: Fri, 9 Mar 2007 19:53:40 -0600
From: Jonathan E Brassow <jbrassow@xxxxxxxxxx>
Subject: Re: cluster not doing failover
To: linux clustering <linux-cluster@xxxxxxxxxx>
Message-ID: <40407159e8e6506b05d46c82d921d936@xxxxxxxxxx>
Content-Type: text/plain; charset="iso-8859-1"

On Mar 9, 2007, at 5:30 PM, Sai Loganathan wrote:

> <fencedevices>
> <fencedevice agent="fence_ilo" hostname="admin"
> login="admin" name="node1_fence" passwd="admin"/>
> <fencedevice agent="fence_ilo" hostname="admin"
> login="admin" name="node2_fence" passwd="admin"/>
> </fencedevices>

The above lines look funny to me. The hostname for the fence device is "admin"?

> Using the cluster ip address (172.40.2.119), I was able to do an nfs
> mount of the shared lun from a 3rd machine. Started an infinite ls on
> that lun.
> To simulate failover, I just powered-down the node1 and hoping to see
> the read io stop but resume via the node2. But, I see the following
> error message on the node 2.
> Mar 9 12:14:49 node2 fenced[7422]: fence "node1" failed
> Mar 9 12:14:54 node2 fenced[7422]: fencing node "node1"
> Mar 9 12:14:54 node2 fenced[7422]: agent "fence_ilo" reports: Can't
> call method "configure" on an undefined value at /sbin/fence_ilo line
> 169, <> line 4.
> Mar 9 12:14:54 node2 fenced[7422]: fence "node1" failed
> Mar 9 12:14:59 node2 fenced[7422]: fencing node "node1"
> Mar 9 12:14:59 node2 fenced[7422]: agent "fence_ilo" reports: Can't
> call method "configure" on an undefined value at /sbin/fence_ilo line
> 169, <> line 4.
>
> Seems like I am not doing something correct with respect to fencing.
> Can I setup cluster without fencing first of all?

Yes. You can use manual fencing. That should only be used for testing purposes though... it is not a supported configuration.

 brassow

------------------------------

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster

End of Linux-cluster Digest, Vol 35, Issue 13
*********************************************