Hello,

Thanks for the info. I am now using manual fencing, but I get the following messages whenever I do a failover:

Mar 12 17:25:50 node2 clurgmgrd[6088]: <info> State change: node1 DOWN
Mar 12 17:25:52 node2 clurgmgrd[6088]: <notice> Starting stopped service iscsi_ip
Mar 12 17:25:52 node2 clurgmgrd: [6088]: <info> Adding IPv4 address 172.40.2.119 to eth2
Mar 12 17:25:52 node2 clurgmgrd[6088]: <notice> Starting stopped service iscsi_lun
Mar 12 17:25:53 node2 clurgmgrd[6088]: <notice> Service iscsi_lun started
Mar 12 17:25:54 node2 clurgmgrd[6088]: <notice> Service iscsi_ip started
Mar 12 17:26:24 node2 kernel: CMAN: removing node node1 from the cluster : Missed too many heartbeats
Mar 12 17:26:24 node2 fenced[6040]: node1 not a cluster member after 0 sec post_fail_delay
Mar 12 17:26:24 node2 fenced[6040]: fencing node "node1"
Mar 12 17:26:24 node2 fence_manual: Node node1 needs to be reset before recovery can procede. Waiting for node1 to rejoin the cluster or for manual acknowledgement that it has been reset (i.e. fence_ack_manual -n node1)

I simply power down node1 to simulate a failover to node2. Unless I run fence_ack_manual -n node1, the system does not move forward and stays waiting in fencing. How do I fix this?

During shutdown, I also get the following messages, and the system waits there indefinitely:

Starting Killall:
CMAN: sendmsg failed: -101
WARNING: dlm_emergency_shutdown
SM: 00000003 sm_stop: SG still joined

How do I fix this error?

Thanks,
Sai Logan

-----Original Message-----
From: linux-cluster-bounces@xxxxxxxxxx [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of linux-cluster-request@xxxxxxxxxx
Sent: Saturday, March 10, 2007 9:00 AM
To: linux-cluster@xxxxxxxxxx
Subject: Linux-cluster Digest, Vol 35, Issue 13

Today's Topics:

   1. Re: cluster not doing failover (Jonathan E Brassow)

----------------------------------------------------------------------

Message: 1
Date: Fri, 9 Mar 2007 19:53:40 -0600
From: Jonathan E Brassow <jbrassow@xxxxxxxxxx>
Subject: Re: cluster not doing failover
To: linux clustering <linux-cluster@xxxxxxxxxx>
Message-ID: <40407159e8e6506b05d46c82d921d936@xxxxxxxxxx>
Content-Type: text/plain; charset="iso-8859-1"

On Mar 9, 2007, at 5:30 PM, Sai Loganathan wrote:

> <fencedevices>
>   <fencedevice agent="fence_ilo" hostname="admin"
>     login="admin" name="node1_fence" passwd="admin"/>
>   <fencedevice agent="fence_ilo" hostname="admin"
>     login="admin" name="node2_fence" passwd="admin"/>
> </fencedevices>

The above lines look funny to me. The hostname for the fence device is "admin"?

> Using the cluster IP address (172.40.2.119), I was able to do an NFS
> mount of the shared LUN from a 3rd machine. Started an infinite ls on
> that LUN.
> To simulate failover, I just powered down node1, hoping to see
> the read I/O stop but resume via node2. But I see the following
> error message on node2.
> Mar 9 12:14:49 node2 fenced[7422]: fence "node1" failed
> Mar 9 12:14:54 node2 fenced[7422]: fencing node "node1"
> Mar 9 12:14:54 node2 fenced[7422]: agent "fence_ilo" reports: Can't
> call method "configure" on an undefined value at /sbin/fence_ilo line
> 169, <> line 4.
> Mar 9 12:14:54 node2 fenced[7422]: fence "node1" failed
> Mar 9 12:14:59 node2 fenced[7422]: fencing node "node1"
> Mar 9 12:14:59 node2 fenced[7422]: agent "fence_ilo" reports: Can't
> call method "configure" on an undefined value at /sbin/fence_ilo line
> 169, <> line 4.
>
> Seems like I am not doing something correct with respect to fencing.
> Can I set up a cluster without fencing first of all?

Yes. You can use manual fencing. That should only be used for testing purposes though... it is not a supported configuration.

 brassow

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
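
To illustrate the hostname remark in the quoted reply, the fence device section might look roughly like the fragment below. This is only a sketch under assumptions: the hostname values ilo-node1.example.com and ilo-node2.example.com are hypothetical placeholders for the address of each node's iLO management interface, and the login and passwd values are simply carried over from the quoted configuration. It uses only the attributes that already appear in the quoted cluster.conf.

```xml
<!-- Sketch only: the hostname values below are hypothetical placeholders. -->
<fencedevices>
  <!-- hostname is assumed to be the address of node1's iLO interface, not a user name -->
  <fencedevice agent="fence_ilo" hostname="ilo-node1.example.com"
               login="admin" name="node1_fence" passwd="admin"/>
  <!-- likewise for node2's iLO interface -->
  <fencedevice agent="fence_ilo" hostname="ilo-node2.example.com"
               login="admin" name="node2_fence" passwd="admin"/>
</fencedevices>
```

Each node's fence section would then need to refer to the matching device name (node1_fence or node2_fence).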