If this is the error, then start clvmd manually with "/usr/sbin/clvmd", which will start clvmd while the cluster VGs/resources are online.

[root@yoda2 ~]# /etc/init.d/clvmd status
clvmd dead but subsys locked
active volumes: LV06 LV_nex2
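A rough recovery sequence along those lines (an untested sketch, assuming cman itself is healthy and the standard RHEL lock path):

    cman_tool status                 # confirm the node is quorate before touching clvmd
    rm -f /var/lock/subsys/clvmd     # "dead but subsys locked" means this stale lock file survived
    /usr/sbin/clvmd                  # start the daemon directly
    vgchange -ay                     # reactivate the clustered volume groups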
-----Original Message-----
From: linux-cluster-bounces@xxxxxxxxxx [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of linux-cluster-request@xxxxxxxxxx
Sent: Wednesday, June 25, 2008 5:58 AM
To: linux-cluster@xxxxxxxxxx
Subject: Linux-cluster Digest, Vol 50, Issue 32

Today's Topics:

   1. Re: can't communicate with fenced -1 (GS R)
   2. Re: Info & documentation on configuring Power Fencing using
      IBM RSA II (x3850/3950 M2 servers) (GS R)
   3. Re: can't communicate with fenced -1 (Gian Paolo Buono)
   4. Re: can't communicate with fenced -1 (Gian Paolo Buono)
   5. Re: can't communicate with fenced -1 (GS R)

----------------------------------------------------------------------

Message: 1
Date: Wed, 25 Jun 2008 14:03:51 +0530
From: "GS R" <gsrlinux@xxxxxxxxx>
Subject: Re: can't communicate with fenced -1
To: "linux clustering" <linux-cluster@xxxxxxxxxx>

On 6/24/08, Gian Paolo Buono <gpbuono@xxxxxxxxx> wrote:
> Hi,
>
> We have two RHEL 5.1 boxes sharing a single iSCSI EMC SAN, without fence
> devices. The system is configured as a high-availability cluster of Xen
> guests.
>
> One of the most frequently recurring problems is fence_tool related:
>
> # service cman start
> Starting cluster:
>    Loading modules... done
>    Mounting configfs... done
>    Starting ccsd... done
>    Starting cman... done
>    Starting daemons... done
>    Starting fencing... fence_tool: can't communicate with fenced -1
>
> # fenced -D
> 1204556546 cman_init error 0 111
>
> # clustat
> CMAN is not running.
>
> # cman_tool join
>
> # clustat
> msg_open: Connection refused
>
> Member Status: Quorate
> Member Name                  ID   Status
> ------ ----                  ---- ------
> yoda1                        1    Online, Local
> yoda2                        2    Offline
>
> Sometimes this problem gets solved if the two machines are rebooted at
> the same time. But in the current HA configuration, I cannot guarantee
> that both systems can be rebooted at the same time for every problem we
> face. This is my config file:
>
> ############################### cluster.conf ###############################
> <?xml version="1.0"?>
> <cluster alias="yoda-cl" config_version="2" name="yoda-cl">
>         <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
>         <clusternodes>
>                 <clusternode name="yoda2" nodeid="1" votes="1">
>                         <fence/>
>                 </clusternode>
>                 <clusternode name="yoda1" nodeid="2" votes="1">
>                         <fence/>
>                 </clusternode>
>         </clusternodes>
>         <cman expected_votes="1" two_node="1"/>
>         <rm>
>                 <failoverdomains/>
>                 <resources/>
>         </rm>
>         <fencedevices/>
> </cluster>
> ############################### cluster.conf ###############################
>
> Regards.

Hi

I configured a two-node cluster with no fence device on RHEL 5.1.
The cluster started and stopped with no issues. The only difference
that I see is that I have used an FQDN in my cluster.conf, i.e.,

    <clusternode name="yoda2.gsr.com" nodeid="1" votes="1">

Check that your /etc/hosts has the FQDN in it.

Thanks
Gowrishankar Rajaiyan

On 6/25/08, Gian Paolo Buono <gpbuono@xxxxxxxxx> wrote:
> Hi,
> the problem with my cluster is that it starts up well, but after two days
> the problem I have described comes back, and it only gets solved if the
> two machines are rebooted at the same time.
>
> Thanks
> Gian Paolo

Hi Gian

Could you please attach the logs.

Thanks
Gowrishankar Rajaiyan

------------------------------

Message: 2
Date: Wed, 25 Jun 2008 14:17:26 +0530
From: GS R <gsrlinux@xxxxxxxxx>
Subject: Re: Info & documentation on configuring Power Fencing using IBM RSA II (x3850/3950 M2 servers)
To: linux clustering <linux-cluster@xxxxxxxxxx>

sunhux G wrote:
> Hi,
>
> We've been googling for a step-by-step guide on how to configure the
> IBM RSA II for power fencing in an RHEL 5.1 environment.
>
> Is it just as simple as this one-page instruction:
> http://www.redhat.com/docs/manuals/csgfs/browse/rh-cs-en/s1-config-fence-devices.html
>
> Some questions:
>
> a) How do we get the "Add a New Fence Device" screen? Is it somewhere
> on the Red Hat GNOME desktop that I can click to bring it up?

No, it is not anywhere in the GNOME desktop. You will have to use Conga,
or system-config-cluster, or enter it into cluster.conf manually.

> b) The factory-default IP address of the RSA II LAN port is
> 192.168.70.125/24. What IP address do I enter in the "Add a New Fence
> Device" screen - must it be 192.168.70.x (within the same subnet as
> 192.168.70.125)?

It is the IP address assigned to the IPMI port itself, not to the network.

> c) Do we repeat the same step ("Add a New Fence Device") for every RHEL
> server in the cluster, and is the same IP address/login ID entered for
> each of the servers in the cluster?

A new fence device is added only once. However, you need to assign this
fence device to every node in your cluster.

> The link below only gives a little of the concept, not an actual
> configuration guide:
> http://www.centos.org/docs/4/4.5/SAC_Cluster_Suite_Overview/s2-fencing-overview-CSO.html
>
> Any other links/information is much appreciated.
>
> Thanks

Thanks
Gowrishankar Rajaiyan
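Putting those answers together, an RSA II fence device in cluster.conf
would look roughly like the sketch below. The device name, address and
credentials here are placeholders, not values from this thread; see the
fence_rsa man page for the full attribute list:

    <!-- defined once, under <fencedevices> -->
    <fencedevices>
            <fencedevice agent="fence_rsa" name="rsa-yoda1"
                         ipaddr="192.168.70.125" login="USERID" passwd="PASSW0RD"/>
    </fencedevices>

    <!-- then referenced from each node's <fence> block -->
    <clusternode name="yoda1" nodeid="2" votes="1">
            <fence>
                    <method name="1">
                            <device name="rsa-yoda1"/>
                    </method>
            </fence>
    </clusternode>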
------------------------------

Message: 3
Date: Wed, 25 Jun 2008 10:55:49 +0200
From: "Gian Paolo Buono" <gpbuono@xxxxxxxxx>
Subject: Re: can't communicate with fenced -1
To: "linux clustering" <linux-cluster@xxxxxxxxxx>

Hi,
if I try to restart cman on yoda2:

[root@yoda2 ~]# /etc/init.d/cman restart
Stopping cluster:
   Stopping fencing... done
   Stopping cman... done
   Stopping ccsd... done
   Unmounting configfs... done
                                                           [  OK  ]
Starting cluster:
   Enabling workaround for Xend bridged networking... done
   Loading modules... done
   Mounting configfs... done
   Starting ccsd... done
   Starting cman... done
   Starting daemons... done
   Starting fencing... failed
                                                           [FAILED]

[root@yoda2 ~]# tail -f /var/log/messages
Jun 25 10:50:42 yoda2 openais[18429]: [CLM  ] Members Joined:
Jun 25 10:50:42 yoda2 openais[18429]: [CLM  ]   r(0) ip(172.20.0.174)
Jun 25 10:50:42 yoda2 openais[18429]: [SYNC ] This node is within the primary component and will provide service.
Jun 25 10:50:42 yoda2 openais[18429]: [TOTEM] entering OPERATIONAL state.
Jun 25 10:50:42 yoda2 openais[18429]: [CLM  ] got nodejoin message 172.20.0.174
Jun 25 10:50:42 yoda2 openais[18429]: [CLM  ] got nodejoin message 172.20.0.175
Jun 25 10:50:42 yoda2 openais[18429]: [CPG  ] got joinlist message from node 2
Jun 25 10:50:42 yoda2 openais[18429]: [CMAN ] cman killed by node 1 because we were killed by cman_tool or other application
Jun 25 10:50:42 yoda2 ccsd[18421]: Initial status:: Quorate
Jun 25 10:50:43 yoda2 gfs_controld[18455]: cman_init error 111
Jun 25 10:51:10 yoda2 ccsd[18421]: Unable to connect to cluster infrastructure after 30 seconds.
Jun 25 10:51:37 yoda2 snmpd[4764]: Connection from UDP: [172.20.0.32]:55090

There are 3 Xen domUs on this server and I can't reboot yoda2 :( ..

Best regards.. and sorry for my English :)
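One thing that may be worth trying before a reboot in this state - only a
sketch of an idea, not something from this thread - is forcing the node out
of the membership and rejoining, assuming rgmanager and any GFS mounts are
already stopped:

    service rgmanager stop    # stop cluster services first (may need a kill if it hangs)
    cman_tool leave force     # force this node out of the cluster membership
    service cman restart      # then try a clean restart; watch /var/log/messages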
------------------------------

Message: 4
Date: Wed, 25 Jun 2008 11:24:04 +0200
From: "Gian Paolo Buono" <gpbuono@xxxxxxxxx>
Subject: Re: can't communicate with fenced -1
To: "linux clustering" <linux-cluster@xxxxxxxxxx>

Hi,
another problem: the clurgmgrd process won't die:

[root@yoda2 ~]# /etc/init.d/rgmanager stop
Shutting down Cluster Service Manager...
Waiting for services to stop:

but nothing happens...

[root@yoda2 ~]# ps -ef | grep clurgmgrd
root      6620     1 55 Jun03 ?        12-02:06:46 clurgmgrd
[root@yoda2 ~]# kill -9 6620
[root@yoda2 ~]# ps -ef | grep clurgmgrd

and the clvmd process:

[root@yoda2 ~]# /etc/init.d/clvmd status
clvmd dead but subsys locked
active volumes: LV06 LV_nex2

Help me ... I don't want to reboot yoda2 ...

bye
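After a kill -9 like that, the init script's lock file is normally left
behind - the same "dead but subsys locked" state clvmd is showing. A
cleanup sketch, assuming the standard RHEL lock path:

    rm -f /var/lock/subsys/rgmanager   # remove the stale lock left by the killed clurgmgrd
    service rgmanager start            # restart it once cman and fenced are healthy again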
------------------------------

Message: 5
Date: Wed, 25 Jun 2008 15:26:45 +0530
From: GS R <gsrlinux@xxxxxxxxx>
Subject: Re: can't communicate with fenced -1
To: linux clustering <linux-cluster@xxxxxxxxxx>

Hi Gian,

I too faced the same issue. Rebooting the system is indeed the easy
solution here. I could do it because it was my test setup.

1. Have you added any resources to this cluster?
2. Have you configured any services on this cluster?
3. Have you tried using a fence device, e.g. fence_manual?
4. Is there at times a heavy load on your network?
5. Have you opened all the necessary ports on your firewall? (See the
   sketch below.)

Thanks
Gowrishankar Rajaiyan
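On point 5, a sketch of firewall rules for the RHEL 5 cluster stack. The
port numbers are the ones documented in the Red Hat Cluster Administration
guide; verify them against your own release before relying on this:

    iptables -I INPUT -p udp --dport 5404:5405   -j ACCEPT   # openais/totem
    iptables -I INPUT -p tcp --dport 21064       -j ACCEPT   # dlm
    iptables -I INPUT -p tcp --dport 50006       -j ACCEPT   # ccsd
    iptables -I INPUT -p udp --dport 50007       -j ACCEPT   # ccsd
    iptables -I INPUT -p tcp --dport 50008:50009 -j ACCEPT   # ccsd
    iptables -I INPUT -p tcp --dport 41966:41969 -j ACCEPT   # rgmanager
    service iptables save                                    # persist the rules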
------------------------------

End of Linux-cluster Digest, Vol 50, Issue 32
*********************************************

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster