Re: openais issue

Hi all,

I had a problem similar to Paras's today: yum updated the following RPMs last week and, when I had to restart the cluster today, the cluster was not able to start the vm: services.

Oct 02 05:31:05 Updated: openais-0.80.6-8.el5.x86_64
Oct 02 05:31:07 Updated: cman-2.0.115-1.el5.x86_64
Oct 02 05:31:10 Updated: rgmanager-2.0.52-1.el5.x86_64

Oct 03 04:03:12 Updated: xen-libs-3.0.3-94.el5_4.1.x86_64
Oct 03 04:03:12 Updated: xen-libs-3.0.3-94.el5_4.1.i386
Oct 03 04:03:16 Updated: xen-3.0.3-94.el5_4.1.x86_64


So, after checking the vm.sh script, I added the declaration use_virsh="0" to the VM definition in cluster.conf (as suggested by Brem, thanks!) and everything is now working again.
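For anyone hitting the same thing, the changed <vm/> line looks roughly like this (the name, domain, and path values here are placeholders, not our real ones):

  <!-- use_virsh="0" makes rgmanager drive the guest with xm
       instead of virsh; name/domain/path below are placeholders -->
  <vm autostart="1" domain="mydomain" exclusive="0" max_restarts="0"
      name="myvm" path="/vms" recovery="restart"
      restart_expire_time="0" use_virsh="0"/>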


BTW, I couldn't tell whether the problem was caused by the new Xen version or the new openais one, so I disabled automatic updates for both.
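(I did that with an exclude line in /etc/yum.conf, roughly like this:)

  # /etc/yum.conf -- hold back xen and openais until this is understood
  [main]
  exclude=xen* openais*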

I hope I won't have any other bad surprises...

Thank you,
cheers,
Daniela


Paras pradhan wrote:
Yes, this is very strange. I don't know what to do now. Maybe
re-create the cluster? But that's not really a good solution.

Packages :

Kernel: kernel-xen-2.6.18-164.el5
OS: Fully updated CentOS 5.3, except CMAN downgraded to cman-2.0.98-1.el5

Other packages related to cluster suite:

rgmanager-2.0.52-1.el5.centos
cman-2.0.98-1.el5
xen-3.0.3-80.el5_3.3
xen-libs-3.0.3-80.el5_3.3
kmod-gfs-xen-0.1.31-3.el5_3.1
kmod-gfs-0.1.31-3.el5_3.1
gfs-utils-0.1.18-1.el5
gfs2-utils-0.1.62-1.el5
lvm2-2.02.40-6.el5
lvm2-cluster-2.02.40-7.el5
openais-0.80.3-22.el5_3.9

Thanks!
Paras.




On Wed, Sep 30, 2009 at 10:02 AM, brem belguebli
<brem.belguebli@xxxxxxxxx> wrote:
Hi Paras,

Your cluster.conf file seems correct. If it is not an NTP issue, I
don't see anything that could cause this other than a bug, or some
prerequisite that is not met.

Maybe you could post the versions (OS, kernel, packages, etc.) you
are using; someone may have hit the same issue with your versions.

Brem

2009/9/30, Paras pradhan <pradhanparas@xxxxxxxxx>:
All of the nodes are synced with the NTP server, so that is not the issue in my case.

Thanks
Paras.

On Tue, Sep 29, 2009 at 6:29 PM, Johannes Rußek
<johannes.russek@xxxxxxxxxxxxxxxxx> wrote:
Make sure the time on the nodes is in sync; apparently, when a node has too
much offset, you won't see rgmanager (even though the process is running).
This happened today, and setting the time fixed it for me. AFAICR there was
no sign of this in the logs, though.
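A quick way to compare the clocks (a rough sketch; the node names are just examples):

  # print each node's epoch time side by side (run from a host with ssh access)
  for n in cvtst1 cvtst2 cvtst3; do
      echo -n "$n: "; ssh "$n" date +%s
  done
  # or inspect the NTP peer status on each node:
  # ntpq -pn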
johannes

Paras pradhan schrieb:
I don't see rgmanager.

Here is the output from clustat:

[root@cvtst1 cluster]# clustat
Cluster Status for test @ Tue Sep 29 15:53:33 2009
Member Status: Quorate

 Member Name                             ID   Status
 ------ ----                             ---- ------
 cvtst2                                  1    Online
 cvtst1                                  2    Online, Local
 cvtst3                                  3    Online


Thanks
Paras.

On Tue, Sep 29, 2009 at 3:44 PM, brem belguebli
<brem.belguebli@xxxxxxxxx> wrote:

It looks correct; rgmanager seems to start on all nodes.

What does clustat give you?

If rgmanager doesn't show, check the logs; something may have gone
wrong.


2009/9/29 Paras pradhan <pradhanparas@xxxxxxxxx>:

Changed it to 7 and I got this log:

Sep 29 15:33:50 cvtst1 rgmanager: [23295]: <notice> Shutting down Cluster Service Manager...
Sep 29 15:33:50 cvtst1 clurgmgrd[22869]: <notice> Shutting down
Sep 29 15:33:50 cvtst1 clurgmgrd[22869]: <notice> Shutting down
Sep 29 15:33:50 cvtst1 clurgmgrd[22869]: <notice> Shutdown complete, exiting
Sep 29 15:33:50 cvtst1 rgmanager: [23295]: <notice> Cluster Service Manager is stopped.
Sep 29 15:33:51 cvtst1 clurgmgrd[23324]: <notice> Resource Group Manager Starting
Sep 29 15:33:51 cvtst1 clurgmgrd[23324]: <info> Loading Service Data
Sep 29 15:33:51 cvtst1 clurgmgrd[23324]: <debug> Loading Resource Rules
Sep 29 15:33:52 cvtst1 clurgmgrd[23324]: <debug> 21 rules loaded
Sep 29 15:33:52 cvtst1 clurgmgrd[23324]: <debug> Building Resource Trees
Sep 29 15:33:52 cvtst1 clurgmgrd[23324]: <debug> 0 resources defined
Sep 29 15:33:52 cvtst1 clurgmgrd[23324]: <debug> Loading Failover Domains
Sep 29 15:33:52 cvtst1 clurgmgrd[23324]: <debug> 1 domains defined
Sep 29 15:33:52 cvtst1 clurgmgrd[23324]: <debug> 1 events defined
Sep 29 15:33:52 cvtst1 clurgmgrd[23324]: <info> Initializing Services
Sep 29 15:33:52 cvtst1 clurgmgrd[23324]: <info> Services Initialized
Sep 29 15:33:52 cvtst1 clurgmgrd[23324]: <debug> Event: Port Opened
Sep 29 15:33:52 cvtst1 clurgmgrd[23324]: <info> State change: Local UP
Sep 29 15:33:52 cvtst1 clurgmgrd[23324]: <info> State change: cvtst2 UP
Sep 29 15:33:52 cvtst1 clurgmgrd[23324]: <info> State change: cvtst3 UP
Sep 29 15:33:57 cvtst1 clurgmgrd[23324]: <debug> Event (1:2:1) Processed
Sep 29 15:33:57 cvtst1 clurgmgrd[23324]: <debug> Event (0:1:1) Processed
Sep 29 15:33:57 cvtst1 clurgmgrd[23324]: <debug> Event (0:3:1) Processed
Sep 29 15:34:02 cvtst1 clurgmgrd[23324]: <debug> 3 events processed


Anything unusual here?

Paras.

On Tue, Sep 29, 2009 at 11:51 AM, brem belguebli
<brem.belguebli@xxxxxxxxx> wrote:

I use log_level=7 to have more debugging info.

It seems 4 is not enough.

Brem


2009/9/29, Paras pradhan <pradhanparas@xxxxxxxxx>:

With a log_level of 3 I got only this:

Sep 29 10:31:31 cvtst1 rgmanager: [7170]: <notice> Shutting down Cluster Service Manager...
Sep 29 10:31:31 cvtst1 clurgmgrd[6673]: <notice> Shutting down
Sep 29 10:31:41 cvtst1 clurgmgrd[6673]: <notice> Shutdown complete, exiting
Sep 29 10:31:41 cvtst1 rgmanager: [7170]: <notice> Cluster Service Manager is stopped.
Sep 29 10:31:42 cvtst1 clurgmgrd[7224]: <notice> Resource Group Manager Starting
Sep 29 10:39:06 cvtst1 rgmanager: [10327]: <notice> Shutting down Cluster Service Manager...
Sep 29 10:39:16 cvtst1 rgmanager: [10327]: <notice> Cluster Service Manager is stopped.
Sep 29 10:39:16 cvtst1 clurgmgrd[10380]: <notice> Resource Group Manager Starting
Sep 29 10:39:52 cvtst1 clurgmgrd[10380]: <notice> Member 1 shutting down

I do not know what the last line means.

rgmanager version I am running is:
rgmanager-2.0.52-1.el5.centos

I don't know what has gone wrong.

Thanks
Paras.


On Mon, Sep 28, 2009 at 6:41 PM, brem belguebli
<brem.belguebli@xxxxxxxxx> wrote:

You mean it stopped successfully on all the nodes, but it is failing
to start only on node cvtst1?

Look at the following page to make rgmanager more verbose; it'll
help with debugging:

http://sources.redhat.com/cluster/wiki/RGManager

See the Logging Configuration section.
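From memory (do check the wiki page, as I may be misremembering the exact attribute name), it boils down to something like this in cluster.conf:

  <!-- log_level follows the syslog severities: 7 = debug -->
  <rm log_level="7">
        ...
  </rm>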




2009/9/29 Paras pradhan <pradhanparas@xxxxxxxxx>:

Brem,

When I try to restart rgmanager on all the nodes, this time I do not
see rgmanager running on the first node, but I do see it on the other
two nodes.

Log on the first node:

Sep 28 18:13:58 cvtst1 clurgmgrd[24099]: <notice> Resource Group Manager Starting
Sep 28 18:17:29 cvtst1 rgmanager: [24627]: <notice> Shutting down Cluster Service Manager...
Sep 28 18:17:29 cvtst1 clurgmgrd[24099]: <notice> Shutting down
Sep 28 18:17:39 cvtst1 clurgmgrd[24099]: <notice> Shutdown complete, exiting
Sep 28 18:17:39 cvtst1 rgmanager: [24627]: <notice> Cluster Service Manager is stopped.
Sep 28 18:17:40 cvtst1 clurgmgrd[24679]: <notice> Resource Group Manager Starting

It seems the service is running, but I do not see rgmanager running
in clustat.


Don't know what is going on.

Thanks
Paras.


On Mon, Sep 28, 2009 at 5:46 PM, brem belguebli
<brem.belguebli@xxxxxxxxx> wrote:

Paras,

Another thing: it would have been more interesting to have a DEBUG
trace of a start, not a stop.

That's why I was asking you to first stop the VM manually on all your
nodes, possibly stop rgmanager on all the nodes to reset any wrong
states you may have, and then restart rgmanager.

If your VM is configured to autostart, this will make it start.

It should normally fail (as it does now). Send out your newly created
DEBUG file.

2009/9/29 brem belguebli <brem.belguebli@xxxxxxxxx>:

Hi Paras,


I don't know the xen/cluster combination well, but if I remember
correctly, I've read somewhere that when using Xen you have to
declare the use_virsh=0 key in the VM definition in cluster.conf.

This would make rgmanager use xm commands instead of virsh. The
DEBUG output clearly shows that you are using virsh to manage your
VM instead of xm commands.

Check out the RH docs about virtualization.

I'm not 100% sure about that; I may be completely wrong.

Brem

2009/9/28 Paras pradhan <pradhanparas@xxxxxxxxx>:

The only thing I noticed, after stopping the VM using xm on all nodes
and starting it using clusvcadm, is the message:

"Virtual machine guest1 is blocked"

The whole DEBUG file is attached.


Thanks
Paras.

On Fri, Sep 25, 2009 at 5:53 PM, brem belguebli
<brem.belguebli@xxxxxxxxx> wrote:

There's a problem with the script that is called by rgmanager to
start the VM; I don't know what causes it.

Maybe you should try something like this:

1) stop the VM on all nodes with xm commands
2) edit the /usr/share/cluster/vm.sh script and add the following
   lines (after the #!/bin/bash):
     exec >/tmp/DEBUG 2>&1
     set -x
3) start the VM with clusvcadm -e vm:guest1

It should fail as it did before.

Edit the /tmp/DEBUG file and you will be able to see where it
fails (it may generate a lot of debug output).

4) remove the debug lines from /usr/share/cluster/vm.sh

Post the DEBUG file if you're not able to see where it fails; a
consolidated sketch of the whole procedure follows below.
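Put together, the session would look roughly like this (an untested sketch):

  # 1) on every node, stop the guest outside cluster control
  xm shutdown guest1

  # 2) trace vm.sh: add these two lines to /usr/share/cluster/vm.sh,
  #    right after the #!/bin/bash line
  #      exec >/tmp/DEBUG 2>&1
  #      set -x

  # 3) trigger the failing start and read the trace
  clusvcadm -e vm:guest1
  less /tmp/DEBUG

  # 4) afterwards, remove the two debug lines from vm.sh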

Brem

2009/9/26 Paras pradhan <pradhanparas@xxxxxxxxx>:

No, I am not starting it manually, and I am not using automatic init scripts.

I started the VM using: clusvcadm -e vm:guest1

I have just stopped it using clusvcadm -s vm:guest1. For a few
seconds it says guest1 started, but after a while I can see guest1 on
all three nodes.

clustat says:

 Service Name                            Owner (Last)             State
 ------- ----                            ----- ------             -----
 vm:guest1                               (none)                   stopped

But I can see the vm from xm li.

This is what I can see from the log:


Sep 25 17:19:01 cvtst1 clurgmgrd[4298]: <notice> start on vm "guest1" returned 1 (generic error)
Sep 25 17:19:01 cvtst1 clurgmgrd[4298]: <warning> #68: Failed to start vm:guest1; return value: 1
Sep 25 17:19:01 cvtst1 clurgmgrd[4298]: <notice> Stopping service vm:guest1
Sep 25 17:19:02 cvtst1 clurgmgrd[4298]: <notice> Service vm:guest1 is recovering
Sep 25 17:19:15 cvtst1 clurgmgrd[4298]: <notice> Recovering failed service vm:guest1
Sep 25 17:19:16 cvtst1 clurgmgrd[4298]: <notice> start on vm "guest1" returned 1 (generic error)
Sep 25 17:19:16 cvtst1 clurgmgrd[4298]: <warning> #68: Failed to start vm:guest1; return value: 1
Sep 25 17:19:16 cvtst1 clurgmgrd[4298]: <notice> Stopping service vm:guest1
Sep 25 17:19:17 cvtst1 clurgmgrd[4298]: <notice> Service vm:guest1 is recovering


Paras.

On Fri, Sep 25, 2009 at 5:07 PM, brem belguebli
<brem.belguebli@xxxxxxxxx> wrote:

Have you started your VM via rgmanager (clusvcadm -e vm:guest1) or
using xm commands outside of cluster control (or maybe through an
automatic init script)?

When clustered, you should never start services (manually or
through an automatic init script) outside of cluster control.

The thing to do would be to stop your VM on all the nodes with the
appropriate xm command (I'm not using Xen myself) and try to start it
with clusvcadm.

Then see if it is started on all nodes (send the clustat output).
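In commands, something like this (xm syntax from memory, as I don't use Xen myself):

  # on each node, make sure the guest is down outside cluster control
  xm shutdown guest1      # or "xm destroy guest1" if it refuses to stop

  # then, from one node only, start it under rgmanager
  clusvcadm -e vm:guest1
  clustat                 # vm:guest1 should show exactly one owner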



2009/9/25 Paras pradhan <pradhanparas@xxxxxxxxx>:

OK, please see below. My VM is running on all nodes even though
clustat says it is stopped.

--
[root@cvtst1 ~]# clustat
Cluster Status for test @ Fri Sep 25 16:52:34 2009
Member Status: Quorate

 Member Name                             ID   Status
 ------ ----                             ---- ------
 cvtst2                                  1    Online, rgmanager
 cvtst1                                  2    Online, Local, rgmanager
 cvtst3                                  3    Online, rgmanager

 Service Name                            Owner (Last)             State
 ------- ----                            ----- ------             -----
 vm:guest1                               (none)                   stopped
[root@cvtst1 ~]#


---
Output of xm li on cvtst1:

--
[root@cvtst1 ~]# xm li
Name                                      ID Mem(MiB) VCPUs State   Time(s)
Domain-0                                   0     3470     2 r-----  28939.4
guest1                                     7      511     1 -b----   7727.8

Output of xm li on cvtst2:

--
[root@cvtst2 ~]# xm li
Name                                      ID Mem(MiB) VCPUs State   Time(s)
Domain-0                                   0     3470     2 r-----  31558.9
guest1                                    21      511     1 -b----   7558.2
---

Thanks
Paras.



On Fri, Sep 25, 2009 at 4:22 PM, brem belguebli
<brem.belguebli@xxxxxxxxx> wrote:

It looks like no.

Can you send the output of clustat from when the VM is running on
multiple nodes at the same time?

And, by the way, another one after having stopped it (clusvcadm -s vm:guest1)?



2009/9/25 Paras pradhan <pradhanparas@xxxxxxxxx>:

Is anyone else having an issue like mine? The virtual machine service
is not being handled properly by the cluster.


Thanks
Paras.

On Mon, Sep 21, 2009 at 9:55 AM, Paras pradhan
<pradhanparas@xxxxxxxxx> wrote:

OK, here is my cluster.conf file:

--
[root@cvtst1 cluster]# more cluster.conf
<?xml version="1.0"?>
<cluster alias="test" config_version="9" name="test">
        <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
        <clusternodes>
                <clusternode name="cvtst2" nodeid="1" votes="1">
                        <fence/>
                </clusternode>
                <clusternode name="cvtst1" nodeid="2" votes="1">
                        <fence/>
                </clusternode>
                <clusternode name="cvtst3" nodeid="3" votes="1">
                        <fence/>
                </clusternode>
        </clusternodes>
        <cman/>
        <fencedevices/>
        <rm>
                <failoverdomains>
                        <failoverdomain name="myfd1" nofailback="0" ordered="1" restricted="0">
                                <failoverdomainnode name="cvtst2" priority="3"/>
                                <failoverdomainnode name="cvtst1" priority="1"/>
                                <failoverdomainnode name="cvtst3" priority="2"/>
                        </failoverdomain>
                </failoverdomains>
                <resources/>
                <vm autostart="1" domain="myfd1" exclusive="0" max_restarts="0"
                    name="guest1" path="/vms" recovery="restart" restart_expire_time="0"/>
        </rm>
</cluster>
[root@cvtst1 cluster]#
------

Thanks!
Paras.


On Sun, Sep 20, 2009 at 9:44 AM, Volker Dormeyer
<volker@xxxxxxxxxxxx> wrote:

On Fri, Sep 18, 2009 at 05:08:57PM -0500,
Paras pradhan <pradhanparas@xxxxxxxxx> wrote:

I am using the cluster suite for HA of Xen virtual machines. Now I am
having another problem: when I start my Xen VM on one node, it also
starts on the other nodes. Which daemon controls this?

This is usually done by clurgmgrd (which is part of the rgmanager
package). To me, this sounds like a configuration problem. Maybe
you can post your cluster.conf?
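If I remember correctly, the rgmanager package also ships an rg_test utility that can dry-run the resource configuration; something like this may help spot parsing problems:

  # dry-run the resource tree from the config (rg_test comes with rgmanager)
  rg_test test /etc/cluster/cluster.conf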

Regards,
Volker

--
- Daniela Anzellotti ------------------------------------
 INFN Roma - tel.: +39.06.49914282 - fax: +39.06.490354
 e-mail: daniela.anzellotti@xxxxxxxxxxxxx
---------------------------------------------------------

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
