Re: openais issue

Johannes Rußek <johannes.russek@xxxxxxxxxxxxxxxxx> · Wed, 30 Sep 2009 01:29:51 +0200

Make sure the time on the nodes is in sync. Apparently, when a node's
clock is too far off, rgmanager will not show up in clustat (even though
the process is running).
This happened to me today, and setting the time fixed it. As far as I can
recall, there was no sign of this in the logs, though.
Johannes
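
A rough way to check this is to collect each node's clock as epoch seconds and look at the spread. The sketch below is only an illustration: max_clock_skew is a made-up helper name, and the commented-out loop assumes passwordless ssh as root to the three nodes named in this thread. Because the nodes are polled one after another, a second or two of apparent skew is just ssh latency.

```shell
# max_clock_skew: read "node epoch_seconds" pairs on stdin and print the
# largest difference, in seconds, between any two of the reported clocks.
max_clock_skew() {
    awk 'NR == 1 { min = $2; max = $2 }
         { if ($2 < min) min = $2; if ($2 > max) max = $2 }
         END { if (NR > 0) print max - min }'
}

# Hypothetical usage (assumes passwordless ssh to every node):
#   for n in cvtst1 cvtst2 cvtst3; do
#       echo "$n $(ssh "$n" date +%s)"
#   done | max_clock_skew
```

If the spread is more than a few seconds, syncing the clocks (for example with NTP) before restarting rgmanager seems worth trying.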

Paras pradhan wrote:
I don't see rgmanager.

Here is the output from clustat:

[root@cvtst1 cluster]# clustat
Cluster Status for test @ Tue Sep 29 15:53:33 2009
Member Status: Quorate

 Member Name                              ID   Status
 ------ ----                              ---- ------
 cvtst2                                      1 Online
 cvtst1                                      2 Online, Local
 cvtst3                                      3 Online

Thanks
Paras.

On Tue, Sep 29, 2009 at 3:44 PM, brem belguebli
<brem.belguebli@xxxxxxxxx> wrote:

It looks correct; rgmanager seems to start on all nodes.

What does clustat give you?

If rgmanager doesn't show, check the logs; something may have gone wrong.
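
To spot which members are missing the rgmanager flag without reading the table by eye, something like the following could be used. It is a sketch only: members_without_rgmanager is a hypothetical helper name, and it assumes clustat prints one member per line with the flags after "Online", as in the outputs posted in this thread.

```shell
# members_without_rgmanager: read clustat output on stdin and print the name
# of every online member whose status line lacks the "rgmanager" flag.
members_without_rgmanager() {
    awk '/Online/ && !/rgmanager/ { print $1 }'
}

# Hypothetical usage on any cluster node:
#   clustat | members_without_rgmanager
```

An empty result would mean every online member reports rgmanager; any names printed are the nodes whose logs to look at first.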

2009/9/29 Paras pradhan <pradhanparas@xxxxxxxxx>:

I changed it to 7 and got this log:

Sep 29 15:33:50 cvtst1 rgmanager: [23295]: <notice> Shutting down Cluster Service Manager...
Sep 29 15:33:50 cvtst1 clurgmgrd[22869]: <notice> Shutting down
Sep 29 15:33:50 cvtst1 clurgmgrd[22869]: <notice> Shutting down
Sep 29 15:33:50 cvtst1 clurgmgrd[22869]: <notice> Shutdown complete, exiting
Sep 29 15:33:50 cvtst1 rgmanager: [23295]: <notice> Cluster Service Manager is stopped.
Sep 29 15:33:51 cvtst1 clurgmgrd[23324]: <notice> Resource Group Manager Starting
Sep 29 15:33:51 cvtst1 clurgmgrd[23324]: <info> Loading Service Data
Sep 29 15:33:51 cvtst1 clurgmgrd[23324]: <debug> Loading Resource Rules
Sep 29 15:33:52 cvtst1 clurgmgrd[23324]: <debug> 21 rules loaded
Sep 29 15:33:52 cvtst1 clurgmgrd[23324]: <debug> Building Resource Trees
Sep 29 15:33:52 cvtst1 clurgmgrd[23324]: <debug> 0 resources defined
Sep 29 15:33:52 cvtst1 clurgmgrd[23324]: <debug> Loading Failover Domains
Sep 29 15:33:52 cvtst1 clurgmgrd[23324]: <debug> 1 domains defined
Sep 29 15:33:52 cvtst1 clurgmgrd[23324]: <debug> 1 events defined
Sep 29 15:33:52 cvtst1 clurgmgrd[23324]: <info> Initializing Services
Sep 29 15:33:52 cvtst1 clurgmgrd[23324]: <info> Services Initialized
Sep 29 15:33:52 cvtst1 clurgmgrd[23324]: <debug> Event: Port Opened
Sep 29 15:33:52 cvtst1 clurgmgrd[23324]: <info> State change: Local UP
Sep 29 15:33:52 cvtst1 clurgmgrd[23324]: <info> State change: cvtst2 UP
Sep 29 15:33:52 cvtst1 clurgmgrd[23324]: <info> State change: cvtst3 UP
Sep 29 15:33:57 cvtst1 clurgmgrd[23324]: <debug> Event (1:2:1) Processed
Sep 29 15:33:57 cvtst1 clurgmgrd[23324]: <debug> Event (0:1:1) Processed
Sep 29 15:33:57 cvtst1 clurgmgrd[23324]: <debug> Event (0:3:1) Processed
Sep 29 15:34:02 cvtst1 clurgmgrd[23324]: <debug> 3 events processed

Anything unusual here?

Paras.

On Tue, Sep 29, 2009 at 11:51 AM, brem belguebli
<brem.belguebli@xxxxxxxxx> wrote:

I use log_level=7 to have more debugging info.

It seems 4 is not enough.

Brem
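
For reference, raising the level could look roughly like the sketch below. It assumes log_level is set as an attribute on the <rm> tag of cluster.conf (worth confirming against the Logging Configuration section of the RGManager wiki page linked further down the thread). set_rm_log_level is a hypothetical helper that works as a filter, so it never edits the live file by itself.

```shell
# set_rm_log_level LEVEL: read cluster.conf on stdin and write it to stdout
# with log_level="LEVEL" on the <rm> tag, replacing any existing log_level.
set_rm_log_level() {
    sed -E \
        -e 's/(<rm[^>]*) log_level="[^"]*"/\1/' \
        -e "s/<rm([ >])/<rm log_level=\"$1\"\\1/"
}

# Hypothetical usage (the output path is an example only):
#   set_rm_log_level 7 < /etc/cluster/cluster.conf > /tmp/cluster.conf.new
```

The result would still need its config_version incremented and the file propagated to the other nodes before rgmanager is restarted.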

2009/9/29, Paras pradhan <pradhanparas@xxxxxxxxx>:

With a log_level of 3 I got only this:

Sep 29 10:31:31 cvtst1 rgmanager: [7170]: <notice> Shutting down Cluster Service Manager...
Sep 29 10:31:31 cvtst1 clurgmgrd[6673]: <notice> Shutting down
Sep 29 10:31:41 cvtst1 clurgmgrd[6673]: <notice> Shutdown complete, exiting
Sep 29 10:31:41 cvtst1 rgmanager: [7170]: <notice> Cluster Service Manager is stopped.
Sep 29 10:31:42 cvtst1 clurgmgrd[7224]: <notice> Resource Group Manager Starting
Sep 29 10:39:06 cvtst1 rgmanager: [10327]: <notice> Shutting down Cluster Service Manager...
Sep 29 10:39:16 cvtst1 rgmanager: [10327]: <notice> Cluster Service Manager is stopped.
Sep 29 10:39:16 cvtst1 clurgmgrd[10380]: <notice> Resource Group Manager Starting
Sep 29 10:39:52 cvtst1 clurgmgrd[10380]: <notice> Member 1 shutting down

I do not know what the last line means.

rgmanager version I am running is:
rgmanager-2.0.52-1.el5.centos

I don't know what has gone wrong.

Thanks
Paras.

On Mon, Sep 28, 2009 at 6:41 PM, brem belguebli
<brem.belguebli@xxxxxxxxx> wrote:

You mean it stopped successfully on all the nodes but fails to start
only on node cvtst1?

Look at the following page to make rgmanager more verbose. It'll
help with debugging:

http://sources.redhat.com/cluster/wiki/RGManager

See the Logging Configuration section.

2009/9/29 Paras pradhan <pradhanparas@xxxxxxxxx>:

Brem,

When I try to restart rgmanager on all the nodes, this time I do not
see rgmanager running on the first node. But I do see it on the other 2
nodes.

Log on the first node:

Sep 28 18:13:58 cvtst1 clurgmgrd[24099]: <notice> Resource Group Manager Starting
Sep 28 18:17:29 cvtst1 rgmanager: [24627]: <notice> Shutting down Cluster Service Manager...
Sep 28 18:17:29 cvtst1 clurgmgrd[24099]: <notice> Shutting down
Sep 28 18:17:39 cvtst1 clurgmgrd[24099]: <notice> Shutdown complete, exiting
Sep 28 18:17:39 cvtst1 rgmanager: [24627]: <notice> Cluster Service Manager is stopped.
Sep 28 18:17:40 cvtst1 clurgmgrd[24679]: <notice> Resource Group Manager Starting

-
It seems the service is running, but I do not see rgmanager listed in clustat.

I don't know what is going on.

Thanks
Paras.

On Mon, Sep 28, 2009 at 5:46 PM, brem belguebli
<brem.belguebli@xxxxxxxxx> wrote:

Paras,

Another thing: it would have been more interesting to have a DEBUG of a
start, not of a stop.

That's why I was asking you to first stop the VM manually on all your
nodes, then stop rgmanager on all the nodes if necessary to reset any
wrong states you may have, and restart rgmanager.

If your VM is configured to autostart, this will make it start.

It should normally fail (as it does now). Send out your newly created
DEBUG file.

2009/9/29 brem belguebli <brem.belguebli@xxxxxxxxx>:

Hi Paras,

I don't know the xen/cluster combination well, but if I remember
correctly, I've read somewhere that when using xen you have to
declare the use_virsh=0 key in the VM definition in cluster.conf.

This would make rgmanager use xm commands instead of virsh.
The DEBUG output clearly shows that you are using virsh to manage your
VM instead of xm commands.
Check out the RH docs about virtualization.

I'm not 100% sure about that; I may be completely wrong.

Brem
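
Before adding the key, it may help to see which <vm> definitions do not set use_virsh at all. The following is a minimal sketch; vm_tags_missing_use_virsh is a hypothetical helper name, and it flattens the file first because a tag can be wrapped across several lines. Whether use_virsh="0" is actually the fix is not confirmed anywhere in this thread.

```shell
# vm_tags_missing_use_virsh: read cluster.conf on stdin and print every
# <vm ...> tag that does not mention use_virsh at all.
vm_tags_missing_use_virsh() {
    # Join all lines so a tag wrapped over several lines is matched whole.
    tr '\n' ' ' | grep -o '<vm [^>]*>' | grep -v 'use_virsh='
}

# Hypothetical usage on a node:
#   vm_tags_missing_use_virsh < /etc/cluster/cluster.conf
```

Any tag it prints would be a candidate for adding use_virsh="0", followed by the usual config_version bump.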

2009/9/28 Paras pradhan <pradhanparas@xxxxxxxxx>:

The only thing I noticed is the message I get after stopping the vm using xm
on all nodes and starting it using clusvcadm:

"Virtual machine guest1 is blocked"

The whole DEBUG file is attached.

Thanks
Paras.

On Fri, Sep 25, 2009 at 5:53 PM, brem belguebli
<brem.belguebli@xxxxxxxxx> wrote:

There's a problem with the script that rgmanager calls to start
the VM; I don't know what causes it.

Maybe you should try something like this:

1) stop the VM on all nodes with xm commands
2) edit the /usr/share/cluster/vm.sh script and add the following
lines (after the #!/bin/bash ):
  exec >/tmp/DEBUG 2>&1
  set -x
3) start the VM with clusvcadm -e vm:guest1

It should fail as it did before.

Open the /tmp/DEBUG file and you will be able to see where it
fails (it may generate a lot of debug output).

4) remove the debug lines from /usr/share/cluster/vm.sh

Post the DEBUG file if you're not able to see where it fails.

Brem
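
Steps 2 and 4 above can be done without hand-editing the script. The sketch below is one possible way: add_debug_lines is a hypothetical helper that inserts the two debug lines right after the first line of whatever script it reads, and /root/vm.sh.orig in the usage comment is just an example backup path.

```shell
# add_debug_lines: read a shell script on stdin and write it to stdout with
# the two debug lines from step 2 inserted right after the first (#!) line.
add_debug_lines() {
    awk '{ print }
         NR == 1 { print "exec >/tmp/DEBUG 2>&1"; print "set -x" }'
}

# Hypothetical usage on a node (the backup makes step 4 a plain restore):
#   cp -p /usr/share/cluster/vm.sh /root/vm.sh.orig
#   add_debug_lines < /root/vm.sh.orig > /usr/share/cluster/vm.sh
#   clusvcadm -e vm:guest1     # expected to fail; trace lands in /tmp/DEBUG
#   cp -p /root/vm.sh.orig /usr/share/cluster/vm.sh
```

The exec line sends everything the script prints (stdout and stderr) to /tmp/DEBUG, and set -x makes bash echo each command before running it, so the last traced command before the error is usually the place to look.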

2009/9/26 Paras pradhan <pradhanparas@xxxxxxxxx>:

No, I am not starting it manually, nor using automatic init scripts.

I started the vm using: clusvcadm -e vm:guest1

I have just stopped it using clusvcadm -s vm:guest1. For a few seconds it
says guest1 started. But after a while I can see guest1 on all
three nodes.

clustat says:

 Service Name                             Owner (Last)                   State
 ------- ----                             ----- ------                   -----
 vm:guest1                                (none)                         stopped

But I can see the vm from xm li.

This is what I can see from the log:

Sep 25 17:19:01 cvtst1 clurgmgrd[4298]: <notice> start on vm "guest1" returned 1 (generic error)
Sep 25 17:19:01 cvtst1 clurgmgrd[4298]: <warning> #68: Failed to start vm:guest1; return value: 1
Sep 25 17:19:01 cvtst1 clurgmgrd[4298]: <notice> Stopping service vm:guest1
Sep 25 17:19:02 cvtst1 clurgmgrd[4298]: <notice> Service vm:guest1 is recovering
Sep 25 17:19:15 cvtst1 clurgmgrd[4298]: <notice> Recovering failed service vm:guest1
Sep 25 17:19:16 cvtst1 clurgmgrd[4298]: <notice> start on vm "guest1" returned 1 (generic error)
Sep 25 17:19:16 cvtst1 clurgmgrd[4298]: <warning> #68: Failed to start vm:guest1; return value: 1
Sep 25 17:19:16 cvtst1 clurgmgrd[4298]: <notice> Stopping service vm:guest1
Sep 25 17:19:17 cvtst1 clurgmgrd[4298]: <notice> Service vm:guest1 is recovering

Paras.

On Fri, Sep 25, 2009 at 5:07 PM, brem belguebli
<brem.belguebli@xxxxxxxxx> wrote:

Have you started your VM via rgmanager (clusvcadm -e vm:guest1) or
using xm commands outside of cluster control (or maybe through an
automatic init script)?

When clustered, you should never start services (manually or
through an automatic init script) outside of cluster control.

The thing to do would be to stop your vm on all the nodes with the
appropriate xm command (I'm not using xen myself) and try to start it
with clusvcadm.

Then see if it is started on all nodes (send the clustat output).
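
One way to do that last check is to run each node's xm list output through a small filter. This is a sketch under assumptions: guest_is_listed is a hypothetical helper, and the commented-out loop assumes passwordless ssh as root to the three nodes named in this thread.

```shell
# guest_is_listed NAME: read `xm list` output on stdin; exit 0 if a domain
# called NAME appears in it, exit 1 otherwise. The header line is skipped.
guest_is_listed() {
    awk -v name="$1" 'NR > 1 && $1 == name { found = 1 } END { exit !found }'
}

# Hypothetical usage (assumes passwordless ssh to every node):
#   for n in cvtst1 cvtst2 cvtst3; do
#       if ssh "$n" xm list | guest_is_listed guest1; then
#           echo "guest1 is running on $n"
#       fi
#   done
```

If more than one node reports the guest, it is running outside cluster control on at least one of them.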

2009/9/25 Paras pradhan <pradhanparas@xxxxxxxxx>:

Ok. Please see below. my vm is running on all nodes though clustat
says it is stopped.

--
[root@cvtst1 ~]# clustat
Cluster Status for test @ Fri Sep 25 16:52:34 2009
Member Status: Quorate

 Member Name                              ID   Status
 ------ ----                              ---- ------
 cvtst2                                      1 Online, rgmanager
 cvtst1                                      2 Online, Local, rgmanager
 cvtst3                                      3 Online, rgmanager

 Service Name                             Owner (Last)                   State
 ------- ----                             ----- ------                   -----
 vm:guest1                                (none)                         stopped
[root@cvtst1 ~]#

---
o/p of xm li on cvtst1

--
[root@cvtst1 ~]# xm li
Name                                      ID Mem(MiB) VCPUs State   Time(s)
Domain-0                                   0     3470     2 r-----  28939.4
guest1                                     7      511     1 -b----   7727.8

o/p of xm li on cvtst2

--
[root@cvtst2 ~]# xm li
Name                                      ID Mem(MiB) VCPUs State   Time(s)
Domain-0                                   0     3470     2 r-----  31558.9
guest1                                    21      511     1 -b----   7558.2
---

Thanks
Paras.

On Fri, Sep 25, 2009 at 4:22 PM, brem belguebli
<brem.belguebli@xxxxxxxxx> wrote:

Apparently not.

Can you send the clustat output from when the VM is running on
multiple nodes at the same time?

And, by the way, another one after having stopped it (clusvcadm -s vm:guest1)?

2009/9/25 Paras pradhan <pradhanparas@xxxxxxxxx>:

Is anyone having the same issue as mine? The virtual machine service is not
being properly handled by the cluster.

Thanks
Paras.

On Mon, Sep 21, 2009 at 9:55 AM, Paras pradhan <pradhanparas@xxxxxxxxx> wrote:

Ok.. here is my cluster.conf file

--
[root@cvtst1 cluster]# more cluster.conf
<?xml version="1.0"?>
<cluster alias="test" config_version="9" name="test">
       <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
       <clusternodes>
               <clusternode name="cvtst2" nodeid="1" votes="1">
                       <fence/>
               </clusternode>
               <clusternode name="cvtst1" nodeid="2" votes="1">
                       <fence/>
               </clusternode>
               <clusternode name="cvtst3" nodeid="3" votes="1">
                       <fence/>
               </clusternode>
       </clusternodes>
       <cman/>
       <fencedevices/>
       <rm>
               <failoverdomains>
                       <failoverdomain name="myfd1" nofailback="0" ordered="1" restricted="0">
                               <failoverdomainnode name="cvtst2" priority="3"/>
                               <failoverdomainnode name="cvtst1" priority="1"/>
                               <failoverdomainnode name="cvtst3" priority="2"/>
                       </failoverdomain>
               </failoverdomains>
               <resources/>
               <vm autostart="1" domain="myfd1" exclusive="0" max_restarts="0" name="guest1" path="/vms" recovery="restart" restart_expire_time="0"/>
       </rm>
</cluster>
[root@cvtst1 cluster]#
------

Thanks!
Paras.

On Sun, Sep 20, 2009 at 9:44 AM, Volker Dormeyer <volker@xxxxxxxxxxxx> wrote:

On Fri, Sep 18, 2009 at 05:08:57PM -0500,
Paras pradhan <pradhanparas@xxxxxxxxx> wrote:

I am using cluster suite for HA of xen virtual machines. Now I am
having another problem. When I start my xen vm on one node, it
also starts on the other nodes. Which daemon controls this?

This is usually done by clurgmgrd (which is part of the rgmanager
package). To me, this sounds like a configuration problem. Maybe
you can post your cluster.conf?

Regards,
Volker

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
