Re: rgmanager not running

<Sunil_Gupta2@xxxxxxxx> · Tue, 8 Mar 2011 23:02:51 -0800

The rgmanager service is not necessary if the cluster has no resources to manage. Beyond that, more information on the cluster status is needed, such as the output of

# clustat

If it says all the nodes are online, then more debug logs will be needed to find the problem.
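
As a rough sketch of how that status information could be gathered in one pass: the helper names `run_if_present` and `collect_cluster_status` below are made up for illustration, and the command list (`clustat`, `cman_tool nodes`, `cman_tool services`, `fence_tool ls`) is only an assumption about what is useful on a cman-based RHEL6 cluster. Adjust it to whatever is actually installed.

```shell
#!/bin/sh
# Hypothetical helper: print a header, then run the command if it exists.
# A missing command is reported instead of aborting the whole collection.
run_if_present() {
    printf '=== %s ===\n' "$*"
    if command -v "$1" >/dev/null 2>&1; then
        # Capture stderr too, and note a non-zero exit instead of failing.
        "$@" 2>&1 || printf 'command exited with status %s\n' "$?"
    else
        printf 'skipped: %s not found in PATH\n' "$1"
    fi
}

# Hypothetical wrapper: the set of commands here is an assumption,
# not a list taken from this thread (only clustat was requested).
collect_cluster_status() {
    run_if_present clustat
    run_if_present cman_tool nodes
    run_if_present cman_tool services
    run_if_present fence_tool ls
}
```

Running `collect_cluster_status > /tmp/cluster-status.txt` as root on each node would give one file per node that can be pasted into a reply.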

--Sunil
-----Original Message-----
From: linux-cluster-bounces@xxxxxxxxxx [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Balaji Sundar
Sent: Monday, March 07, 2011 2:04 PM
To: linux-cluster@xxxxxxxxxx
Subject:  rgmanager not running

Dear All,

I am using RHEL6 Linux; the kernel version is 2.6.32-71.el6.i686.

I have configured Cluster Suite with 2 servers:
Server 1 : 192.168.13.131 IP Address and hostname is primary
Server 2 : 192.168.13.132 IP Address and hostname is secondary
Floating : 192.168.13.133 IP Address (Assumed by currently active server)

I have verified that the cman service is running and that cluster.conf is valid, using the ccs_config_validate command.

Finally, I found that rgmanager is not running and services are not started:
[root@primary cluster]# service rgmanager status
rgmanager dead but pid file exists
[root@primary cluster]#
[root@primary cluster]# cman_tool services
[root@primary cluster]#
[root@primary cluster]# cman_tool status
Version: 6.2.0
Config Version: 1
Cluster Name: EMSCluster
Cluster Id: 808
Cluster Member: Yes
Cluster Generation: 96
Membership state: Cluster-Member
Nodes: 1
Expected votes: 1
Total votes: 1
Node votes: 1
Quorum: 1
Active subsystems: 7
Flags: 2node
Ports Bound: 0
Node name: primary
Node ID: 1
Multicast addresses: 239.192.3.43
Node addresses: 192.168.13.131
[root@primary cluster]#
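
One detail in the status output above is that `Flags: 2node` appears together with `Nodes: 1`, which suggests only one of the two configured nodes had joined when the command was run. A minimal sketch of checking this automatically follows; the function name `check_two_node_membership` is hypothetical, and it assumes the `Nodes:` and `Flags:` field names exactly as printed above. With the `2node` flag a single node can still be quorate, so a warning here is a hint to look at the other node, not proof of a fault.

```shell
#!/bin/sh
# Hypothetical helper: reads `cman_tool status` output on stdin and reports
# whether fewer than two nodes have joined a cluster flagged as 2node.
check_two_node_membership() {
    awk -F': *' '
        # "Nodes: 1" -> number of nodes currently in the cluster
        $1 == "Nodes" { nodes = $2 + 0 }
        # "Flags: 2node" -> cluster is configured in two-node mode
        $1 == "Flags" && $2 ~ /(^| )2node( |$)/ { two_node = 1 }
        END {
            if (two_node && nodes < 2)
                printf "WARNING: 2node cluster but only %d node(s) joined\n", nodes
            else
                printf "OK: %d node(s) joined\n", nodes
        }
    '
}
```

Usage would be `cman_tool status | check_two_node_membership`; fed the output shown above, it prints the WARNING line with a count of 1.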

Found some error messages in the "/var/log/messages" file:
Mar  7 14:39:42 primary corosync[7155]:   [CMAN  ] quorum regained, resuming activity
Mar  7 14:39:42 primary corosync[7155]:   [QUORUM] This node is within the primary component and will provide service.
Mar  7 14:39:42 primary corosync[7155]:   [QUORUM] Members[1]: 1
Mar  7 14:39:42 primary corosync[7155]:   [QUORUM] Members[1]: 1
Mar  7 14:39:42 primary corosync[7155]:   [CPG   ] downlist received left_list: 0
Mar  7 14:39:42 primary corosync[7155]:   [CPG   ] chosen downlist from node r(0) ip(192.168.13.131)
Mar  7 14:39:42 primary corosync[7155]:   [MAIN  ] Completed service synchronization, ready to provide service.
Mar  7 14:39:44 primary fenced[7210]: fenced 3.0.12 started
Mar  7 14:39:45 primary dlm_controld[7224]: dlm_controld 3.0.12 started
Mar  7 14:39:45 primary gfs_controld[7254]: gfs_controld 3.0.12 started
Mar  7 14:39:45 primary kernel: dlm: Using TCP for communications
Mar  7 14:39:45 primary dlm_controld[7224]: dlm_join_lockspace no fence domain
Mar  7 14:39:45 primary dlm_controld[7224]: process_uevent online@ error -1 errno 2
Mar  7 14:39:45 primary kernel: dlm: rgmanager: group join failed -1 -1

Found some error messages in the "/var/log/cluster/dlm_controld.log" file:
Mar 07 14:39:45 dlm_controld dlm_controld 3.0.12 started
Mar 07 14:39:45 dlm_controld dlm_join_lockspace no fence domain
Mar 07 14:39:45 dlm_controld process_uevent online@ error -1 errno 2
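
The "dlm_join_lockspace no fence domain" line suggests that this node was not a member of the fence domain when rgmanager tried to create its DLM lockspace, which would fit the empty `cman_tool services` output above. That reading is an interpretation, not something confirmed in this thread. A small sketch for pulling the relevant lines out of a log follows; the function name `find_dlm_join_errors` is hypothetical, and the patterns are simply the three messages quoted in this post.

```shell
#!/bin/sh
# Hypothetical helper: reads log text on stdin and prints only the lines
# that match the DLM join failures quoted in this post.
find_dlm_join_errors() {
    # "|| true" keeps the exit status at 0 when nothing matches.
    grep -E 'dlm_join_lockspace no fence domain|group join failed|process_uevent .* error' || true
}
```

It could be run as `find_dlm_join_errors < /var/log/messages`, and the timestamps of any hits compared against when fenced started and what `fence_tool ls` reports on that node.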

I don't know what the problem is. Can someone shed some light on this peculiar problem?

Thanks in Advance

--Regards
S.Balaji

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
