Re: Multiple "rgmanager" instances after rebooting from a kernel panic.

Please find attached the cluster.conf file and the relevant logs from
both servers.

Two scenarios were executed:
1) From 11:48:00 until 11:55 (this is the normal/expected situation):
app01 is active; kernel panic on app01 at 11:48:00
app02 resumes the service normally
app01 rejoins the cluster at 11:50:00
Kernel panic on app02 at 11:50:45
app01 starts the service normally
app02 rejoins the cluster correctly

2) From 11:55:30 until the end (this is where the problem appears):
app01 is active; kernel panic on app01 at 11:55:30
app02 resumes the service normally
app01 rejoins the cluster at 11:57:07
The service is manually relocated to app01 at 11:58:40 (command sketched
after this list)
The service starts normally on app01
Kernel panic on app01 at 12:00:35
The service resumes normally on app02
app01 rejoins the cluster at 12:02:09
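
For reference, the manual relocation above is a standard rgmanager
relocate; with the names from the attached cluster.conf it would look
something like this (a sketch, not a log of the exact command used):

 # relocate service sv-CPAR to node adr-par-app01-hb
 clusvcadm -r sv-CPAR -m adr-par-app01-hb
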
After that, the clustat output on node app02 is:
Cluster Status for par_clu @ Wed Jan 29 12:30:46 2014
Member Status: Quorate

 Member Name                             ID   Status
 ------ ----                             ---- ------
 adr-par-app01-hb                            1 Online
 adr-par-app02-hb                            2 Online, Local, rgmanager

 Service Name                   Owner (Last)                   State
 ------- ----                   ----- ------                   -----
 service:sv-CPAR                adr-par-app02-hb               started

and on node app01 is:
Cluster Status for par_clu @ Wed Jan 29 12:30:43 2014
Member Status: Quorate

 Member Name                             ID   Status
 ------ ----                             ---- ------
 adr-par-app01-hb                            1 Online, Local
 adr-par-app02-hb                            2 Online

The output of "ps -ef | grep rgmanager" on node app01 is:
root      4034     1  0 12:02 ?        00:00:00 rgmanager
root      4036  4034  0 12:02 ?        00:00:00 rgmanager
root      4175  4036  0 12:02 ?        00:00:00 rgmanager
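
The parent/child chain is easier to see as a tree, e.g. (PIDs will of
course differ after each reboot):

 ps -ef --forest | grep rgmanager

which shows 4175 as a child of 4036, itself a child of the main
daemon 4034.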

The problem is that rgmanager is no longer active on node app01.
As a workaround, killing the last process (PID 4175) resumes rgmanager
without a restart.
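
In other words, something along these lines on app01 (a sketch; the PID
is whatever the third, deepest rgmanager process happens to be after
the reboot):

 # kill only the stuck child -- do not restart the rgmanager service
 kill 4175
 # app01 should then show "rgmanager" in its clustat Status column again
 clustat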


Thanks for your help.

BR,
Demetres
<?xml version="1.0"?>
<cluster config_version="17" name="par_clu">
	<logging debug="on"/>
	<cman expected_votes="1" transport="udpu" two_node="1"/>
	<clusternodes>
		<clusternode name="adr-par-app01-hb" nodeid="1">
			<fence>
				<method name="FncSCSI">
					<device name="FenceSCSI"/>
				</method>
			</fence>
			<unfence>
				<device action="on" name="FenceSCSI"/>
			</unfence>
		</clusternode>
		<clusternode name="adr-par-app02-hb" nodeid="2">
			<fence>
				<method name="FncSCSI">
					<device name="FenceSCSI"/>
				</method>
			</fence>
			<unfence>
				<device action="on" name="FenceSCSI"/>
			</unfence>
		</clusternode>
	</clusternodes>
	<fencedevices>
		<fencedevice agent="fence_scsi" devices="/dev/emcpowera" logfile="/var/log/cluster/fence_scsi.log" name="FenceSCSI"/>
	</fencedevices>
	<rm>
		<failoverdomains>
			<failoverdomain name="CPAR" nofailback="1" ordered="1" restricted="0">
				<failoverdomainnode name="adr-par-app01-hb" priority="1"/>
				<failoverdomainnode name="adr-par-app02-hb" priority="2"/>
			</failoverdomain>
		</failoverdomains>
		<resources>
			<lvm lv_name="lvpar" name="lvpar" self_fence="1" vg_name="vgpar"/>
			<fs device="/dev/vgpar/lvpar" force_fsck="1" force_unmount="1" fstype="ext4" mountpoint="/shared" name="fspar" self_fence="1">
				<action depth="*" interval="10" name="status"/>
			</fs>
			<script file="/etc/init.d/arserver_ICOM" name="scCPAR"/>
			<ip address="10.120.158.7" disable_rdisc="1" monitor_link="1" sleeptime="2"/>
		</resources>
		<service domain="CPAR" name="sv-CPAR" recovery="relocate">
			<lvm ref="lvpar">
				<fs ref="fspar">
					<ip ref="10.120.158.7">
						<script ref="scCPAR"/>
					</ip>
				</fs>
			</lvm>
		</service>
	</rm>
	<fence_daemon/>
	<dlm protocol="tcp"/>
</cluster>

Attachment: messages_app01.txt.gz
Description: GNU Zip compressed data

Attachment: messages_app02.txt.gz
Description: GNU Zip compressed data
