Re: problems with clvmd and lvms on rhel6.1

Not sure if it relates, but I can say that without fencing, things will break in strange ways. The reason is that if anything triggers a fault, the cluster blocks by design and stays blocked until a fence call succeeds (which is impossible without fencing configured in the first place).

Can you please set up fencing and test to make sure it works (using 'fence_node rhel2.local' from rhel1.local, then in reverse)? Once this is done, test for your problem again. If it still exists, please paste the updated cluster.conf. Also include syslog from both nodes around the time of your LVM tests.
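
For example, if your nodes have IPMI/iLO-style management interfaces, the (currently empty) <fence/> and <fencedevices/> sections in your cluster.conf could look roughly like this (fence_ipmilan is just one common agent; the device names, addresses and credentials are placeholders you would need to adapt):

	<clusternode name="rhel1.local" nodeid="1" votes="1">
		<fence>
			<method name="ipmi">
				<device name="ipmi_rhel1"/>
			</method>
		</fence>
	</clusternode>
	<clusternode name="rhel2.local" nodeid="2" votes="1">
		<fence>
			<method name="ipmi">
				<device name="ipmi_rhel2"/>
			</method>
		</fence>
	</clusternode>
	...
	<fencedevices>
		<fencedevice agent="fence_ipmilan" name="ipmi_rhel1" ipaddr="192.168.1.101" login="admin" passwd="secret"/>
		<fencedevice agent="fence_ipmilan" name="ipmi_rhel2" ipaddr="192.168.1.102" login="admin" passwd="secret"/>
	</fencedevices>

Remember to bump config_version when you edit the file.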

digimer

On 08/10/2012 12:38 PM, Poós Krisztián wrote:
This is the cluster.conf, which is a clone of the problematic system in a test environment (without the Oracle and SAP instances, focusing only on this LVM issue, with an LVM resource):

[root@rhel2 ~]# cat /etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster config_version="7" name="teszt">
	<fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
	<clusternodes>
		<clusternode name="rhel1.local" nodeid="1" votes="1">
			<fence/>
		</clusternode>
		<clusternode name="rhel2.local" nodeid="2" votes="1">
			<fence/>
		</clusternode>
	</clusternodes>
	<cman expected_votes="3"/>
	<fencedevices/>
	<rm>
		<failoverdomains>
			<failoverdomain name="all" nofailback="1" ordered="1" restricted="0">
				<failoverdomainnode name="rhel1.local" priority="1"/>
				<failoverdomainnode name="rhel2.local" priority="2"/>
			</failoverdomain>
		</failoverdomains>
		<resources>
			<lvm lv_name="teszt-lv" name="teszt-lv" vg_name="teszt"/>
			<fs device="/dev/teszt/teszt-lv" fsid="43679" fstype="ext4" mountpoint="/lvm" name="teszt-fs"/>
		</resources>
		<service autostart="1" domain="all" exclusive="0" name="teszt" recovery="disable">
			<lvm ref="teszt-lv"/>
			<fs ref="teszt-fs"/>
		</service>
	</rm>
	<quorumd label="qdisk"/>
</cluster>

Here are the log parts:
Aug 10 17:21:21 rgmanager I am node #2
Aug 10 17:21:22 rgmanager Resource Group Manager Starting
Aug 10 17:21:22 rgmanager Loading Service Data
Aug 10 17:21:29 rgmanager Initializing Services
Aug 10 17:21:31 rgmanager /dev/dm-2 is not mounted
Aug 10 17:21:31 rgmanager Services Initialized
Aug 10 17:21:31 rgmanager State change: Local UP
Aug 10 17:21:31 rgmanager State change: rhel1.local UP
Aug 10 17:23:23 rgmanager Starting stopped service service:teszt
Aug 10 17:23:25 rgmanager Failed to activate logical volume, teszt/teszt-lv
Aug 10 17:23:25 rgmanager Attempting cleanup of teszt/teszt-lv
Aug 10 17:23:29 rgmanager Failed second attempt to activate teszt/teszt-lv
Aug 10 17:23:29 rgmanager start on lvm "teszt-lv" returned 1 (generic error)
Aug 10 17:23:29 rgmanager #68: Failed to start service:teszt; return value: 1
Aug 10 17:23:29 rgmanager Stopping service service:teszt
Aug 10 17:23:30 rgmanager stop: Could not match /dev/teszt/teszt-lv with a real device
Aug 10 17:23:30 rgmanager stop on fs "teszt-fs" returned 2 (invalid argument(s))
Aug 10 17:23:31 rgmanager #12: RG service:teszt failed to stop; intervention required
Aug 10 17:23:31 rgmanager Service service:teszt is failed
Aug 10 17:24:09 rgmanager #43: Service service:teszt has failed; can not start.
Aug 10 17:24:09 rgmanager #13: Service service:teszt failed to stop cleanly
Aug 10 17:25:12 rgmanager Starting stopped service service:teszt
Aug 10 17:25:14 rgmanager Failed to activate logical volume, teszt/teszt-lv
Aug 10 17:25:15 rgmanager Attempting cleanup of teszt/teszt-lv
Aug 10 17:25:17 rgmanager Failed second attempt to activate teszt/teszt-lv
Aug 10 17:25:18 rgmanager start on lvm "teszt-lv" returned 1 (generic error)
Aug 10 17:25:18 rgmanager #68: Failed to start service:teszt; return value: 1
Aug 10 17:25:18 rgmanager Stopping service service:teszt
Aug 10 17:25:19 rgmanager stop: Could not match /dev/teszt/teszt-lv with a real device
Aug 10 17:25:19 rgmanager stop on fs "teszt-fs" returned 2 (invalid argument(s))


After I manually started the LVM on node1 and tried to switch it to node2, it was not able to start it.
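(In commands, the test above is roughly this sketch; the relocation step assumes clusvcadm was used for the switch:

  lvchange -ay teszt/teszt-lv          # start the LV by hand on node1
  clusvcadm -r teszt -m rhel2.local    # switch service:teszt over to node2, where it fails to start
)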

Regards,
Krisztian


On 08/10/2012 05:15 PM, Digimer wrote:
On 08/10/2012 11:07 AM, Poós Krisztián wrote:
Dear all,

I hope that someone has run into this problem in the past and can help me resolve this issue.

There is a 2-node RHEL cluster with a quorum disk as well.
There are clustered LVs, with the -c- (clustered) flag set.
If I start clvmd, all the clustered LVs come online.

After this, if I start rgmanager, it deactivates all the volumes and is not able to activate them again, since the devices no longer exist during the startup of the service, so the service fails.
All LVs remain without the active flag.

I can bring it up manually, but only if, after clvmd is started, I set the LVs offline by hand with lvchange -an <lv>.
After this, when I start rgmanager, it can take the service online without problems. However, I think this action should be done by rgmanager itself. The logs are full of the following:
rgmanager Making resilient: lvchange -an ....
rgmanager lv_exec_resilient failed
rgmanager lv_activate_resilient stop failed on ....
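
In other words, the only sequence that works is roughly this (a sketch of the steps described above):

  service clvmd start       # clvmd activates all the clustered LVs
  lvchange -an <lv>         # deactivate each clustered LV by hand
  service rgmanager start   # rgmanager can now activate the LV and start the service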

Also, sometimes the lvs/clvmd commands hang. I have to restart clvmd (sometimes kill it) to make it work again.

Does anyone have any idea what to check?

Thanks and regards,
Krisztian

Please paste your cluster.conf file with minimal edits.


--
Digimer
Papers and Projects: https://alteeve.com

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster


