Qdiskd issue over EMC CX3-20 Storage + EMC PowerPath multipathing software

Hello all,

I'm having an issue with an RHCS4 cluster. Here is the relevant version information:
* Storage: EMC CX3-20, latest FLARE code applied;
* HBAs: 2 x QLogic 2462, latest EMC-certified BIOS (v1.24);
* Servers: 2 x Dell PowerEdge 2950, 2 quad-core processors, 8 GB of RAM, all available firmware updates applied;
* OS: RHEL 4 Update 4 with kernel 2.6.9-42.0.10.ELsmp (the latest kernel certified by EMC for RHEL4); RHEL4u5 is not certified by EMC yet, so we installed RHEL4u4 and upgraded only the kernel to the latest certified release;
* Processor architecture: everything x86_64;
* RH Cluster Suite: latest non-kernel-specific packages; the kernel-specific packages (cman-kernel, dlm-kernel) match the 2.6.9-42.0.10.ELsmp kernel;
* Multipath/storage software: EMC PowerPath v5.0.0.157, Navisphere Agent v6.24.0.6.13.


We are running into a problem during our multipathing tests. If we pull the fibre cable from one of the HBAs on one server, that node removes itself from the cluster because it loses access to the shared quorum partition (which is expected behaviour when access is really lost). But since we point the qdisk daemon at an EMC PowerPath device (/dev/emcpowerXX), we expected the multipathing layer to ride out the fibre channel outage transparently.

So I ask: are there any specific timers I should configure in cman or qdiskd to give PowerPath enough time to reconfigure the available paths? The storage administrator has verified that all storage paths are active and functional.
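
For reference, here is the kind of change I had in mind (completely untested; the numbers are only a guess at how long a PowerPath trespass might take, and I am not certain the deadnode_timeout attribute is honoured by the RHEL4 cman when set in cluster.conf, so please correct me if that is wrong). The idea would be to stretch the qdisk window (interval * tko) to roughly 30 seconds so it can survive a path failover, and to keep cman's own dead-node timeout above that so cman does not evict the node before qdiskd recovers:

<!-- hypothetical values, not tested -->
<quorumd log_facility="local6" device="/dev/emcpowere1" interval="3" min_score="0" tko="10" votes="1"/>
<cman deadnode_timeout="45"/>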

By the way: I'm configuring qdiskd with no heuristics at all, since we didn't have a reliable "router" available to act as an IP tiebreaker for the cluster. The Cluster FAQ (http://sources.redhat.com/cluster/faq.html#quorumdiskonly) states in question #23 (last paragraph) that in RHCS4U5 it is possible to have no heuristics at all, so we are trying that in this installation for the first time.
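
If a reliable gateway ever becomes available, something along these lines is the kind of ping heuristic I would expect to use (the IP address is only a placeholder and the score/interval values are illustrative, not tested here):

<quorumd log_facility="local6" device="/dev/emcpowere1" interval="1" min_score="1" tko="10" votes="1">
	<heuristic program="ping -c1 -w1 192.168.1.254" score="1" interval="2"/>
</quorumd>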

Below I post the relevant part of my cluster.conf file:

<?xml version="1.0"?>
<cluster config_version="9" name="clu_xxxxxx">
	<quorumd log_facility="local6" device="/dev/emcpowere1" interval="1" min_score="0" tko="10" votes="1"/>
	<fence_daemon post_fail_delay="10" post_join_delay="3"/>
	<clusternodes>
		<clusternode name="node1" votes="1">
			<fence>
				<method name="1">
					<device lanplus="" name="node1-ipmi"/>
				</method>
			</fence>
		</clusternode>
		<clusternode name="node2" votes="1">
			<fence>
				<method name="1">
					<device lanplus="" name="node2-ipmi"/>
				</method>
			</fence>
		</clusternode>
	</clusternodes>
	<cman/>
	<fencedevices>
		<fencedevice agent="fence_ipmilan" auth="none" ipaddr="hercules01-ipmi" login="root" name="node1-ipmi" passwd="clusterprosper"/>
		<fencedevice agent="fence_ipmilan" auth="none" ipaddr="hercules02-ipmi" login="root" name="node2-ipmi" passwd="clusterprosper"/>
	</fencedevices>
...


Thank you very much for any ideas on this issue.

Regards,

Celso.

--
*Celso Kopp Webber*

celso@xxxxxxxxxxxxxxxx

*Webbertek - Opensource Knowledge*
(41) 8813-1919 - mobile
(41) 4063-8448, extension 102 - landline



