Hello all,
I'm having an issue with an RHCS4 cluster. Here is the relevant version information:
* Storage: EMC CX3-20, latest FLARE code applied;
* HBAs: 2 x QLogic 2462, latest EMC-certified BIOS (v1.24);
* Servers: 2 x Dell PowerEdge 2950, each with 2 quad-core processors and 8 GB of RAM, all
available firmware updates applied;
* OS: RHEL v4 Update 4 with kernel 2.6.9-42.0.10.ELsmp (the latest kernel
certified by EMC for RHEL4). RHEL4u5 is not yet certified by EMC, so we
installed RHEL4u4 and upgraded only the kernel to the latest certified release;
* Processor Architecture: everything x86_64;
* RH Cluster Suite: latest non-kernel-specific packages; the kernel-specific packages
(cman-kernel, dlm-kernel) match the 2.6.9-42.0.10.ELsmp kernel;
* Multipath/storage software: EMC PowerPath v5.0.0.157, Navisphere Agent
v6.24.0.6.13.
We are running into a problem during our multipathing tests. If we pull the
fibre cable from one of the HBAs on one server, that node removes itself from
the cluster because it loses access to the shared quorum partition (which is
the expected behaviour when access really is lost). However, since we point
the qdisk daemon at an EMC PowerPath pseudo-device (/dev/emcpowerXX), we
expected multipathing to take care of the Fibre Channel outage on that single path.
So my question is: are there any specific timers I should configure in cman or
qdiskd to give PowerPath enough time to reconfigure the available paths? The
Storage Administrator has verified that all storage paths are active and functional.
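For reference, here is the direction I was thinking of going, in case it helps frame the question. This is only a sketch: the interval/tko/deadnode_timeout values are guesses that assume PowerPath needs somewhere around 30-60 seconds to trespass the LUNs to the surviving path, and (if I've read the Cluster FAQ correctly) that cman's deadnode timeout can be raised via a deadnode_timeout attribute on the <cman> tag so it stays longer than the qdisk window (interval x tko):

<!-- qdisk window becomes 3 s x 20 = 60 s before a node is declared dead -->
<quorumd log_facility="local6" device="/dev/emcpowere1" interval="3" min_score="0" tko="20" votes="1"/>
<!-- kept comfortably larger than interval x tko, per the usual advice that cman's timeout exceed the qdisk window -->
<cman deadnode_timeout="70"/>

The idea is simply that the total qdisk timeout has to be longer than the worst-case trespass time we actually measure (for instance by watching "powermt display dev=all" while a cable is pulled).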
By the way, I'm configuring qdiskd with no heuristics at all, since we don't
have a reliable "router" available to act as an IP tiebreaker for
the cluster. The Cluster FAQ
(http://sources.redhat.com/cluster/faq.html#quorumdiskonly) states in
question #23 (last paragraph) that with RHCS4U5 it is possible to run with no
heuristics at all, so we are trying that in this installation for the first time.
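Just so it's clear what we are leaving out: if we ever do get a dependable gateway to ping, my understanding is that a heuristic would be added inside the quorumd block roughly like the sketch below (the gateway address and the score/interval values are only placeholders, and min_score would then have to be at least 1):

<quorumd log_facility="local6" device="/dev/emcpowere1" interval="1" min_score="1" tko="10" votes="1">
    <!-- placeholder gateway IP; a TTL of 1 keeps the ping on the local segment -->
    <heuristic program="ping -c1 -t1 192.168.1.254" score="1" interval="2"/>
</quorumd>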
Below is the relevant part of my cluster.conf file:
<?xml version="1.0"?>
<cluster config_version="9" name="clu_xxxxxx">
    <quorumd log_facility="local6" device="/dev/emcpowere1" interval="1" min_score="0" tko="10" votes="1"/>
    <fence_daemon post_fail_delay="10" post_join_delay="3"/>
    <clusternodes>
        <clusternode name="node1" votes="1">
            <fence>
                <method name="1">
                    <device lanplus="" name="node1-ipmi"/>
                </method>
            </fence>
        </clusternode>
        <clusternode name="node2" votes="1">
            <fence>
                <method name="1">
                    <device lanplus="" name="node2-ipmi"/>
                </method>
            </fence>
        </clusternode>
    </clusternodes>
    <cman/>
    <fencedevices>
        <fencedevice agent="fence_ipmilan" auth="none" ipaddr="hercules01-ipmi" login="root" name="node1-ipmi" passwd="clusterprosper"/>
        <fencedevice agent="fence_ipmilan" auth="none" ipaddr="hercules02-ipmi" login="root" name="node2-ipmi" passwd="clusterprosper"/>
    </fencedevices>
...
Thank you very much for any ideas on this issue.
Regards,
Celso.
--
*Celso Kopp Webber*
celso@xxxxxxxxxxxxxxxx
*Webbertek - Opensource Knowledge*
(41) 8813-1919 - mobile
(41) 4063-8448, extension 102 - landline