On Mon, Sep 24, 2007 at 10:52:49AM -0300, Celso K. Webber wrote:
> Hello all,
>
> I'm having an issue with a RHCS4 Cluster. Here is some version
> information:
> * Storage: EMC CX3-20, latest FLARE code applied;
> * HBAs: 2 x QLogic 2462, latest/certified BIOS by EMC (v1.24);
> * Servers: 2 Dell PowerEdge 2950, 2 quad-core processors, 8 GB of RAM, all
> available firmware updates applied;
> * OS: RHEL v4 Update 4 with kernel 2.6.9-42.0.10.ELsmp (latest kernel
> certified by EMC for RHEL4). RHEL4u5 is not certified by EMC yet, so we
> installed RHEL4u4 and upgraded the kernel only to the latest certified
> release;
> * Processor Architecture: everything x86_64;
> * RH Cluster Suite: latest non-kernel specific packages, the other packages
> (cman-kernel, dlm-kernel) are specific for the 2.6.9-42.0.10.ELsmp kernel;
> * Multipath/storage software: EMC PowerPath v5.0.0.157, Navisphere Agent
> v6.24.0.6.13.
>
>
> We are experiencing a problem during our tests with the multipathing
> software. If we take out the fiber cable from one of the HBAs from one
> server, it removes itself from the Cluster because of losing access to the
> shared partition (this is an expected behaviour). But since we are pointing
> the Qdisk daemon to an EMC Power device (/dev/emcpowerXX), we expected that
> the multipathing should take care of the fibre channel outage.

Yes, it should.

>
> So, I ask: are there any specific timers I should configure in cman or
> qdiskd so that I can give enough time for PowerPath to reconfigure the
> available paths? The Storage Administrator verified that all storage paths
> are active and functional.

Yes, you can adjust the interval and tko count; see the qdisk(5) man page.
Note that the qdisk timeout should be less than 0.5 * cluster_timeout, so
you will need to adjust your cluster timeout accordingly:

<cman deadnode_timeout="..." .../>

> By the way: I'm configuring qdiskd with no heuristics at all, since we
> didn't have any reliable "router" available to work as an IP tiebreaker for
> the cluster. Since the Cluster FAQ
> (http://sources.redhat.com/cluster/faq.html#quorumdiskonly) states in
> question #23 (last paragraph) that in RHCS4U5 it is possible to have no
> heuristics at all, we are trying it in this installation for the first time.

Correct, but it's nice to have them :)

>
> <?xml version="1.0"?>
> <cluster config_version="9" name="clu_xxxxxx">
> <quorumd log_facility="local6" device="/dev/emcpowere1" interval="1"
> min_score="0" tko="10" votes="1"/>

interval * tko = qdisk timeout (in seconds)

> <cman/>

<cman deadnode_timeout="X"/>

... where X = 2 * interval * tko + 1

The qdisk timeout should be set to something that exceeds the PowerPath
failure detection timeout; I don't know what that is.

--
Lon Hohberger - Software Engineer - Red Hat, Inc.
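
The timing rule from the reply above (qdisk timeout = interval * tko, and deadnode_timeout = 2 * interval * tko + 1) can be sketched as a quick calculation. This is only an illustration: the function names are made up for this example, and failover_seconds is a hypothetical placeholder, since the actual PowerPath failure detection time is not given anywhere in this thread.

```python
def qdisk_timeout(interval: int, tko: int) -> int:
    """Seconds before qdiskd gives up on a node: interval * tko."""
    return interval * tko


def deadnode_timeout(interval: int, tko: int) -> int:
    """Value for <cman deadnode_timeout="X"/>: X = 2 * interval * tko + 1."""
    return 2 * qdisk_timeout(interval, tko) + 1


def covers_failover(interval: int, tko: int, failover_seconds: int) -> bool:
    """True if the qdisk timeout exceeds the multipath failover time.

    failover_seconds is an assumed figure; measure it on your own setup.
    """
    return qdisk_timeout(interval, tko) > failover_seconds


if __name__ == "__main__":
    # Values from the posted cluster.conf: interval="1" tko="10"
    interval, tko = 1, 10
    print(qdisk_timeout(interval, tko))     # 10
    print(deadnode_timeout(interval, tko))  # 21
```

With the posted values the qdisk timeout is 10 seconds, so if path failover takes longer than that, interval or tko would need to be raised and deadnode_timeout recomputed to match.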