I'm testing a situation where the gateway (192.168.1.1) fails for some time (a minute or so) for both nodes of a two-node cluster with qdisk, where the qdiskd heuristic is set up with this gateway as the ping target.

Fencing is provided by iLO. The cluster version is the one provided with RHEL 5.3, plus the unofficial cman patch released before the 2.0.98-1.el5_3.1 patch (see https://bugzilla.redhat.com/show_bug.cgi?id=485026 ). I'm going to update openais, rgmanager and cman as per the 9th of April advisory, but at the moment I would like to stay with this setup, to simulate a production cluster I have.

The relevant parts of cluster.conf are:

    <cman expected_votes="3" two_node="0"/>
    <quorumd device="/dev/mapper/mpath3" interval="3" label="cluquorum" log_facility="local4" log_level="7" tko="5" votes="1">
        <heuristic interval="2" program="ping -c1 -w1 192.168.1.1" score="1" tko="3"/>
    </quorumd>

So, when the gateway becomes unreachable from both nodes, I have this situation:

1) node2 (srv02) has the services (vip, filesystems and an Oracle RDBMS) running.

2) In its qdiskd.log I get:

    Apr 23 18:06:24 srv02 qdiskd[10702]: <debug> Heuristic: 'ping -c1 -w1 192.168.1.1' missed (1/3)
    Apr 23 18:06:27 srv02 qdiskd[10702]: <debug> Heuristic: 'ping -c1 -w1 192.168.1.1' missed (2/3)
    Apr 23 18:06:30 srv02 qdiskd[10702]: <info> Heuristic: 'ping -c1 -w1 192.168.1.1' DOWN (3/3)
    Apr 23 18:06:31 srv02 qdiskd[10702]: <notice> Score insufficient for master operation (0/1; required=1); downgrading

In messages:

    Apr 23 18:06:30 srv02 qdiskd[10702]: <info> Heuristic: 'ping -c1 -w1 192.168.1.1' DOWN (3/3)
    Apr 23 18:06:31 srv02 qdiskd[10702]: <notice> Score insufficient for master operation (0/1; required=1); downgrading
    Apr 23 18:06:31 srv02 kernel: md: stopping all md devices.

and then the following messages from the reboot:

    Apr 23 18:08:50 srv02 syslogd 1.4.1: restart (remote reception).
    Apr 23 18:08:50 srv02 kernel: klogd 1.4.1, log source = /proc/kmsg started.
3) In qdiskd.log of node1 (srv01) I get:

    Apr 23 18:06:25 srv01 qdiskd[5946]: <debug> Heuristic: 'ping -c1 -w1 192.168.1.1' missed (1/3)
    Apr 23 18:06:28 srv01 qdiskd[5946]: <debug> Heuristic: 'ping -c1 -w1 192.168.1.1' missed (2/3)
    Apr 23 18:06:31 srv01 qdiskd[5946]: <info> Heuristic: 'ping -c1 -w1 192.168.1.1' DOWN (3/3)
    Apr 23 18:06:32 srv01 qdiskd[5946]: <notice> Score insufficient for master operation (0/1; required=1); downgrading

In messages:

    Apr 23 18:06:31 srv01 qdiskd[5946]: <info> Heuristic: 'ping -c1 -w1 192.168.1.1' DOWN (3/3)
    Apr 23 18:06:32 srv01 qdiskd[5946]: <notice> Score insufficient for master operation (0/1; required=1); downgrading
    Apr 23 18:06:32 srv01 kernel: md: stopping all md devices.

and then the following messages from the reboot:

    Apr 23 18:08:50 srv01 syslogd 1.4.1: restart (remote reception).
    Apr 23 18:08:50 srv01 kernel: klogd 1.4.1, log source = /proc/kmsg started.

4) From the iLO logs of the two nodes, it seems they fenced each other.

srv02 iLO log (the iLO clock is 1 hour behind):

    Informational  iLO 2  04/23/2009 17:07  04/23/2009 17:07  1  Server power restored.
    Caution        iLO 2  04/23/2009 17:07  04/23/2009 17:07  1  Server reset.

srv01 iLO log (the iLO clock is 1 hour behind):

    Informational  iLO 2  04/23/2009 17:06  04/23/2009 17:06  1  Server power restored.
    Caution        iLO 2  04/23/2009 17:06  04/23/2009 17:06  1  Server reset.

The final state is that srv01 starts a little before srv02; the gateway is up by this time, so the nodes form quorum because both are up, and then qdisk also becomes available and they reach 3 votes.

The question is: is reciprocal fencing the expected behaviour in this case? Or should both nodes remain in a sort of waiting state until the network becomes available again, logging something like a "Cluster is not quorate. Refusing connection." message?

What exactly does "downgrading" mean in the message "Score insufficient for master operation (0/1; required=1); downgrading"? That the node will stop all its services, or that it will be fenced?
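For reference, here is a minimal sketch of the timing arithmetic implied by the heuristic settings quoted from cluster.conf (interval="2", tko="3", ping -c1 -w1). It rests on two assumptions that are not taken from the qdiskd source: that qdiskd waits "interval" seconds after each heuristic run returns, and that a failed ping uses its full 1-second deadline. Both are consistent with the 3-second spacing of the "missed" entries in the logs above, but they remain assumptions. The function name heuristic_down_window is hypothetical.

```python
def heuristic_down_window(interval, tko, ping_deadline):
    """Estimate (shortest, longest) delay in seconds between the start of
    an outage and the heuristic being logged as DOWN.

    Assumed model (not verified against qdiskd itself):
      - a failing ping blocks for the full ping_deadline,
      - qdiskd sleeps `interval` seconds between heuristic runs,
      - DOWN is logged when the tko-th consecutive run fails.
    """
    # Best case: the outage starts exactly when a ping is sent, so that
    # ping is already the first miss. tko pings time out, with (tko - 1)
    # sleeps in between.
    shortest = tko * ping_deadline + (tko - 1) * interval
    # Worst case: the outage starts just after a ping succeeded, so one
    # extra sleep passes before the first failing ping is even sent.
    longest = shortest + interval
    return shortest, longest


# Values from the <heuristic> element and the -w1 option of the ping command.
low, high = heuristic_down_window(interval=2, tko=3, ping_deadline=1)
print(f"heuristic DOWN roughly {low} to {high} seconds after the outage starts")
```

Under this model the values from cluster.conf give a window of 7 to 9 seconds; the real behaviour may differ if qdiskd schedules heuristic runs differently.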
Based on my cluster.conf, is it correct to say that in my case, if the network gateway remains unreachable for somewhere between 7 and 9 seconds, I will get this behaviour?

Any suggestions?

Thanks,
Gianluca

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster