On Wed, May 02, 2007 at 02:44:06PM +0100, Frederik Ferner wrote:
> Hi,
>
> finally I had a chance to experiment with the test rpms for cman[1] that
> should solve the problem with multiple masters I had...
>
> For these tests I was using the following rpms on RHEL4U4:
>
> kernel-smp-2.6.9-42.0.3.EL
> cman-kernel-smp-2.6.9-45.8.1TEST
> cman-1.0.11-0.4.1qdisk
> rgmanager-1.9.54-1
>
> To test this I have two servers connected to one switch with nothing else
> connected and one uplink. As heuristics for qdiskd I'm pinging a few IP
> addresses outside of this switch. When I unplug the uplink with the old
> cman installed, qdiskd on both servers immediately notice this and lower
> the score accordingly.
> With the new version of qdiskd it seems the heuristics are not tested
> anymore after it reaches a sufficient score once. When the outside
> network is lost qdiskd on both servers still claim the same score in the
> status file and both servers report the votes for the qdisk to cman.

Hmm, could you add 'tko="1"' to your cluster.conf for the heuristics? I
wonder if it's an initialization problem.

> If qdiskd is started while the outside network is unreachable the scores
> start without the scores for the failing heuristics. Once network is
> restored the score jumps to at least the minimum required for operation
> and once again stays there.
>
> Is this a bug that will be fixed in the upcoming RHEL4U5 release or
> could there be something else wrong with my setup?

This seems to work for me:

[10538] debug: Heuristic: 'ping 192.168.79.254 -c1 -t3' missed (1/3)
[10538] debug: Heuristic: 'ping 192.168.79.254 -c1 -t3' missed (2/3)
[10538] info: Heuristic: 'ping 192.168.79.254 -c1 -t3' DOWN (3/3)
[10537] notice: Score insufficient for master operation (0/11; required=6); downgrading

Message from syslogd@green at Mon May 7 10:36:43 2007 ...
green clurgmgrd[7305]: <emerg> #1: Quorum Dissolved (machine rebooted)

> Here's my quorumd section from cluster.conf
>
> -----
> <quorumd interval="1" tko="5" votes="3" log_level="9"
> log_facility="local4" status_file="/tmp/qdisk_status"
> device="/dev/emcpowerq1">
> <heuristic program="ping 172.23.4.254 -c1 -t1" score="1"
> interval="2"/>
> <heuristic program="ping 130.246.8.13 -c1 -t3" score="1"
> interval="2"/>
> <heuristic program="ping 130.246.72.21 -c1 -t3" score="1"
> interval="2"/>
> <heuristic program="ping 172.23.5.120 -c1 -t1" score="1"
> interval="2"/>
> <heuristic program="ping 172.23.6.229 -c1 -t1" score="1"
> interval="2"/>
> <heuristic program="ping 172.23.7.34 -c1 -t1" score="1"
> interval="2"/>
> <heuristic program="ping 172.23.7.35 -c1 -t1" score="1"
> interval="2"/>
> <heuristic program="ping 172.23.6.233 -c1 -t1" score="1"
> interval="2"/>
> </quorumd>
> -----
> If you need any more information, I'm happy to provide this.

Hmm, try adding tko="3" to each of your ping heuristics, like this:

<heuristic program="ping 172.23.6.233 -c1 -t1" score="1" interval="2" tko="3"/>

-- Lon

--
Lon Hohberger - Software Engineer - Red Hat, Inc.

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
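
For reference, here is a sketch of how the quoted quorumd section might look
with the suggested tko="3" applied to every ping heuristic. All other
attributes are copied unchanged from the original post. Whether tko="3" is
the right value for this particular setup is an assumption to be tested, not
a confirmed fix.

```xml
<!-- Sketch only: the poster's quorumd section with tko="3" added to each
     heuristic, as suggested in the reply. Nothing else is changed. -->
<quorumd interval="1" tko="5" votes="3" log_level="9"
         log_facility="local4" status_file="/tmp/qdisk_status"
         device="/dev/emcpowerq1">
  <heuristic program="ping 172.23.4.254 -c1 -t1" score="1" interval="2" tko="3"/>
  <heuristic program="ping 130.246.8.13 -c1 -t3" score="1" interval="2" tko="3"/>
  <heuristic program="ping 130.246.72.21 -c1 -t3" score="1" interval="2" tko="3"/>
  <heuristic program="ping 172.23.5.120 -c1 -t1" score="1" interval="2" tko="3"/>
  <heuristic program="ping 172.23.6.229 -c1 -t1" score="1" interval="2" tko="3"/>
  <heuristic program="ping 172.23.7.34 -c1 -t1" score="1" interval="2" tko="3"/>
  <heuristic program="ping 172.23.7.35 -c1 -t1" score="1" interval="2" tko="3"/>
  <heuristic program="ping 172.23.6.233 -c1 -t1" score="1" interval="2" tko="3"/>
</quorumd>
```

With this in place, the debug log should show each failing heuristic counting
up as "missed (1/3)", "missed (2/3)" and then "DOWN (3/3)", as in the output
above, before the score is lowered.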