On Mon, 2006-10-09 at 18:41 +0200, Jos Vos wrote:
> Can someone give a config example of this?  Should the score be
> asymmetric in that case?  A scenario of what happens w.r.t. score,
> votes, and how the respective cman subsystems react to this would
> be nice to have.

A simple example is pinging an upstream router, like the "IP tiebreaker"
in RHCS3.  If you can't ping it, you lose.

More complex examples might check network connectivity first, then
follow up with checks on particular service components.  It's up to the
administrator to make sure the logic makes sense for their particular
installation.

As an example, here's a totally untested, unverified, and probably
broken script that is meant to be the only heuristic (give it 10 points
or so in the configuration).  It performs several checks in sequence.
The idea is that, in the event of a network partition, the script gives
weight to the node that owns a particular service, but only after
confirming that the node has "upstream" network connectivity (after
all, what's the point of running a network service if no one can reach
it?).

It assumes a lot of things, like the cluster node name matching the
output of `hostname`, the service having an IP address, etc.

This is just one idea.  I need to write up more/better examples in the
near future.  If anyone else has ideas to add, feel free.

When the score drops below half of the total heuristic score, qdiskd
advertises to CMAN that the "quorum device" is gone.  CMAN loses those
local votes and the node becomes inquorate.

There's a bug in qdisk that prevents it from rebooting the node upon
this loss of score, as it is documented to do.  (This is already fixed
in CVS.)

-- Lon

#!/bin/bash

SVC=test
SVCIP=10.1.1.10		# Service IP
TBIP=10.1.1.1		# IP tiebreaker

#
# Check tiebreaker(s).  -w1 gives up after one second; with iputils
# ping on Linux, -t sets the TTL, not a timeout.
#
if ! ping -c1 -w1 "$TBIP" >/dev/null 2>&1; then
	exit 1
fi

#
# If rgmanager is not running, we are quorate.  The administrator either
# didn't start it, or stopped it cleanly (the watchdog should catch
# rgmanager crashes).
#
if ! service rgmanager status >/dev/null 2>&1; then
	exit 0
fi

#
# Cut up the XML attrs of the service output from clustat into chunks
# XXX is this even possible in a network partition?  It should be
# with the -f option to clustat, but untested.
#
while read -r info; do
	field=${info%%=*}		# text before the first '='
	val=${info#*=}			# text after the first '='
	# XXX breaks up last_transition_str.  Fix later.
	if [ "$field" != "$info" ] && [ "$field" != "last_transition_str" ]; then
		val=${val%/>}		# drop a trailing "/>" on the last attribute
		val=${val//\"/}		# strip the quotes
		declare "x_$field=$val"	# avoids eval on clustat output
	fi
done < <(clustat -fxs "$SVC" | grep "name=\"$SVC\"" | tr -s ' ' '\n')

#
# XXX check which attribute your clustat -x prints for the owner; this
# assumes it ends up as x_service_owner.
#
if [ -z "$x_service_owner" ] || [ "$x_service_owner" = "none" ]; then
	#
	# Service disabled/failed/stopped/otherwise not running or
	# state unavailable (maybe rgmanager's just booting?)
	#
	exit 0
fi

if [ "$x_service_owner" = "$(hostname)" ]; then
	#
	# I own the service.
	#
	exit 0
fi

#
# I don't own the service.
# If we can see the IP and we're not the owner, we're quorate.
#
ping -c1 -w1 "$SVCIP" >/dev/null 2>&1
exit $?

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
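For the simple "IP tiebreaker" case, the whole heuristic can be a few lines.  This is a minimal sketch, assuming Linux iputils ping and that qdiskd counts a heuristic program as passing when it exits 0.  The function name check_reachable is hypothetical, and 10.1.1.1 is just the placeholder router address used in the script in this post.

```shell
#!/bin/bash
# check_reachable HOST (hypothetical helper name): succeed only if HOST
# answers a single ping within one second.
#   -c1  send one echo request
#   -w1  overall deadline of one second (Linux iputils ping; -t would set
#        the TTL instead)
check_reachable() {
	ping -c1 -w1 "$1" >/dev/null 2>&1
}
```

Used as a standalone heuristic program, the script would end with something like "check_reachable 10.1.1.1; exit $?", so the heuristic passes only while the router answers.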
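To show how heuristic results turn into the "below half" decision, here is a rough simulation of the scoring.  It is a sketch, not qdiskd's code: the function name qdisk_score_ok is hypothetical, it runs each heuristic once (qdiskd runs them repeatedly), and it assumes the threshold is half the total score rounded up.  For integer scores that is the same as the floor((n+1)/2) default min_score described in the qdisk(5) man page.

```shell
#!/bin/bash
# qdisk_score_ok SCORE PROGRAM [SCORE PROGRAM ...]   (hypothetical helper)
# Runs each heuristic PROGRAM once via /bin/sh -c, adds its SCORE when it
# exits 0, prints "score GOT/MAX", and succeeds if GOT is at least half of
# MAX, rounded up.
qdisk_score_ok() {
	local max=0 got=0 score prog
	while [ $# -ge 2 ]; do
		score=$1
		prog=$2
		shift 2
		max=$((max + score))
		if /bin/sh -c "$prog" >/dev/null 2>&1; then
			got=$((got + score))
		fi
	done
	echo "score $got/$max"
	# 2*got >= max  is the same as  got >= ceil(max/2)
	[ $((2 * got)) -ge "$max" ]
}
```

With the single 10-point heuristic suggested above, the score is all or nothing: "qdisk_score_ok 10 true" reports 10/10 and passes, while "qdisk_score_ok 10 false" reports 0/10 and fails, which is the point where the node would lose the quorum device's votes.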