>> when this handler gets called both nodes will try to fence each
>> other.. Is that the intended effect?
>
> Yes, in a network partition of a two-node cluster, both nodes will race
> to fence. One wins, the other dies. ;)

OK.

>> b) If we try to do ssh <host> -c "drbdadm outdate all", gfs is still
>> mounted on top of drbd and drbd is primary, so there is no effect of
>> the command and the split brain continues. I have seen this.
>
> ... but with resource-and-stonith, drbd freezes I/O until the
> outdate-peer script returns a 4 or 7... If it doesn't return

Could you please explain what happens if it doesn't return?

> Fail over an xdmcp session? I think xdm/gdm/etc. were not designed to
> handle that sort of a failure case. It sounds like a cool idea, but I
> would not even know where to begin to make that work.

Well, I could keep asking the question around and maybe someday somebody
will have an idea.

Also, could you please explain the following from your obliterate script:

<quote>
# now. Note that GFS *will* wait for this to occur, so if you're using GFS
# on DRBD, you still don't get access. ;)
</quote>

What will GFS wait for? Fence status?

Also, since our APC masterswitch hasn't arrived yet, I modified the
obliterate script to use ssh to do the dirty work (instead of using RHCS
fencing, that is; also because I have a 3-node cluster. I also defined
REMOTE manually on each of the two drbd nodes). Here it is. Please
comment on whether it will work:

#!/bin/bash
#
###########################################################
# DRBD 0.8.2.1 -> linux-cluster super-simple fencing wrapper
#
# Kills the other node in a 2-node cluster. Only works with
# 2-node clusters (FIXME?)
#
###########################################################
#
# Author: Lon Hohberger <lhh[a]redhat.com>
#
# Special thanks to fabioc on freenode
#
PATH="/bin:/sbin:/usr/bin:/usr/sbin"

NODECOUNT=0
LOCAL_ID="2"
REMOTE_ID="1"
REMOTE="imstermserver1"

echo "Local node ID: $LOCAL_ID"
echo "Remote node ID: $REMOTE_ID"
echo "Remote node: $REMOTE"

#
# This could be cleaner by calling cman_tool kill -n <node>, but then we have
# to poll/wait for fence status, and I don't feel like writing that right
# now. Note that GFS *will* wait for this to occur, so if you're using GFS
# on DRBD, you still don't get access. ;)
#
#fence_node $REMOTE

logger -f /var/log/messages "$0 : Fencing Node : $REMOTE"

ssh $REMOTE drbdadm outdate all
if [ $? -eq 0 ]; then
    logger -f /var/log/messages "$0 : drbdadm outdate all on $REMOTE succeeded"
    #
    # Reference:
    # http://osdir.com/ml/linux.kernel.drbd.devel/2006-11/msg00005.html
    #
    # 4 = peer is outdated (this handler outdated it) [ resource fencing ]
    #
    ssh $REMOTE drbdadm resume-io all
    exit 4
fi

logger -f /var/log/messages "$0 : drbdadm outdate all on $REMOTE FAILED!!"

ssh $REMOTE poweroff -f
if [ $? -eq 0 ]; then
    logger -f /var/log/messages "$0 : poweroff -f on $REMOTE succeeded"
    #
    # Reference:
    # http://osdir.com/ml/linux.kernel.drbd.devel/2006-11/msg00005.html
    #
    # 7 = node got blown away.
    #
    # $REMOTE is powered off now, so resume I/O locally instead of
    # trying to ssh to a dead node.
    drbdadm resume-io all
    exit 7
fi

logger -f /var/log/messages "$0 : poweroff -f on $REMOTE FAILED!!"

#
# Fencing failed?!
#
logger -f /var/log/messages "$0 : FENCING on $REMOTE FAILED!!"

# Go along with split brain..
ssh $REMOTE drbdadm resume-io all
drbdadm resume-io all
exit 1

Regards,
Koustubha Kale

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
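
For reference, a wrapper like the one above is invoked by DRBD itself as
the outdate-peer handler; in DRBD 0.8.x the wiring in drbd.conf looks
roughly like this (a sketch only — the resource name "r0" and the install
path of the script are assumptions, not from the original post):

```
resource r0 {
  disk {
    # Freeze I/O when the peer connection is lost, until the
    # outdate-peer handler below returns.
    fencing resource-and-stonith;
  }
  handlers {
    # Hypothetical install path for the modified obliterate script
    outdate-peer "/usr/local/sbin/obliterate";
  }
}
```

With resource-and-stonith, DRBD suspends I/O on connection loss, runs the
handler, and resumes I/O when it exits with 4 (peer outdated) or 7 (peer
destroyed) — the same exit codes the script uses.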