Javi Polo wrote:
> Hi there (again :P)
>
> I'm still fighting with all this, sorry to bother you so much (I hope some day
> when I understand it all better I'll write an article on how to set this up)
>
> Well, I already have the cluster up and the GFS filesystem mounted on 3
> machines, and if one of those goes down, it's correctly fenced. The FC
> port is also disconnected, so I suppose at this point everything is ok.
>
> The problem is with recovery. I understand that when a node rejoins, it
> is automatically unfenced, and then it can rejoin the fence domain and
> mount the filesystem again.
>
> I've blocked all input and output traffic on the node I want to test
> with iptables.
>
> The node gets fenced ok:
> Aug 8 16:00:48 gfstest2 fenced[2594]: fencing node "gfstest1"
> Aug 8 16:00:56 gfstest2 fenced[2594]: fence "gfstest1" success

What sort of fencing are you using? If it's a power-switch fence, then
the node should be hard-rebooted. If it's SAN fencing, then you'll have
to get the node out of the cluster; the remaining two nodes /should/
tell it to leave the cluster.

A node can't just "rejoin" a cluster after being SAN-fenced. It must be
removed from the cluster and rejoin from scratch. There's far too much
state involved for it to merge seamlessly back into a cluster.

> Now I can access the GFS filesystem safely from my other 2 nodes, as the
> FC port for gfstest1 is disabled now, but if I enable traffic for the
> node, it does not rejoin the cluster. Shouldn't this be automatic?
>
> Anyway, I cannot rejoin/leave/whatever the cluster from gfstest1:
> gfstest1:~# cman_tool services
> Service          Name                              GID LID State     Code
> Fence Domain:    "default"                           1   2 run       -
> [1 2 3]
>
> DLM Lock Space:  "primer_fs"                         2   3 run       -
> [1 2 3]
>
> GFS Mount Group: "primer_fs"                         3   4 run       -
> [1 2 3]
>
> gfstest1:~# cman_tool join
> cman_tool: Node is already active
> gfstest1:~# cman_tool leave
> cman_tool: Can't leave cluster while there are 5 active subsystems

cman_tool leave force will force it to leave, but you might find that it
still needs a reboot to clear the filesystems.

> and also, I cannot umount /dev/sdc1 as I have no access to the SAN
> (and anyway DLM should stop it from doing so). So I'm left with a totally
> screwed-up system that I can only fix by hard-rebooting (if I do a
> clean reboot, the system "hangs" while "umounting filesystems").
>
> Also, when the system boots up, the SAN is still inaccessible, as the
> fencing script does not run to re-enable the port ...
>
> I'm loooooost diving into google queries ... and it's certainly hard to
> find accurate info about all this :/
>
> could someone shed some light?
> (I probably don't understand well how the fencing system works, but I
> also haven't found anywhere where it's explained :/)
>
> thx in advance :)

--
patrick

--
Linux-cluster@xxxxxxxxxx
http://www.redhat.com/mailman/listinfo/linux-cluster
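The cman_tool services listing quoted in this thread is gfstest1's own view of the cluster: it still reports itself as a running member of the fence domain, the DLM lock space and the GFS mount group, with nodes 1, 2 and 3 listed in each. That is the kind of leftover state that keeps a SAN-fenced node from simply merging back in. Below is a minimal sketch, in Python, of parsing that listing to see which groups a node still holds. It assumes exactly the layout shown in this thread (other cman releases may print it differently), and the names Service, parse_services and SAMPLE are hypothetical, not part of cman.

```python
import re
from typing import List, NamedTuple


class Service(NamedTuple):
    """One group from the cman_tool services listing (hypothetical helper type)."""
    kind: str           # e.g. "Fence Domain", "DLM Lock Space", "GFS Mount Group"
    name: str           # the quoted name, e.g. "default" or "primer_fs"
    gid: int
    lid: int
    state: str          # e.g. "run"
    members: List[int]  # node IDs from the "[...]" line that follows


# Assumed layout (as quoted in the thread): '<kind>: "<name>" GID LID STATE CODE',
# followed by a line such as "[1 2 3]" holding the member node IDs.
SERVICE_RE = re.compile(
    r'^(?P<kind>[^:"]+):\s+"(?P<name>[^"]*)"\s+(?P<gid>\d+)\s+(?P<lid>\d+)\s+(?P<state>\S+)'
)
MEMBERS_RE = re.compile(r'^\[(?P<ids>[\d\s]*)\]$')


def parse_services(text: str) -> List[Service]:
    """Pair each service line with the member-list line that follows it."""
    services: List[Service] = []
    pending = None  # last service line (a re.Match) still waiting for its members
    for raw in text.splitlines():
        line = raw.strip()
        header = SERVICE_RE.match(line)
        if header:
            pending = header
            continue
        members = MEMBERS_RE.match(line)
        if members and pending is not None:
            services.append(Service(
                kind=pending["kind"].strip(),
                name=pending["name"],
                gid=int(pending["gid"]),
                lid=int(pending["lid"]),
                state=pending["state"],
                members=[int(n) for n in members["ids"].split()],
            ))
            pending = None
        # Anything else (the column header, blank lines) is ignored.
    return services


# Hypothetical sample input: the listing from gfstest1 quoted above.
SAMPLE = """\
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           1   2 run       -
[1 2 3]

DLM Lock Space:  "primer_fs"                         2   3 run       -
[1 2 3]

GFS Mount Group: "primer_fs"                         3   4 run       -
[1 2 3]
"""

if __name__ == "__main__":
    for svc in parse_services(SAMPLE):
        print(f'{svc.kind} "{svc.name}": state={svc.state}, members={svc.members}')
```

On a live node the input would instead be the text captured from running cman_tool services (for instance with Python's subprocess.run and capture_output=True, text=True). Matching with whitespace-tolerant regular expressions rather than fixed column offsets is a deliberate choice, since the exact column widths are not guaranteed.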