On Mon, Aug 22, 2005 at 09:19:52PM +0200, Jan Bruvoll wrote:
> Dear list,
>
> I am having problems with a node where I can't get it to rejoin the
> fence domain. It has been rebooted before, and it has so far
> automatically joined the fence domain so that it could pick up the
> rest of the dependent services, but not this time. I upgraded the kernel
> and cluster/GFS suite (this is a Gentoo system) to
> gentoo-sources-2.6.12-r9 and cluster software v1.00.00.

Are the nodes running slightly different versions of the cluster
software? They must all be running the same version -- there was a
change to the cman message formats shortly before 1.00.00 was released.

> I guess the biggest problem is that I don't know what to actually do to
> unfence the node that has been shut out. Since I have set the cluster up
> to use manual fencing, I suppose the un-fence command to use is
> fence_ack_manual; however, using that only produces a warning about a
> missing /tmp/fence_manual.fifo. Manually creating this fifo before
> running the command only removes the fifo -and- produces the warning.
>
> This is what cman_tool services emits:
>
> Service Name                              GID LID State     Code
> Fence Domain:    "default"                  0   2 join      S-2,2,1
> []

Manual fencing is hard to use and get right; my first recommendation is
not to use it.

You only need to run fence_ack_manual when instructed to do so by a
message in /var/log/messages on some node.

Dave

--
Linux-cluster@xxxxxxxxxx
http://www.redhat.com/mailman/listinfo/linux-cluster
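The service table quoted above (the fence domain sitting in state "join" with code S-2,2,1) can be checked mechanically. Below is a minimal Python sketch, assuming the whitespace-separated column layout shown in Jan's output (type, quoted name, GID, LID, State, Code) and assuming that a group which has finished joining reports the state "run"; that expected state is a parameter so it can be adjusted. `parse_services` and `stuck_groups` are hypothetical helper names, not part of the cluster suite, and the code only parses text you hand it (for example, saved output of `cman_tool services`).

```python
def parse_services(text):
    """Parse service-group rows from `cman_tool services` style output.

    Assumption: each group row looks like
        Fence Domain:    "default"   0   2 join   S-2,2,1
    i.e. a type label, a quoted name, then GID, LID, State and an optional Code.
    """
    groups = []
    for line in text.splitlines():
        # Header row, member lists such as "[]" and blank lines carry no quoted name.
        if line.count('"') < 2:
            continue
        kind, name, rest = line.split('"', 2)
        fields = rest.split()
        if len(fields) < 3:
            continue  # not a complete group row
        groups.append({
            "type": kind.strip().rstrip(":"),
            "name": name,
            "gid": int(fields[0]),
            "lid": int(fields[1]),
            "state": fields[2],
            "code": fields[3] if len(fields) > 3 else "",
        })
    return groups


def stuck_groups(text, healthy_state="run"):
    """Return groups whose state differs from the assumed healthy state."""
    return [g for g in parse_services(text) if g["state"] != healthy_state]


# The table from the message above, used as sample input.
SAMPLE = '''\
Service Name                              GID LID State     Code
Fence Domain:    "default"                  0   2 join      S-2,2,1
[]
'''

for g in stuck_groups(SAMPLE):
    print(f'{g["type"]} "{g["name"]}" is in state {g["state"]} (code {g["code"]})')
```

On the sample this prints one line for the "default" fence domain, reporting it in state join. The parser keys on the quoted group name rather than on fixed column offsets, so it tolerates the column widths varying between versions.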
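Dave's advice to run fence_ack_manual only when a message in /var/log/messages asks for it can be turned into a quick per-node check. This is a minimal Python sketch, assuming the instruction message names `fence_ack_manual` (the default `marker` below; change it if the wording on your nodes differs) and that syslog writes to /var/log/messages as on the poster's system. `fence_ack_hints` and `scan_log` are hypothetical names; the code only reports matching lines and never runs fence_ack_manual itself.

```python
def fence_ack_hints(lines, marker="fence_ack_manual"):
    """Return the log lines that contain `marker`.

    Assumption: the message telling the operator to acknowledge a manual
    fence mentions fence_ack_manual by name.
    """
    return [line.rstrip("\n") for line in lines if marker in line]


def scan_log(path="/var/log/messages", marker="fence_ack_manual"):
    # errors="replace" keeps a stray non-UTF-8 byte from aborting the scan.
    with open(path, encoding="utf-8", errors="replace") as log:
        return fence_ack_hints(log, marker)
```

Call `scan_log()` on each node (reading the log may require root) and print the result. If no node's log mentions fence_ack_manual, there is nothing to acknowledge, and running the command anyway is what produces the missing-fifo warning Jan describes.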