Dear list,

I am having problems with a node that I can't get to rejoin the fence domain. It has been rebooted before, and until now it has always rejoined the fence domain automatically so that it could pick up the rest of the dependent services, but not this time. I upgraded the kernel and the cluster/GFS suite (this is a Gentoo system) to gentoo-sources-2.6.12-r9 and cluster software v1.00.00.

I guess the biggest problem is that I don't know what to actually do to unfence the node that has been shut out. Since I have set the cluster up to use manual fencing, I suppose the un-fence command to use is fence_ack_manual. However, running it only produces a warning about a missing /tmp/fence_manual.fifo. If I create this FIFO by hand before running the command, the command removes the FIFO and still produces the warning.

This is what "cman_tool services" emits:

    Service          Name         GID LID State     Code
    Fence Domain:    "default"      0   2 join      S-2,2,1
    []

I don't seem to be able to find any information anywhere on the "Codes" - any pointers there?

The cluster has 6 members: one "file server" and five "clients". Excerpt from cluster.conf follows:

    <?xml version="1.0"?>
    <cluster name="nbs-sc-1" config_version="1">
      <cman></cman>
      <dlm></dlm>
      <clusternodes>
        <clusternode name="fs-2" votes="2">
          <fence>
            <method name="single">
              <device name="human" ipaddr="10.42.0.200"/>
            </method>
          </fence>
        </clusternode>
        <clusternode name="app-1" votes="1">
          <fence>
            <method name="single">
              <device name="human" ipaddr="10.42.0.202"/>
            </method>
          </fence>
        </clusternode>
        [...]
        <clusternode name="app-5" votes="1">
          <fence>
            <method name="single">
              <device name="human" ipaddr="10.42.0.206"/>
            </method>
          </fence>
        </clusternode>
      </clusternodes>
      <fence_devices>
        <device name="human" agent="fence_manual"/>
      </fence_devices>
    </cluster>

I also found this in dmesg - is it important?

    SM: process_reply invalid id=0 nodeid=4
    SM: process_reply invalid id=0 nodeid=1
    SM: process_reply invalid id=0 nodeid=2
    SM: process_reply invalid id=0 nodeid=6
    SM: process_reply invalid id=0 nodeid=5

Any help or pointers to more information would be most appreciated. I have read through everything I could find on the Internet without becoming much wiser, and the status today is that I can't upgrade single servers in my cluster without taking down the whole group, which is hardly useful.

Thanks in advance for any assistance!

Best regards
Jan Bruvoll

--
Linux-cluster@xxxxxxxxxx
http://www.redhat.com/mailman/listinfo/linux-cluster