As you can see here (http://pastebin.com/m7ac9376d), I've configured both fence_drac and fence_manual, and fenced appears to be running:

[root@test-db1 ~]# ps ax | grep fence
 3412 ?        Ss     0:00 /sbin/fenced
 5109 pts/0    S+     0:00 grep fence

[root@test-db1 ~]# cman_tool services
type             level name       id       state
fence            0     default    00010001 JOIN_START_WAIT
[1 2]
dlm              1     clvmd      00020002 JOIN_START_WAIT
[1 2]
dlm              1     rgmanager  00030002 JOIN_START_WAIT
[1 2]
dlm              1     pg_fs      00050002 JOIN_START_WAIT
[1 2]
gfs              2     pg_fs      00040002 JOIN_START_WAIT
[1 2]

And on test-db2:

[root@test-db2 ~]# ps ax | grep fence
 3428 ?        Ss     0:00 /sbin/fenced
 8848 pts/0    S+     0:00 grep fence

[root@test-db2 ~]# cman_tool services
type             level name       id       state
fence            0     default    00010002 JOIN_START_WAIT
[1 2]
dlm              1     clvmd      00020002 JOIN_START_WAIT
[1 2]
dlm              1     rgmanager  00030002 JOIN_START_WAIT
[1 2]
dlm              1     pg_fs      00050002 JOIN_START_WAIT
[1 2]
gfs              2     pg_fs      00040002 JOIN_START_WAIT
[1 2]

/ Jonas

-----Original Message-----
From: linux-cluster-bounces@xxxxxxxxxx [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Jeremy Carroll
Sent: 22 August 2007 15:47
To: linux clustering
Subject: RE: Node fencing problem

What type of fencing method are you using on your cluster? Also, can you run "cman_tool services" on both nodes to make sure fenced is running?

-----Original Message-----
From: linux-cluster-bounces@xxxxxxxxxx [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Borgström Jonas
Sent: Wednesday, August 22, 2007 4:07 AM
To: linux-cluster@xxxxxxxxxx
Subject: Node fencing problem

Hi,

We're having some problems getting fencing to work as expected on our two-node cluster.

Our cluster.conf file: http://pastebin.com/m7ac9376d
kernel version: 2.6.18-8.1.8.el5
cman version: 2.0.64-1.0.1.el5

When I simulate a network failure on a node, I expect it to be fenced by the other node, but for some reason that doesn't happen.

Steps to reproduce:
1. Start the cluster
2. Mount a GFS filesystem on both nodes (test-db1 and test-db2)
3. Simulate a network failure on test-db1: http://pastebin.com/m19fda088

Expected result:
1. Node test-db2 detects that test-db1 has failed
2. test-db1 gets fenced by test-db2
3. test-db2 replays the GFS journal (filesystem writable again)
4. Services fail over from test-db1 to test-db2

Actual result:
1. Node test-db2 detects that something happened to test-db1
2. test-db2 replays the GFS journal (filesystem writable again)
3. The service on test-db1 is still listed as started and is not failed over to test-db2, even though test-db2 considers test-db1 "offline"

Log files and debug output from test-db2:
/var/log/messages after the failure: http://pastebin.com/m2fe4ce36
"group_tool dump fence" output: http://pastebin.com/m79d21ed9
clustat output: http://pastebin.com/m4d1007c2

And if I restore network connectivity on test-db1, the filesystem becomes writable on that node as well, which will probably result in filesystem corruption.

I think the fencedevice part of cluster.conf is correct, since nodes are sometimes fenced when the cluster is started and one node doesn't join fast enough.

What am I doing wrong?

Regards,
Jonas

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
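
The cluster.conf discussed in the thread is only available through the pastebin link and is not reproduced here. Purely as an illustration of how fence_drac and fence_manual are usually combined for a node in a two-node cluster (the node names come from the thread, but the method names, DRAC address, login, and password below are placeholders, not values from the actual configuration), the relevant cluster.conf section typically looks something like this:

    <clusternode name="test-db1" nodeid="1" votes="1">
      <fence>
        <!-- primary method: power-cycle the node via its DRAC card -->
        <method name="1">
          <device name="drac-db1"/>
        </method>
        <!-- fallback method: blocks until an operator runs fence_ack_manual -->
        <method name="2">
          <device name="manual" nodename="test-db1"/>
        </method>
      </fence>
    </clusternode>
    <!-- test-db2 is declared the same way, pointing at its own DRAC device -->
    <fencedevices>
      <!-- ipaddr/login/passwd are placeholders, not taken from the thread -->
      <fencedevice agent="fence_drac" name="drac-db1" ipaddr="192.168.0.10" login="root" passwd="calvin"/>
      <fencedevice agent="fence_manual" name="manual"/>
    </fencedevices>

With a layout like this, fenced tries the methods in order: the DRAC method first, and only if that fails does it fall back to manual fencing, which waits until an operator acknowledges the fence with fence_ack_manual.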
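
Likewise, the exact commands used in step 3 to simulate the network failure are behind the pastebin link (http://pastebin.com/m19fda088) and are not shown in the thread. One hypothetical way to stage that kind of failure (assuming the cluster traffic runs over eth0; this is not necessarily what was done here) is to silently drop all traffic on the interconnect with iptables:

    # on test-db1: drop all traffic on the (assumed) cluster interface
    iptables -A INPUT  -i eth0 -j DROP
    iptables -A OUTPUT -o eth0 -j DROP

    # later, restore connectivity by flushing the rules again
    iptables -F

Dropping packets rather than taking the interface down keeps the link state up, which from the other node's point of view looks closer to a genuine network partition.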