Maciej Bogucki wrote:
> Hello,
>
> I have a five-node cluster. Node05 failed (kernel panic), and fencing
> failed. When I rebooted the failed node05, it could not connect to the
> cluster, and the filesystem is locked because it is in the recover
> state. I need to reboot all nodes to recover the cluster.
>
> On node05 I get "fenced: startup failed"
>
> Here is the output from another node in the cluster:
>
> ---cut---
> [root@node03 ~]# cat /proc/cluster/services
> Service          Name           GID LID State     Code
> Fence Domain:    "default"        1   2 run       U-1,10,1
> [2 3 5 4]
>
> DLM Lock Space:  "clvmd"          2   3 run       U-1,10,1
> [2 3 5 4]
>
> DLM Lock Space:  "repository"     3   4 recover 2 -
> [2 3 5 4]
>
> GFS Mount Group: "repository"     4   5 recover 0 -
> [2 3 5 4]
>
> [root@node03 ~]#
> ---cut---
>
> What does "U-1,10,1" mean?
>
> Here is some information from cluster.conf:
>
> ---cut---
> <fence_daemon clean_start="1" post_fail_delay="0" post_join_delay="3"/>
> <cman expected_votes="3" deadnode_timeout="120" hello_timer="10"/>
> ---cut---
>
> I don't have the latest cman, fence, dlm, and kernel, so maybe that is
> the problem?
>
> cman-1.0.11-0
> fence-1.32.25-1
> dlm-1.0.1-1
> kernel-smp-2.6.9-42.0.3.EL
>

I have also found this in the logs:

Aug 16 14:13:44 node03 kernel: dlm: repository: restbl_rsb_update_recv rsb not found 67098
Aug 16 14:14:07 node03 kernel: dlm: repository: restbl_rsb_update_recv rsb not found 72602
Aug 16 14:14:23 node03 kernel: dlm: repository: restbl_rsb_update_recv rsb not found 64752
Aug 16 14:14:23 node03 kernel: dlm: repository: restbl_rsb_update_recv rsb not found 67108
Aug 16 14:14:23 node03 kernel: dlm: repository: restbl_rsb_update_recv rsb not found 69654
Aug 16 14:14:23 node03 kernel: dlm: repository: restbl_rsb_update_recv rsb not found 69781
Aug 16 14:14:23 node03 kernel: dlm: repository: restbl_rsb_update_recv rsb not found 87705

What does it mean?

Best Regards
Maciej Bogucki

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster