> [...cut...]
> certain level (heavy I/O), it collapses. About 20% of the nodes do crash
> (not reacting any more, but no sign of kernel panic), the others can't
> access the gfs resource.
> [...cut...]
>
> [root@compute-0-6 ~]# cat /proc/cluster/services
> Service          Name                GID  LID  State    Code
> Fence Domain:    "default"             3    2  recover  4 -
> [1 2 6 10 9 8 3 7 4 11]
> DLM Lock Space:  "clvmd"               7    3  recover  0 -
> [1 2 6 10 9 8 3 7 4 11]
> DLM Lock Space:  "Magma"              12    5  recover  0 -
> [1 2 6 10 9 8 3 7 4 11]
> DLM Lock Space:  "homeneu"            17    6  recover  0 -
> [10 9 8 7 2 3 6 4 1 11]
> GFS Mount Group: "homeneu"            18    7  recover  0 -
> [10 9 8 7 2 3 6 4 1 11]
> User:            "usrm::manager"      11    4  recover  0 -
> [1 2 6 10 9 8 3 7 4 11]

Hello,

1. Do you have Fibre-Channel SAN storage, or do you use GNBD/iSCSI?
2. The other nodes can't access the GFS filesystem because the cluster is in the "recover" state. Do you have fencing properly configured?

Best Regards
Maciej Bogucki

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
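For context on question 2: on Red Hat Cluster Suite of this generation (the /proc/cluster/services era), fencing is configured per node in /etc/cluster/cluster.conf, and an unfenceable failed node will leave the fence domain stuck in "recover" and block GFS access cluster-wide. A minimal sketch of such a configuration follows; the node name is taken from the prompt above, while the fence device, agent (fence_apc is one real agent among many), port, and credentials are hypothetical placeholders:

```xml
<!-- Hypothetical fencing fragment for /etc/cluster/cluster.conf.
     Device name, agent, address, and credentials are placeholders. -->
<clusternodes>
  <clusternode name="compute-0-6" votes="1">
    <fence>
      <!-- Method "1" is tried first; each node needs at least one
           working method, or recovery stalls as seen above. -->
      <method name="1">
        <device name="apc-pdu-1" port="6"/>
      </method>
    </fence>
  </clusternode>
</clusternodes>
<fencedevices>
  <fencedevice name="apc-pdu-1" agent="fence_apc"
               ipaddr="10.0.0.250" login="fenceuser" passwd="fencepass"/>
</fencedevices>
```

If a node has no fence method (or the agent cannot reach the device), fenced cannot confirm the node is dead, and the remaining members will wait in "recover" indefinitely rather than risk GFS corruption.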