Ok. I'm trying to implement GFS on two different clusters: one with 9 nodes and one with 17 nodes. I'm having nothing but trouble. The GFS volumes are freezing and throwing the cluster into a bad state. Currently, this is the state of my cluster:

[root@plxp01md-new log]# cman_tool services
type             level name     id       state
fence            0     default  00010004 none
[1 2 3 4 5 6 7 8 9]
dlm              1     clvmd    00010003 none
[1 2 3 4 5 6 7 8 9]
dlm              1     mdi_log  00020001 FAIL_START_WAIT
[1 2 3 4 6 7 8 9]
dlm              1     deploy   00040001 FAIL_START_WAIT
[1 4 6 7 8 9]
gfs              2     mdi_log  00010001 FAIL_START_WAIT
[1 2 3 4 6 7 8 9]
gfs              2     deploy   00030001 FAIL_START_WAIT

I have no idea what happened. I've got users who were writing to a GFS volume, and they just came and reported to me that the volume is not responding. /var/log/messages has been outputting the following message, about 50 times a second, since Friday:

Oct 9 13:54:35 plxp01deploy kernel: dlm: recover_master_copy -53 401ce

Can someone tell me what FAIL_START_WAIT means and how to recover from it? Also, does anyone know what the log message above means? All the servers in the cluster are showing the same service states. I'm running RHEL5, 64-bit.

Please help. I'm almost ready to give up on GFS. It seems way too unstable.

James Fillman

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster