Could you please include /var/log/messages from one system, as well as
the output of group_tool dump from one of the crashed nodes?

What brand/model of switch are you using?

Regards
-steve

On Tue, 2007-10-09 at 16:52 -0700, James Fillman wrote:
> Ok. I'm trying to implement GFS on two different clusters: 9 nodes, 17
> nodes.
>
> I'm having nothing but trouble. The gfs volumes are freezing and
> throwing the cluster into a bad state. Currently, this is the state of
> my cluster:
>
> [root@plxp01md-new log]# cman_tool services
> type             level name     id       state
> fence            0     default  00010004 none
> [1 2 3 4 5 6 7 8 9]
> dlm              1     clvmd    00010003 none
> [1 2 3 4 5 6 7 8 9]
> dlm              1     mdi_log  00020001 FAIL_START_WAIT
> [1 2 3 4 6 7 8 9]
> dlm              1     deploy   00040001 FAIL_START_WAIT
> [1 4 6 7 8 9]
> gfs              2     mdi_log  00010001 FAIL_START_WAIT
> [1 2 3 4 6 7 8 9]
> gfs              2     deploy   00030001 FAIL_START_WAIT
>
> I have no idea what happened. I've got users who are writing to a gfs
> volume and just came and reported to me that the volume is not
> responding. /var/log/messages has been outputting the following
> message, about 50 times a second, since Friday:
>
> Oct 9 13:54:35 plxp01deploy kernel: dlm: recover_master_copy -53 401ce
>
> Can someone tell me what FAIL_START_WAIT means and how to recover from
> it? Also, does anyone know what the log message above means?
>
> All my servers in the cluster are showing the same service states.
>
> I'm running RHEL5 64-bit.
>
> Please help. I'm almost ready to give up on GFS. It seems way too
> unstable.
>
> James Fillman
>
> --
> Linux-cluster mailing list
> Linux-cluster@xxxxxxxxxx
> https://www.redhat.com/mailman/listinfo/linux-cluster
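
For anyone reading the service listing quoted above, here is a minimal Python sketch that works out which node IDs are missing from each group that is not in the "none" state. It is only an illustration: the function names (parse_services, stuck_groups) are made up for this example, the parser assumes the five-column layout exactly as pasted in this thread, and treating the fence group's member list as the full set of cluster nodes is an assumption, not something the tools guarantee.

```python
# Listing copied from the cman_tool services output quoted in this thread.
SAMPLE = """\
type             level name     id       state
fence            0     default  00010004 none
[1 2 3 4 5 6 7 8 9]
dlm              1     clvmd    00010003 none
[1 2 3 4 5 6 7 8 9]
dlm              1     mdi_log  00020001 FAIL_START_WAIT
[1 2 3 4 6 7 8 9]
dlm              1     deploy   00040001 FAIL_START_WAIT
[1 4 6 7 8 9]
gfs              2     mdi_log  00010001 FAIL_START_WAIT
[1 2 3 4 6 7 8 9]
gfs              2     deploy   00030001 FAIL_START_WAIT
"""


def parse_services(text):
    """Parse the pasted listing into one dict per group (hypothetical helper)."""
    groups = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("type"):
            continue  # skip blank lines and the column header
        if line.startswith("["):
            # A bracketed line is the member list of the preceding group.
            if groups:
                groups[-1]["members"] = {int(n) for n in line.strip("[]").split()}
            continue
        fields = line.split()
        if len(fields) == 5:
            gtype, level, name, gid, state = fields
            groups.append({
                "type": gtype,
                "level": int(level),
                "name": name,
                "id": gid,
                "state": state,
                "members": None,  # stays None if no member list follows
            })
    return groups


def stuck_groups(groups, all_nodes):
    """Yield (type, name, state, missing) for every group whose state is not 'none'.

    missing is a sorted list of node IDs absent from the group, or None when
    the listing carried no member list for that group.
    """
    for g in groups:
        if g["state"] == "none":
            continue
        if g["members"] is None:
            missing = None
        else:
            missing = sorted(all_nodes - g["members"])
        yield g["type"], g["name"], g["state"], missing


if __name__ == "__main__":
    groups = parse_services(SAMPLE)
    # Assumption: the fence group's members are the full set of cluster nodes.
    all_nodes = next(g["members"] for g in groups if g["type"] == "fence")
    for gtype, name, state, missing in stuck_groups(groups, all_nodes):
        print(gtype, name, state, "missing:", missing)
```

Run against the listing above, this reports node 5 as missing from both mdi_log groups and nodes 2, 3 and 5 as missing from the dlm deploy group. The gfs deploy entry has no member list in the pasted output, so nothing can be said about it.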