Hi Christine,

Thanks for the feedback (and, while I'm thanking you, also for the Programming Locking Applications book :)

On Wed, Aug 13, 2008 at 12:09 PM, Christine Caulfield <ccaulfie@xxxxxxxxxx> wrote:
>> I think I found a problem with the way it starts up... See just below
>> the startup output for more info...
>>> Mounting GFS filesystems: GFS 0.1.1-7.el5 installed
>>> Trying to join cluster "lock_dlm","jemdevcluster:cache1"
>>> dlm: Using TCP for communications
>>> dlm: connecting to 2
>>> dlm: got connection to 2
>>> dlm: connecting to 2
>>> dlm: got connection from 4
>>
>> Could this be the problem?
>
> Yes, that's bad! You should only get one "connecting to" message per node.
> If you're getting two it looks like the connection is being closed by the
> remote node for some reason. Are there any messages on node 2 that might
> give a clue as to what's happening ?

That was it. The qdiskd service was not running on all nodes, and I had restarted it a few times. On top of that, I had pushed a few config updates with ccs_tool and also run "cman_tool expected 4" to lower my quorum, because the cluster kept locking up after losing it.

In hindsight, it was losing quorum because qdiskd was not running, which left the cluster 2 votes short. Node 2's logs showed "Cluster is not quorate, refusing connection". In the end I had to restart the entire cluster to get things running; GFS does not seem to recover well once it loses quorum.

* 1st rule of troubleshooting: check the logs.

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
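For anyone hitting the same symptoms, the arithmetic behind "2 votes short" can be sketched in a few lines. This is a minimal sketch, assuming CMAN's usual strict-majority rule (quorum = expected_votes // 2 + 1) and ignoring special cases such as two-node mode. The vote counts below (4 nodes at 1 vote each plus a quorum disk worth 3 votes) are hypothetical and not taken from this thread; the helper names quorum and votes_short are likewise made up for illustration.

```python
def quorum(expected_votes: int) -> int:
    """Votes needed for quorum: a strict majority of expected votes (assumed rule)."""
    return expected_votes // 2 + 1


def votes_short(total_votes: int, expected_votes: int) -> int:
    """How many votes the cluster lacks before it becomes quorate (0 if quorate)."""
    return max(0, quorum(expected_votes) - total_votes)


if __name__ == "__main__":
    # Hypothetical cluster: 4 nodes at 1 vote each plus a quorum disk worth 3 votes.
    expected = 4 + 3
    print(quorum(expected))           # 4
    # Only 2 nodes up and qdiskd not running: 2 votes present, 2 short of quorum.
    print(votes_short(2, expected))   # 2
    # The same 2 nodes with qdiskd running contribute 2 + 3 = 5 votes: quorate.
    print(votes_short(5, expected))   # 0
    # Lowering expected votes (as "cman_tool expected 4" does) lowers the threshold.
    print(quorum(4))                  # 3
```

The point of the sketch is that the quorum disk's votes count toward the total just like node votes. When qdiskd is down on the nodes that should be registering it, those votes disappear and the cluster can drop below the threshold even though enough nodes are up.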