My latest test run only made it 22 hours. It was starting a test that
mounts gfs on all 3 nodes. The first 2 nodes mounted the gfs file system
without any problem, but the 3rd node's mount hung and it got kicked out
of the cluster:

cl032 (3rd node):

CMAN: removing node cl030a from the cluster : Missed too many heartbeats
CMAN: removing node cl031a from the cluster : No response to messages
CMAN: quorum lost, blocking activity
[-- MARK -- Wed Mar 30 09:15:00 2005]
GFS: Trying to join cluster "lock_dlm", "gfs_cluster:stripefs"

cl030 (1st node):

CMAN: removing node cl032a from the cluster : Missed too many heartbeats
GFS: Trying to join cluster "lock_dlm", "gfs_cluster:stripefs"
SM: process_reply invalid id=6764 nodeid=4294967295
GFS: fsid=gfs_cluster:stripefs.0: Joined cluster. Now mounting FS...
GFS: fsid=gfs_cluster:stripefs.0: jid=0: Trying to acquire journal lock...
GFS: fsid=gfs_cluster:stripefs.0: jid=0: Looking at journal...
GFS: fsid=gfs_cluster:stripefs.0: jid=0: Done
GFS: fsid=gfs_cluster:stripefs.0: jid=1: Trying to acquire journal lock...
GFS: fsid=gfs_cluster:stripefs.0: jid=1: Looking at journal...
GFS: fsid=gfs_cluster:stripefs.0: jid=1: Done
GFS: fsid=gfs_cluster:stripefs.0: jid=2: Trying to acquire journal lock...
GFS: fsid=gfs_cluster:stripefs.0: jid=2: Looking at journal...
GFS: fsid=gfs_cluster:stripefs.0: jid=2: Done
GFS: fsid=gfs_cluster:stripefs.0: jid=3: Trying to acquire journal lock...
GFS: fsid=gfs_cluster:stripefs.0: jid=3: Looking at journal...
GFS: fsid=gfs_cluster:stripefs.0: jid=3: Done
SM: process_reply invalid id=6764 nodeid=4294967295
SM: process_reply invalid id=6765 nodeid=4294967295
SM: process_reply invalid id=6765 nodeid=4294967295
SM: process_reply invalid id=6765 nodeid=4294967295
SM: process_reply invalid id=6765 nodeid=4294967295
GFS: Trying to join cluster "lock_dlm", "gfs_cluster:stripefs"
SM: process_reply invalid id=5553 nodeid=4294967295
SM: process_reply invalid id=5553 nodeid=4294967295
...

cl031 (2nd node):

CMAN: node cl032a has been removed from the cluster : Missed too many heartbeats
SM: process_reply invalid id=6764 nodeid=4294967295
SM: process_reply invalid id=6764 nodeid=4294967295
SM: process_reply invalid id=6764 nodeid=4294967295
GFS: Trying to join cluster "lock_dlm", "gfs_cluster:stripefs"
SM: process_reply invalid id=6765 nodeid=4294967295
GFS: fsid=gfs_cluster:stripefs.1: Joined cluster. Now mounting FS...
GFS: fsid=gfs_cluster:stripefs.1: jid=1: Trying to acquire journal lock...
GFS: fsid=gfs_cluster:stripefs.1: jid=1: Looking at journal...
GFS: fsid=gfs_cluster:stripefs.1: jid=1: Done
SM: process_reply invalid id=6765 nodeid=4294967295
SM: process_reply invalid id=6765 nodeid=4294967295
SM: process_reply invalid id=5553 nodeid=4294967295

A whole lot more info is available here:
http://developer.osdl.org/daniel/GFS/test.29mar2005/

Any ideas on what happened?

Daniel