I have a three-node cluster running the latest 4.6 code with 14 GFS file systems. One of them is a three-month-old, heavily used GFS file system that has never had any problems, and there have been no shared-storage power outages or anything else I can think of that could have damaged it. On that file system I got the following error and a withdraw:

Jul 18 22:05:26 jrmedia-c kernel: GFS: fsid=alpha_cluster:wav-4.2: fatal: assertion "FALSE" failed
Jul 18 22:05:26 jrmedia-c kernel: GFS: fsid=alpha_cluster:wav-4.2: function = xmote_bh
Jul 18 22:05:26 jrmedia-c kernel: GFS: fsid=alpha_cluster:wav-4.2: file = /builddir/build/BUILD/gfs-kernel-2.6.9-75/smp/src/gfs/glock.c, line = 1093
Jul 18 22:05:26 jrmedia-c kernel: GFS: fsid=alpha_cluster:wav-4.2: time = 1216415126
Jul 18 22:05:26 jrmedia-c kernel: GFS: fsid=alpha_cluster:wav-4.2: about to withdraw from the cluster
Jul 18 22:05:26 jrmedia-c kernel: GFS: fsid=alpha_cluster:wav-4.2: waiting for outstanding I/O
Jul 18 22:05:26 jrmedia-c kernel: GFS: fsid=alpha_cluster:wav-4.2: telling LM to withdraw
Jul 18 22:05:27 jrmedia-c kernel: GFS: fsid=alpha_cluster:wav-4.2: withdrawn
Jul 18 22:05:27 jrmedia-c kernel: GFS: fsid=alpha_cluster:wav-4.2: ret = 0x00000002

Unfortunately the file system wouldn't unmount after this, and the only way to get the node up and running again was to fence it. I checked Bugzilla and can't find anything still open relating to this.

Can anyone:

1. Suggest a good strategy for getting the fs unmounted, so that a fence is not required and a normal reboot can be done?
2. Suggest what information I should have captured to help debugging in the future? I think this would make a good FAQ entry and be helpful to all.

Finally, the FAQ says that after a GFS file system withdraws, the node should be rebooted before remounting. Is this correct, and is it related to replaying journals? What would happen if you didn't reboot?
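In case it helps with question 2, here is a minimal Python sketch of how the "key = value" fields in the log lines above could be pulled out for a bug report. The names `FIELD_RE` and `parse_withdraw` are my own, the pattern is only an assumption based on the few lines shown here, and it is not part of any GFS tooling.

```python
import re
from datetime import datetime, timezone

# Assumed shape, taken only from the lines quoted above:
#   "... GFS: fsid=<fsid>: <key> = <value>"
FIELD_RE = re.compile(r"GFS: fsid=(?P<fsid>\S+): (?P<key>\w+) = (?P<value>.+)$")


def parse_withdraw(lines):
    """Collect the 'key = value' fields logged around a GFS withdraw."""
    report = {}
    for line in lines:
        match = FIELD_RE.search(line)
        if match is None:
            continue  # free-text lines such as "about to withdraw ..." are skipped
        report["fsid"] = match.group("fsid")
        key = match.group("key")
        value = match.group("value").strip()
        # The "file" line also carries the line number: "file = <path>, line = <n>"
        if key == "file" and ", line = " in value:
            value, report["line"] = value.rsplit(", line = ", 1)
        report[key] = value
    return report


sample = [
    'Jul 18 22:05:26 jrmedia-c kernel: GFS: fsid=alpha_cluster:wav-4.2: fatal: assertion "FALSE" failed',
    "Jul 18 22:05:26 jrmedia-c kernel: GFS: fsid=alpha_cluster:wav-4.2: function = xmote_bh",
    "Jul 18 22:05:26 jrmedia-c kernel: GFS: fsid=alpha_cluster:wav-4.2: file = /builddir/build/BUILD/gfs-kernel-2.6.9-75/smp/src/gfs/glock.c, line = 1093",
    "Jul 18 22:05:26 jrmedia-c kernel: GFS: fsid=alpha_cluster:wav-4.2: time = 1216415126",
    "Jul 18 22:05:27 jrmedia-c kernel: GFS: fsid=alpha_cluster:wav-4.2: ret = 0x00000002",
]

report = parse_withdraw(sample)
# The "time" field looks like a Unix timestamp; convert it to UTC for the report.
when_utc = datetime.fromtimestamp(int(report["time"]), tz=timezone.utc).isoformat()
print(report["fsid"], report["function"], report["file"], report["line"], when_utc)
```

Reading the "time" value as seconds since the epoch gives 21:05:26 UTC on 18 July 2008, which lines up with the 22:05:26 syslog stamp if the node's clock is one hour ahead of UTC.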
Cheers
Ben