Vijay Bellur <vbellur@xxxxxxxxxx> wrote:

> I am also not certain why we end up with stale NFS mounts in the
> first place. Any ideas as to why this might be happening?

That happens when you try to umount while the NFS server is down, or in
our case after the NFS server process was shut down. The kernel waits
forever in unmount(2) for a server reply, and anything that then tries
to access the filesystem waits for unmount(2) to terminate, showing up
with a wchan of tstile in ps -axl output.

The first fix is therefore to unmount before calling the cleanup
routine that terminates the glusterfsd acting as the NFS server, like
this:

EXPECT_WITHIN $UMOUNT_TIMEOUT "Y" force_umount $N0
cleanup

But then there is the case where the NFS server died earlier. That is a
bug we want to fix, but it hangs the tests, which is not desirable.
umount -f $N0 does not help here, since NetBSD's umount(8) command makes
an unfortunate realpath(3) call that locks it up on the unresponsive NFS
mount before it ever reaches unmount(2). NetBSD's umount(8) has a -R
flag that causes it to skip the realpath(3) call. Hence umount -f -R $N0
is the way to work around that case. But the -R flag is not portable and
should only be used on NetBSD. I will try to craft a change for that
tomorrow.

Note that there is a known NetBSD bug: if a system call gets stuck
because of an unresponsive NFS server, umount -f -R $N0 will not unlock
the situation, because it waits for the first process to complete
(tstile state). reboot -n is the only way to recover from that.

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
manu@xxxxxxxxxx
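
A minimal sketch of what the NetBSD-conditional workaround described
above could look like, keyed on uname -s so -R never reaches a umount
that lacks it. The force_umount_nfs helper name is hypothetical; the
actual change may instead hook into the test suite's existing
force_umount helper:

#!/bin/sh
# Hypothetical sketch: force-unmount a possibly stale NFS mount.
# On NetBSD, pass -R so umount(8) skips the realpath(3) call that
# would hang on an unresponsive NFS mount; other systems have no
# -R flag, so plain -f is used there.
force_umount_nfs() {
    case "$(uname -s)" in
        NetBSD) umount -f -R "$1" ;;
        *)      umount -f "$1" ;;
    esac
}

# Example use in a test, before shutting down the NFS server:
#   force_umount_nfs $N0
#   cleanup

Because all non-NetBSD systems fall through to plain umount -f, the
sketch stays portable across the platforms the tests run on.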