I've been chasing down a problem where a customer has a localhost mount,
and the sequence

   umount -at nfs,nfs4
   stop nfsserver
   sync

hangs on the sync. The 'sync' is trying to write to the NFS filesystem
that has just been unmounted. I have duplicated the problem on a current
mainline kernel.

There are two important facts that lead to the explanation of this.

1/ Whenever a 'struct file' is open, an s_active reference is held on
   the superblock, via "open_context" calling nfs_sb_active(). This
   doesn't prevent "unmount" from succeeding (i.e. EBUSY isn't
   returned), but does prevent the actual unmount from happening
   (->kill_sb() isn't called).

2/ When a memory mapping of a file is torn down, the file is "released",
   causing the context to be discarded and the sb_active reference
   released, but unlike close(2), file_operations->flush() is not
   called.

Consequently, if you:

   open an NFS file
   mmap some pages PROT_WRITE
   close the file
   modify the pages
   unmap the pages
   unmount the filesystem

the filesystem will remain active, and the pages will remain dirty.
If you then make the NFS server unavailable - e.g. stop it, or tear down
the network connection - and then call 'sync', the sync will hang.

This is surprising, at the least :-)

I have two ideas how it might be fixed.

One is to call nfs_file_flush() from within nfs_file_release().
This is probably simplest (and appears to work).

The other is to add a ".close" to nfs_file_vm_ops. This could trigger a
(partial) flush whenever a page is unmapped. As closing an NFS file
always triggers a flush, it seems reasonable that unmapping a page would
trigger a flush of that page.

Thoughts?

Thanks,
NeilBrown
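
For anyone wanting to try this, the open/mmap/close/modify/unmap steps
described above can be sketched as a small userspace helper. This is only
a minimal sketch, not the reproducer used for the report: the function
name dirty_after_close() is hypothetical, MAP_SHARED is an assumption
(the steps only say PROT_WRITE, but the pages must be written back to the
file for them to matter), and a single page is dirtied for simplicity.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: dirty one page of `path` through a shared,
 * writable mapping AFTER the file descriptor has been closed, then
 * unmap it without calling msync().
 * Returns 0 on success, -1 on any failure.
 */
static int dirty_after_close(const char *path)
{
    assert(path != NULL);

    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;

    /* open the file and make sure it covers one full page */
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)page) < 0) {
        close(fd);
        return -1;
    }

    /* mmap some pages PROT_WRITE (MAP_SHARED is an assumption) */
    char *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }

    /* close the file: any flush on close runs now, before anything is dirty */
    if (close(fd) < 0) {
        munmap(p, (size_t)page);
        return -1;
    }

    /* modify the pages: they become dirty with no open descriptor left */
    memset(p, 'x', (size_t)page);

    /* unmap the pages: the mapping goes away without an explicit msync() */
    return munmap(p, (size_t)page);
}
```

To exercise the scenario, one would call dirty_after_close() on a file
under the localhost NFS mount and then run the umount / stop server / sync
sequence. Whether the sync actually hangs will depend on the kernel in use.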