On Fri, 2017-08-18 at 17:12 +1000, NeilBrown wrote: > Any dirty NFS page holds an s_active reference on the superblock, > because page_private() references an nfs_page, which references an > open context, which references the superblock. > > So if there are any dirty pages when the filesystem is unmounted, the > unmount will act like a "lazy" unmount and not call ->kill_sb(). > Background write-back can then write out the pages *after* the > filesystem > unmount has apparently completed. > > This contrasts with other filesystems which do not hold extra > s_active > references, so ->kill_sb() is reliably called on unmount, and > generic_shutdown_super() will call sync_filesystem() to flush > everything out before the unmount completes. > > When open/write/close is used to modify files, the final close causes > f_op->flush to be called, which flushes all dirty pages. However if > open/mmap/close/modify-memory/unmap is used, dirty pages can remain > in > memory after the application has dropped all references to the file. > Similarly if a loop-mount is done with a NFS file, there is no final > flush when the loop device is destroyed and the file can have dirty > data after the last "close". > > Also, a loop-back mount of a device does not "close" the file when > the > loop device is destroyed. This can leave dirty page cache pages too. > > Fix this by calling vfs_fsync() in nfs_file_release (aka > f_op->release()). This means that on the final unmap of a file (or > destruction of a loop device), all changes are flushed, and ensures > that > when unmount is requested there will be no dirty pages to delay the > final unmount. > > Without this patch, it is not safe to stop or disconnect the NFS > server after all clients have unmounted. They need to unmount and > call "sync". > > Signed-off-by: NeilBrown <neilb@xxxxxxxx> > --- > fs/nfs/file.c | 6 ++++++ > 1 file changed, 6 insertions(+) > > diff --git a/fs/nfs/file.c b/fs/nfs/file.c > index af330c31f627..aa883d8b24e6 100644 > --- a/fs/nfs/file.c > +++ b/fs/nfs/file.c > @@ -81,6 +81,12 @@ nfs_file_release(struct inode *inode, struct file > *filp) > { > dprintk("NFS: release(%pD2)\n", filp); > > + if (filp->f_mode & FMODE_WRITE) > + /* Ensure dirty mmapped pages are flushed > + * so there will be no dirty pages to > + * prevent an unmount from completing. > + */ > + vfs_fsync(filp, 0); > nfs_inc_stats(inode, NFSIOS_VFSRELEASE); > nfs_file_clear_open_context(filp); > return 0; The right fix here is to ensure that umount() flushes the dirty data, and that it aborts the umount attempt if the flush fails. Otherwise, a signal to the above fsync call will cause the problem to reoccur. -- Trond Myklebust Linux NFS client maintainer, PrimaryData trond.myklebust@xxxxxxxxxxxxxxx ��.n��������+%������w��{.n�����{��w���jg��������ݢj����G�������j:+v���w�m������w�������h�����٥