Re: [PATCH 4/8] NFS: flush out dirty data on file fput().

NeilBrown <neilb@xxxxxxxx> · Sat, 19 Aug 2017 11:02:11 +1000

On Fri, Aug 18 2017, Trond Myklebust wrote:

> On Fri, 2017-08-18 at 17:12 +1000, NeilBrown wrote:
>> Any dirty NFS page holds an s_active reference on the superblock,
>> because page_private() references an nfs_page, which references an
>> open context, which references the superblock.
>> 
>> So if there are any dirty pages when the filesystem is unmounted, the
>> unmount will act like a "lazy" unmount and not call ->kill_sb().
>> Background write-back can then write out the pages *after* the
>> filesystem
>> unmount has apparently completed.
>> 
>> This contrasts with other filesystems which do not hold extra
>> s_active
>> references, so ->kill_sb() is reliably called on unmount, and
>> generic_shutdown_super() will call sync_filesystem() to flush
>> everything out before the unmount completes.
>> 
>> When open/write/close is used to modify files, the final close causes
>> f_op->flush to be called, which flushes all dirty pages.  However if
>> open/mmap/close/modify-memory/unmap is used, dirty pages can remain
>> in
>> memory after the application has dropped all references to the file.
>> Similarly if a loop-mount is done with a NFS file, there is no final
>> flush when the loop device is destroyed and the file can have dirty
>> data after the last "close".
>> 
>> Also, a loop-back mount of a device does not "close" the file when
>> the
>> loop device is destroyed.  This can leave dirty page cache pages too.
>> 
>> Fix this by calling vfs_fsync() in nfs_file_release (aka
>> f_op->release()).  This means that on the final unmap of a file (or
>> destruction of a loop device), all changes are flushed, and ensures
>> that
>> when unmount is requested there will be no dirty pages to delay the
>> final unmount.
>> 
>> Without this patch, it is not safe to stop or disconnect the NFS
>> server after all clients have unmounted.  They need to unmount and
>> call "sync".
>> 
>> Signed-off-by: NeilBrown <neilb@xxxxxxxx>
>> ---
>>  fs/nfs/file.c |    6 ++++++
>>  1 file changed, 6 insertions(+)
>> 
>> diff --git a/fs/nfs/file.c b/fs/nfs/file.c
>> index af330c31f627..aa883d8b24e6 100644
>> --- a/fs/nfs/file.c
>> +++ b/fs/nfs/file.c
>> @@ -81,6 +81,12 @@ nfs_file_release(struct inode *inode, struct file
>> *filp)
>>  {
>>  	dprintk("NFS: release(%pD2)\n", filp);
>>  
>> +	if (filp->f_mode & FMODE_WRITE)
>> +		/* Ensure dirty mmapped pages are flushed
>> +		 * so there will be no dirty pages to
>> +		 * prevent an unmount from completing.
>> +		 */
>> +		vfs_fsync(filp, 0);
>>  	nfs_inc_stats(inode, NFSIOS_VFSRELEASE);
>>  	nfs_file_clear_open_context(filp);
>>  	return 0;
>
> The right fix here is to ensure that umount() flushes the dirty data,
> and that it aborts the umount attempt if the flush fails. Otherwise, a
> signal to the above fsync call will cause the problem to reoccur.

The only way to ensure that umount flushes dirty data is to revert
commit 1daef0a86837 ("NFS: Clean up nfs_sb_active/nfs_sb_deactive").
As long as we are taking extra references to s_active, umount won't do
what it ought to do.

I do wonder what we should do when you try to unmount a filesystem
mounted from an inaccessible server.
Blocking is awkward for scripts to work with.  Failing with EBUSY might
be misleading.
I don't think that using MNT_FORCE guarantees success does it?  It would
need to discard any dirty data and abort any other background tasks.

I think I would be in favor of EBUSY rather than a hang, and of making
MNT_FORCE a silver bullet providing there are no open file descriptors
on the filesystem.

Thanks,
NeilBrown
Attachment:
signature.asc

Description: PGP signature