On Thu, 2012-05-03 at 19:07 +0200, Niels de Vos wrote: > On 05/03/2012 05:43 PM, Myklebust, Trond wrote: > > On Thu, 2012-05-03 at 17:34 +0200, Niels de Vos wrote: > >> When an application on an NFS-client (tested with NFSv3) executes the > >> following steps, data written after the close() is never flushed to the > >> server: > >> > >> 1. open() > >> 2. mmap() > >> 3. close() > >> 4.<modify data in the mmap'ed area> > >> 5. munmap() > >> > >> Dropping the caches (via /proc/sys/vm/drop_caches) or unmounting does not > >> result in the data being sent to the server. > >> > >> The man-page for mmap (man 2 mmap) does mention that closing the file- > >> descriptor does not munmap() the area. Using the mmap'ed area after a > >> close() sound valid to me (even if it may be bad practice). > >> > >> Investigation and checking showed that the NFS-client does not handle > >> munmap(), and only flushes on close(). To solve this problem, least two > >> solutions can be proposed: > >> > >> a. f_ops->release() is called on munmap() as well as on close(), > >> therefore release() can be used to flush data as well. > >> b. In the 'struct vm_operations_struct' add a .close to the > >> 'struct vm_area_struct' on calling mmap()/nfs_file_mmap() and flush > >> the data in the new close() function. > >> > >> Solution a. contains currently very few code changes: > >> > >> --- a/fs/nfs/inode.c > >> +++ b/fs/nfs/inode.c > >> @@ -713,6 +713,8 @@ int nfs_open(struct inode *inode, struct file *filp) > >> > >> int nfs_release(struct inode *inode, struct file *filp) > >> { > >> + if (S_ISREG(inode->i_mode)&& inode->i_mapping->nrpages != 0) { > >> + nfs_sync_mapping(inode->i_mapping); > >> nfs_file_clear_open_context(filp); > >> return 0; > >> } > >> > >> The disadvantage is, that nfs_release() is called on close() too. That > >> means this causes a flushing of dirty pages, and just after that the > >> nfs_file_clear_open_context() might flush again. The advantage is that > >> it is possible (though not done at the moment) to return an error in > >> case flushing failed. > >> > >> Solution b. does not provide an option to return an error, but does not > >> get called on each close(): > >> > >> --- a/fs/nfs/file.c > >> +++ b/fs/nfs/file.c > >> @@ -547,9 +547,17 @@ out: > >> return ret; > >> } > >> > >> +static void nfs_vm_close(struct vm_area_struct * vma) > >> +{ > >> + struct file *filp = vma->vm_file; > >> + > >> + nfs_file_flush(filp, (fl_owner_t)filp); > >> +} > >> + > >> static const struct vm_operations_struct nfs_file_vm_ops = { > >> .fault = filemap_fault, > >> .page_mkwrite = nfs_vm_page_mkwrite, > >> + .close = nfs_vm_close, > >> }; > >> > >> static int nfs_need_sync_write(struct file *filp, struct inode *inode) > >> > >> I would like some feedback on what solution is most acceptable, or any > >> other suggestions. > > > > Neither solution is acceptable. This isn't a close-to-open cache > > consistency issue. > > > > The syntax of mmap() for both block and NFS mounts is the same: writes > > are not guaranteed to hit the disk until your application explicitly > > calls msync(). > > > > Okay, that makes sense. But if the application never calls msync(), and > just munmap()'s the area, when should the changes be written? I did not > expect that unmounting just disregards the data. That suggests that the VM is failing to dirty the pages on munmap() before releasing the vma->vm_file. If so, then that would be a VM bug... -- Trond Myklebust Linux NFS client maintainer NetApp Trond.Myklebust@xxxxxxxxxx www.netapp.com ��.n��������+%������w��{.n�����{���)��jg��������ݢj����G�������j:+v���w�m������w�������h�����٥