On Wed, Dec 18, 2019 at 12:37:52PM -0500, Trond Myklebust wrote: > vfs_clone_file_range() can modify the metadata on the source file too, > so we need to commit that to stable storage as well. > > Reported-by: Dave Chinner <david@xxxxxxxxxxxxx> > Signed-off-by: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx> > --- > fs/nfsd/vfs.c | 21 +++++++++++++++------ > 1 file changed, 15 insertions(+), 6 deletions(-) > > diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c > index f0bca0e87d0c..bc7f330c2a79 100644 > --- a/fs/nfsd/vfs.c > +++ b/fs/nfsd/vfs.c > @@ -280,19 +280,25 @@ nfsd_lookup(struct svc_rqst *rqstp, struct svc_fh *fhp, const char *name, > * Commit metadata changes to stable storage. > */ > static int > -commit_metadata(struct svc_fh *fhp) > +commit_inode_metadata(struct inode *inode) > { > - struct inode *inode = d_inode(fhp->fh_dentry); > const struct export_operations *export_ops = inode->i_sb->s_export_op; > > - if (!EX_ISSYNC(fhp->fh_export)) > - return 0; > - > if (export_ops->commit_metadata) > return export_ops->commit_metadata(inode); > return sync_inode_metadata(inode, 1); > } > > +static int > +commit_metadata(struct svc_fh *fhp) > +{ > + struct inode *inode = d_inode(fhp->fh_dentry); > + > + if (!EX_ISSYNC(fhp->fh_export)) > + return 0; > + return commit_inode_metadata(inode); > +} > + > /* > * Go over the attributes and take care of the small differences between > * NFS semantics and what Linux expects. > @@ -536,7 +542,10 @@ __be32 nfsd4_clone_file_range(struct file *src, u64 src_pos, struct file *dst, > return nfserrno(-EINVAL); > if (sync) { > loff_t dst_end = count ? dst_pos + count - 1 : LLONG_MAX; > - int status = vfs_fsync_range(dst, dst_pos, dst_end, 0); > + int status = commit_inode_metadata(file_inode(src)); > + > + if (!status) > + status = vfs_fsync_range(dst, dst_pos, dst_end, 0); Hmmm. Seeing as the source and destination inode were modified in the same transaction on XFS, we only need one journal write to flush them both. However, commit+fsync ends up doing: journal write (src+dst metadata) writeback data (dst data) -> can dirty dst metadata journal write (dst metadata) or device cache flush (dst data) So either way, we end up having to do multiple device cache flushes and possibly multiple journal writes. IOWs, This would be much more efficient as: fsync(dst) commit_inode_metadata(src) as it would end up as: writeback data (dst data) journal write(src+dst metadata) and the call to commit_inode_metadata(src) ends up being a no-op with almost no overhead... Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx