Re: [RFC v1 01/19] fs: Don't copy beyond the end of the file

Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> · Wed, 8 Mar 2017 20:18:31 +0000

On Wed, 2017-03-08 at 15:00 -0500, Olga Kornievskaia wrote:
> > On Mar 8, 2017, at 2:53 PM, J. Bruce Fields <bfields@xxxxxxxxxxxx> wrote:
> > 
> > On Wed, Mar 08, 2017 at 12:32:12PM -0500, Olga Kornievskaia wrote:
> > > 
> > > > On Mar 8, 2017, at 12:25 PM, Christoph Hellwig <hch@infradead.org> wrote:
> > > > 
> > > > On Wed, Mar 08, 2017 at 12:05:21PM -0500, J. Bruce Fields wrote:
> > > > > Since copy isn't atomic that check is never going to be reliable.
> > > > 
> > > > That's true for everything that COPY does.  By that logic we should
> > > > not implement it at all (a logic that I'd fully support)
> > > 
> > > If you were to only keep CLONE then you’d lose a huge performance gain
> > > you get from server-to-server COPY.
> > 
> > Yes.  Also, I think copy-like copy implementations have reasonable
> > semantics that are basically the same as read:
> > 
> > 	- copy can return successfully with less copied than requested.
> > 	- it's fine for the copied range to start and/or end past end of
> > 	  file, it'll just return a short read.
> > 	- A copy of more than 0 bytes returning 0 means you're at end of
> > 	  file.
> > 
> > The particular problem here is that that doesn't fit how clone works at
> > all.
> > 
> > It feels like what happened is that copy_file_range() was made mainly
> > for the clone case, with the idea that copy might be reluctantly
> > accepted as a second-class implementation.

Historically? No... Christoph added clone as a valid implementation of
copy_file_range() almost a year after Zach and Anna defined the
semantics of vfs_copy_file_range(). git blame is your friend...

> > 
> > But the performance gain of copy offload is too big to just ignore, and
> > in fact it's what copy_file_range does on every filesystem but btrfs and
> > ocfs2 (and maybe cifs?), so I don't think we can just ignore it.
> > 
> > If we had separate copy_file_range and clone_file_range, I *think* it
> > could all be made sensible.  Am I missing something?
> > 
> 
> How would the application (cp) know when to call the clone_file_range
> and when to call copy_file_range?

cp can probably call copy_file_range(), but any application that needs
atomic semantics (i.e. an all-or-nothing success/fail outcome) must call
clone_file_range().
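
For illustration only, here is a rough userspace sketch of that split. It
is not taken from this patch series: the helper names
copy_range_best_effort() and clone_whole_file() are made up, and it
assumes a glibc that exposes copy_file_range() and that the FICLONE ioctl
is the userspace route to the clone operation. The first helper treats
copy like read (short copies are normal, 0 means end of file); the second
either clones the whole file or fails.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <linux/fs.h>   /* FICLONE */
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read-like copy using the file positions of both
 * descriptors. Keeps calling copy_file_range() until len bytes have been
 * copied or the source reports end of file.
 * Returns the number of bytes copied (possibly less than len), or -1 on
 * error with errno set.
 */
ssize_t copy_range_best_effort(int fd_in, int fd_out, size_t len)
{
	size_t total = 0;

	while (total < len) {
		ssize_t n = copy_file_range(fd_in, NULL, fd_out, NULL,
					    len - total, 0);
		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, retry */
			return -1;
		}
		if (n == 0)
			break;			/* end of file on fd_in */
		total += (size_t)n;		/* short copy, keep going */
	}
	return (ssize_t)total;
}

/*
 * Hypothetical helper: all-or-nothing clone of the whole source file
 * into fd_out. Returns 0 on success, -1 on failure with errno set.
 */
int clone_whole_file(int fd_in, int fd_out)
{
	return ioctl(fd_out, FICLONE, fd_in);
}
```

A cp-style tool could use the first helper and fall back to plain
read()/write() on error. The second fails outright (for example with
EOPNOTSUPP or EXDEV) when the filesystem cannot share extents, which is
the binary outcome an application wanting atomic behaviour is after.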

-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@xxxxxxxxxxxxxxx