On Wed, 2017-03-08 at 15:00 -0500, Olga Kornievskaia wrote:
> > On Mar 8, 2017, at 2:53 PM, J. Bruce Fields <bfields@xxxxxxxxxxxx>
> > wrote:
> >
> > On Wed, Mar 08, 2017 at 12:32:12PM -0500, Olga Kornievskaia wrote:
> > >
> > > > On Mar 8, 2017, at 12:25 PM, Christoph Hellwig
> > > > <hch@infradead.org> wrote:
> > > >
> > > > On Wed, Mar 08, 2017 at 12:05:21PM -0500, J. Bruce Fields
> > > > wrote:
> > > > > Since copy isn't atomic that check is never going to be
> > > > > reliable.
> > > >
> > > > That's true for everything that COPY does. By that logic we
> > > > should not implement it at all (a logic that I'd fully support)
> > >
> > > If you were to only keep CLONE then you’d lose a huge performance
> > > gain you get from server-to-server COPY.
> >
> > Yes. Also, I think copy-like copy implementations have reasonable
> > semantics that are basically the same as read:
> >
> > - copy can return successfully with less copied than requested.
> > - it's fine for the copied range to start and/or end past end of
> >   file, it'll just return a short read.
> > - A copy of more than 0 bytes returning 0 means you're at end of
> >   file.
> >
> > The particular problem here is that that doesn't fit how clone
> > works at all.
> >
> > It feels like what happened is that copy_file_range() was made
> > mainly for the clone case, with the idea that copy might be
> > reluctantly accepted as a second-class implementation.

Historically? No... Christoph added clone as a valid implementation of
copy_file_range() almost a year after Zach and Anna defined the
semantics of vfs_copy_file_range(). git blame is your friend...

> >
> > But the performance gain of copy offload is too big to just ignore,
> > and in fact it's what copy_file_range does on every filesystem but
> > btrfs and ocfs2 (and maybe cifs?), so I don't think we can just
> > ignore it.
> >
> > If we had separate copy_file_range and clone_file_range, I *think*
> > it could all be made sensible. Am I missing something?
> >
>
> How would the application (cp) know when to call the clone_file_range
> and when to call copy_file_range?

cp can probably call copy_file_range(), but any application that needs
atomic semantics (i.e. a binary operation success/fail) must call
clone_file_range().

-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@xxxxxxxxxxxxxxx
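
For illustration, the read-like semantics listed in the quoted text
above (a copy may come back short, and 0 means end of file) suggest a
caller loop roughly like the one below. This is only a minimal sketch
in Python, not code from cp or from the kernel tree: the helper name
copy_range, the chunk size, and the set of errno values that trigger
the plain read/write fallback are all assumptions made for the example.

```python
import errno
import os

# Errors taken to mean "copy_file_range() is unusable for these files".
# This set is an assumption for the sketch, not an exhaustive list.
_FALLBACK_ERRNOS = {errno.ENOSYS, errno.EXDEV, errno.EINVAL, errno.EOPNOTSUPP}


def copy_range(src_fd, dst_fd, length, chunk=1 << 20):
    """Copy up to `length` bytes between the current offsets of two fds.

    Returns the number of bytes actually copied, which is smaller than
    `length` when the source reaches end of file first.
    """
    copied = 0
    use_cfr = hasattr(os, "copy_file_range")
    while copied < length:
        want = min(chunk, length - copied)
        if use_cfr:
            try:
                # May legitimately copy fewer bytes than requested.
                n = os.copy_file_range(src_fd, dst_fd, want)
            except OSError as exc:
                if exc.errno not in _FALLBACK_ERRNOS:
                    raise
                use_cfr = False  # retry this chunk with read/write
                continue
        else:
            buf = os.read(src_fd, want)
            n = len(buf)
            view = memoryview(buf)
            while view:  # os.write() can also be short
                view = view[os.write(dst_fd, view):]
        if n == 0:
            break  # asked for more than 0 bytes, got 0: end of file
        copied += n
    return copied
```

A caller that needs the all-or-nothing behaviour described above cannot
get it from a loop like this, since a failure part-way through leaves a
partial copy behind.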