Re: [RFC v1 01/19] fs: Don't copy beyond the end of the file

On Wed, 2017-03-08 at 15:32 -0500, bfields@xxxxxxxxxxxx wrote:
> On Wed, Mar 08, 2017 at 08:18:31PM +0000, Trond Myklebust wrote:
> > On Wed, 2017-03-08 at 15:00 -0500, Olga Kornievskaia wrote:
> > > > On Mar 8, 2017, at 2:53 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> > > > 
> > > > On Wed, Mar 08, 2017 at 12:32:12PM -0500, Olga Kornievskaia wrote:
> > > > > 
> > > > > > On Mar 8, 2017, at 12:25 PM, Christoph Hellwig <hch@infradead.org> wrote:
> > > > > > 
> > > > > > On Wed, Mar 08, 2017 at 12:05:21PM -0500, J. Bruce Fields wrote:
> > > > > > > Since copy isn't atomic that check is never going to be reliable.
> > > > > > 
> > > > > > That's true for everything that COPY does.  By that logic we
> > > > > > should not implement it at all (a logic that I'd fully support)
> > > > > 
> > > > > If you were to only keep CLONE then you’d lose a huge performance
> > > > > gain you get from server-to-server COPY.
> > > > 
> > > > Yes.  Also, I think copy-like copy implementations have reasonable
> > > > semantics that are basically the same as read:
> > > > 
> > > > 	- copy can return successfully with less copied than requested.
> > > > 	- it's fine for the copied range to start and/or end past end of
> > > > 	  file, it'll just return a short read.
> > > > 	- A copy of more than 0 bytes returning 0 means you're at end of
> > > > 	  file.
> > > > 
> > > > The particular problem here is that that doesn't fit how clone works
> > > > at all.
> > > > 
> > > > It feels like what happened is that copy_file_range() was made mainly
> > > > for the clone case, with the idea that copy might be reluctantly
> > > > accepted as a second-class implementation.
> > 
> > Historically? No... Christoph added clone as a valid implementation of
> > copy_file_range() almost a year after Zach and Anna defined the
> > semantics of vfs_copy_file_range(). git blame is your friend...
> 
> Yeah, I know.  It still feels to me like the interface was originally
> designed with clone in mind, but that's my vague impression from the
> man pages and half-remembered conversations.
> 
> Though the lack of a "just copy the whole file regardless of size" case
> is weird for clone.  All you can do is stat the file and then hope it
> doesn't change before you issue the copy_file_range.  But I'd think
> it'd be easy for an atomic clone implementation to handle, say, getting
> a snapshot of a log file while it's getting continuously appended to.

It really isn't that interesting in the continuously appended case
(what difference does it make if you only get data from just a few
moments ago), but I can see it being an issue in the case of random
writes where the file size is being extended.

The thing is that in both those cases, the copy_file_range() semantics
are worse, since they don't even guarantee a time-consistent copy.

> > > > But the performance gain of copy offload is too big to just ignore,
> > > > and in fact it's what copy_file_range does on every filesystem but
> > > > btrfs and ocfs2 (and maybe cifs?), so I don't think we can just
> > > > ignore it.
> > > > 
> > > > If we had separate copy_file_range and clone_file_range, I *think*
> > > > it could all be made sensible.  Am I missing something?
> > > > 
> > > 
> > > How would the application (cp) know when to call the
> > > clone_file_range and when to call copy_file_range?
> > 
> > cp can probably call copy_file_range(), but any application that needs
> > atomic semantics (i.e. a binary operation success/fail) must call
> > clone_file_range().
> 
> I don't believe there's a clone_file_range().  I see the vfs interface,
> but no system call.

There is a standard FICLONERANGE ioctl() that can be used on all
filesystems that support the vfs interface.

> And implementing a simple cp is harder than it should be when you don't
> know whether it's implemented as copy or clone.  You have to stat for
> the file size first, retry if you got it wrong, and also retry if you
> get a short read.  The example in the clone_file_range() man page is
> incomplete.

As I said, you shouldn't be using copy_file_range() either in the case
where the file is being modified.
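
For the non-clone case, the read-like semantics Bruce lists above (a
short copy is legal, 0 means end of file) imply a loop roughly like the
following. This is a sketch only: copy_range_all() is a hypothetical
helper, it uses the file descriptors' current offsets, and it makes no
attempt at a time-consistent copy if the source is modified meanwhile.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy up to len bytes from in_fd to out_fd,
 * retrying on short copies.  Returns the number of bytes actually
 * copied (which is less than len if the source hit end of file),
 * or -1 with errno set.
 */
static ssize_t copy_range_all(int in_fd, int out_fd, size_t len)
{
    size_t done = 0;

    while (done < len) {
        ssize_t n = copy_file_range(in_fd, NULL, out_fd, NULL,
                                    len - done, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted, try again */
            return -1;
        }
        if (n == 0)
            break;              /* end of source file */
        done += (size_t)n;
    }
    return (ssize_t)done;
}
```

A caller that got the size from stat() can compare the return value
with the size it expected and decide whether to retry.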

-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@xxxxxxxxxxxxxxx



