Re: [PATCH v2 10/9] copy_file_range.2: New page documenting copy_file_range()

On Sun, Sep 13, 2015 at 09:50:18AM +0200, Michael Kerrisk (man-pages) wrote:
> Hi Anna,
> 
> On 09/11/2015 10:30 PM, Anna Schumaker wrote:
> > copy_file_range() is a new system call for copying ranges of data
> > completely in the kernel.  This gives filesystems an opportunity to
> > implement some kind of "copy acceleration", such as reflinks or
> > server-side-copy (in the case of NFS).
> > 
> > Signed-off-by: Anna Schumaker <Anna.Schumaker@xxxxxxxxxx>
> 
> Thanks for writing such a nice first draft man page! I have a few
> comments below. Would you be willing to revise and resubmit?
> 
> > ---
> >  man2/copy_file_range.2 | 188 +++++++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 188 insertions(+)
> >  create mode 100644 man2/copy_file_range.2
> > 
> > diff --git a/man2/copy_file_range.2 b/man2/copy_file_range.2
> > new file mode 100644
> > index 0000000..84912b5
> > --- /dev/null
> > +++ b/man2/copy_file_range.2
> > @@ -0,0 +1,188 @@
> > +.\"This manpage is Copyright (C) 2015 Anna Schumaker <Anna.Schumaker@xxxxxxxxxx>
> 
> We need a license for this page. Please see
> https://www.kernel.org/doc/man-pages/licenses.html
> 
> > +.TH COPY 2 2015-8-31 "Linux" "Linux Programmer's Manual"
> 
> Make the month 2 digits (leading 0).
> 
> > +.SH NAME
> > +copy_file_range \- Copy a range of data from one file to another
> > +.SH SYNOPSIS
> > +.nf
> > +.B #include <linux/copy.h>
> > +.B #include <sys/syscall.h>
> > +.B #include <unistd.h>
> > +
> > +.BI "ssize_t syscall(__NR_copy_file_range, int " fd_in ", loff_t * " off_in ",
> > +.BI "                int " fd_out ", loff_t * " off_out ", size_t " len ",
> > +.BI "                unsigned int " flags );
> 
> Remove spaces following "*" in the lines above. (man-pages convention for code)
> 
> I know that copy_file_range() (obviously) doesn't yet have a wrapper
> in glibc, but in the man pages we document all system calls as though there
> is a wrapper. See, for example, gettid(2), for an example of how this is done
> (a note in the SYNOPSIS and then some further details in NOTES).
> 
> > +.fi
> > +.SH DESCRIPTION
> > +The
> > +.BR copy_file_range ()
> > +system call performs an in-kernel copy between two file descriptors
> > +without all that tedious mucking about in userspace.
> 
> I'd write that last piece as:
> 
> "without the cost of (a loop) transferring data from the kernel to a
> user-space buffer and then back to the kernel again."
> 
> > +It copies up to
> > +.I len
> > +bytes of data from file descriptor
> > +.I fd_in
> > +to file descriptor
> > +.IR fd_out ,
> > +overwriting any data that exists within the requested range.
> 
> s/.$/ of the target file./
> 
> > +
> > +The following semantics apply for
> > +.IR off_in ,
> > +and similar statements apply to
> > +.IR off_out :
> > +.IP * 3
> > +If
> > +.I off_in
> > +is NULL, then bytes are read from
> > +.I fd_in
> > +starting from the current file offset and the current
> > +file offset is adjusted appropriately.
> > +.IP *
> > +If
> > +.I off_in
> > +is not NULL, then
> > +.I off_in
> > +must point to a buffer that specifies the starting
> > +offset where bytes from
> > +.I fd_in
> > +will be read.  The current file offset of
> > +.I fd_in
> > +is not changed, but
> > +.I off_in
> > +is adjusted appropriately.
> > +.PP
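The NULL versus non-NULL offset rules described above follow the same pattern as read(2) versus pread(2). The sketch below illustrates that analogy using only those existing calls; it does not call copy_file_range() itself, and the helper names read_at_current() and read_at_explicit() are made up for illustration.

```c
#include <sys/types.h>
#include <unistd.h>

/*
 * Analogue of off_in == NULL: read(2) starts at the descriptor's
 * current file offset and advances that offset by the bytes read.
 */
static ssize_t read_at_current(int fd, void *buf, size_t len)
{
    return read(fd, buf, len);
}

/*
 * Analogue of off_in != NULL: pread(2) reads at *off and leaves the
 * descriptor's file offset untouched. The caller-visible *off is
 * advanced by hand here, which is what the draft says the kernel
 * does to *off_in.
 */
static ssize_t read_at_explicit(int fd, void *buf, size_t len, off_t *off)
{
    ssize_t n = pread(fd, buf, len, *off);

    if (n > 0)
        *off += n;
    return n;
}
```

With a 10-byte file, reading 4 bytes through read_at_current() leaves the file offset at 4; a later read_at_explicit() with *off == 6 moves *off to 10 and leaves the file offset at 4.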
> > +
> > +The
> > +.I flags
> > +argument is a bit mask composed by OR-ing together zero
> > +or more of the following flags:
> > +.TP 1.9i
> > +.B COPY_FR_COPY
> > +Copy all the file data in the requested range.
> > +Some filesystems, like NFS, might be able to accelerate this copy
> > +to avoid unnecessary data transfers.
> > +.TP
> > +.B COPY_FR_REFLINK
> > +Create a lightweight "reflink", where data is not copied until
> > +one of the files is modified.
> > +.PP
> > +The default behavior
> > +.RI ( flags
> > +== 0) is to try creating a reflink,
> > +and if reflinking fails
> > +.BR copy_file_range ()
> > +will fall back on performing a full data copy.
> 
> s/back on/back to/
> 
> > +This is equivalent to setting
> > +.I flags
> > +equal to
> > +.RB ( COPY_FR_COPY | COPY_FR_REFLINK ).
> 
> So, from an API design perspective, the interoperation of these two
> flags appears odd. Bit flags are commonly (not always) orthogonal.
> I think here it could be pointed out a little more explicitly that
> these two flags are not orthogonal. In particular, perhaps after the
> last sentence, you could add another sentence:
> 
> [[
> (This contrasts with specifying
> .I flags
> as just
> .BR COPY_FR_REFLINK ,
> which causes the call to create a reflink,
> and fail if that is not possible,
> rather than falling back to a full data copy.)
> ]]
> 
> Furthermore, I even wonder if explicitly specifying flags as
> COPY_FR_COPY | COPY_FR_REFLINK should just generate an EINVAL
> error. 0 already gives us the behavior described above,
> and allowing the combination COPY_FR_COPY | COPY_FR_REFLINK
> perhaps just contributes to misleading the user that these
> flags are orthogonal, when in reality they are not. What do
> you think?

Personally, I think it's a little weird that one turns on reflink with a flag;
turns on regular copy with a different flag; and turns on both by not
specifying either flag. :)
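To make the non-orthogonality concrete, here is a rough model of how the draft maps flag values to behavior. It is only a sketch: the numeric values of COPY_FR_COPY and COPY_FR_REFLINK are assumptions (the draft does not give them), and enum copy_mode and resolve_flags() are hypothetical names, not part of the patch.

```c
/* Assumed values; the draft page does not state the real ones. */
#define COPY_FR_COPY    (1u << 0)
#define COPY_FR_REFLINK (1u << 1)

/* Hypothetical names for the behaviors the draft describes. */
enum copy_mode {
    MODE_INVALID = -1,      /* unknown bits set: EINVAL */
    MODE_REFLINK_THEN_COPY, /* try a reflink, fall back to a full copy */
    MODE_COPY_ONLY,         /* always copy the data */
    MODE_REFLINK_ONLY       /* reflink or fail (EOPNOTSUPP) */
};

static enum copy_mode resolve_flags(unsigned int flags)
{
    if (flags & ~(COPY_FR_COPY | COPY_FR_REFLINK))
        return MODE_INVALID;

    switch (flags) {
    case 0:
    case COPY_FR_COPY | COPY_FR_REFLINK:
        /* Two spellings of the same behavior in the draft. */
        return MODE_REFLINK_THEN_COPY;
    case COPY_FR_COPY:
        return MODE_COPY_ONLY;
    default:
        return MODE_REFLINK_ONLY;
    }
}
```

Four valid inputs collapse to three behaviors. Under the EINVAL suggestion quoted above, the combined case would return MODE_INVALID instead, leaving 0 as the only way to ask for the fallback behavior.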

> What are the semantics with respect to signals, especially when
> copying a very large file range? This info should be included in the
> man page, probably under NOTES.
> 
> > +.SH RETURN VALUE
> > +Upon successful completion,
> > +.BR copy_file_range ()
> > +will return the number of bytes copied between files.
> > +This could be less than the length originally requested.
> > +
> > +On error,
> > +.BR copy_file_range ()
> > +returns \-1 and
> > +.I errno
> > +is set to indicate the error.
> > +.SH ERRORS
> > +.TP
> > +.B EBADF
> > +One or more file descriptors are not valid,
> > +or do not have proper read-write mode;
> 
> I think that last line can go. I mean, isn't this point (again)
> covered in the next few lines?

Or change the ';' to a ':'?

> > +.I fd_in
> > +is not open for reading; or
> > +.I fd_out
> > +is not open for writing.
> > +.TP
> > +.B EINVAL
> > +Requested range extends beyond the end of the file; or the
> 
> s/file/source file/  (right?)
>
> > +.I flags
> > +argument is set to an invalid value.
> > +.TP
> > +.B EIO
> > +A low level I/O error occurred while copying.
> > +.TP
> > +.B ENOMEM
> > +Out of memory.
> > +.TP
> > +.B ENOSPC
> > +There is not enough space to complete the copy.
> 
> Space where? On the filesystem?
> => s/space/space on the target filesystem/
> 
> > +.TP
> > +.B EOPNOTSUPP
> > +.B COPY_FR_REFLINK
> > +was specified in
> > +.IR flags ,
> > +but the target filesystem does not support reflinks.
> > +.TP
> > +.B EXDEV
> > +Target filesystem doesn't support cross-filesystem copies.
> 
> I'm curious. What are some examples of filesystems that produce this
> error?

btrfs and xfs (and probably most local filesystems) don't support cross-fs
copies.
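In practice that means a caller who sees EXDEV would presumably have to fall back to an ordinary copy through a user-space buffer. A minimal sketch of such a fallback follows; copy_via_buffer() is a hypothetical helper name, the 64 KiB buffer size is arbitrary, and nothing here is part of the patch.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Copy up to len bytes from fd_in to fd_out through a user-space
 * buffer, using the current file offsets of both descriptors.
 * Returns the number of bytes copied (short if the source hits EOF),
 * or -1 with errno set.
 */
static ssize_t copy_via_buffer(int fd_in, int fd_out, size_t len)
{
    char buf[65536];
    size_t total = 0;

    while (total < len) {
        size_t want = len - total < sizeof(buf) ? len - total : sizeof(buf);
        ssize_t nr = read(fd_in, buf, want);

        if (nr == -1) {
            if (errno == EINTR)
                continue;       /* interrupted before any data: retry */
            return -1;
        }
        if (nr == 0)
            break;              /* end of source file */

        /* write(2) may be short, so loop until the chunk is out. */
        for (ssize_t done = 0; done < nr; ) {
            ssize_t nw = write(fd_out, buf + done, (size_t)(nr - done));

            if (nw == -1) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            done += nw;
        }
        total += (size_t)nr;
    }
    return (ssize_t)total;
}
```

A caller would use it along the lines of "if (ret == -1 && errno == EXDEV) ret = copy_via_buffer(fd_in, fd_out, len);", which is exactly the kernel-to-user-and-back loop the new call is meant to avoid in the same-filesystem case.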

--D

> 
> > +.SH VERSIONS
> > +The
> > +.BR copy_file_range ()
> > +system call first appeared in Linux 4.4.
> > +.SH CONFORMING TO
> > +The
> > +.BR copy_file_range ()
> > +system call is a nonstandard Linux extension.
> > +.SH EXAMPLE
> > +.nf
> > +
> > +#define _GNU_SOURCE
> > +#include <fcntl.h>
> > +#include <linux/copy.h>
> > +#include <stdio.h>
> > +#include <stdlib.h>
> > +#include <sys/stat.h>
> > +#include <sys/syscall.h>
> > +#include <unistd.h>
> > +
> > +
> > +int main(int argc, char **argv)
> > +{
> > +    int fd_in, fd_out;
> > +    struct stat stat;
> > +    loff_t len, ret;
> > +
> > +    if (argc != 3) {
> > +        fprintf(stderr, "Usage: %s <source> <destination>\\n", argv[0]);
> > +        exit(EXIT_FAILURE);
> > +    }
> > +
> > +    fd_in = open(argv[1], O_RDONLY);
> > +    if (fd_in == -1) {
> 
> Please replace all "-" in code by "\-". (This is a groff
> detail.)
> 
> > +        perror("open (argv[1])");
> > +        exit(EXIT_FAILURE);
> > +    }
> > +
> > +    if (fstat(fd_in, &stat) == -1) {
> > +        perror("fstat");
> > +        exit(EXIT_FAILURE);
> > +    }
> > +    len = stat.st_size;
> > +
> > +    fd_out = creat(argv[2], 0644);
> 
> These days, I think we should discourage the use of creat(). Maybe 
> better to use open() instead here?
> 
> > +    if (fd_out == -1) {
> > +        perror("creat (argv[2])");
> > +        exit(EXIT_FAILURE);
> > +    }
> > +
> > +    do {
> > +        ret = syscall(__NR_copy_file_range, fd_in, NULL,
> > +                      fd_out, NULL, len, 0);
> 
> I'd rather see this as:
> 
> int
> copy_file_range(int fd_in, loff_t *off_in,
>                 int fd_out, loff_t *off_out, size_t len,
>                 unsigned int flags)
> {
>     return syscall(__NR_copy_file_range, fd_in, off_in, fd_out, off_out,
>                    len, flags);
> }
> 
> ...
> 
>     copy_file_range(fd_in, off_in, fd_out, off_out, len, flags);
> 
>  
> > +        if (ret == -1) {
> > +            perror("copy_file_range");
> > +            exit(EXIT_FAILURE);
> > +        }
> > +
> > +        len -= ret;
> > +    } while (len > 0);
> > +
> > +    close(fd_in);
> > +    close(fd_out);
> > +    exit(EXIT_SUCCESS);
> > +}
> > +.fi
> > +.SH SEE ALSO
> > +.BR splice (2)
> > 
> 
> In the next iteration of this patch, could you include a change to
> splice(2) so that copy_file_range(2) is added under SEE ALSO in
> that page? Also, are there any other pages that we should link
> to/from? (sendfile(2)?)
> 
> Thanks,
> 
> Michael
> 
> 
> -- 
> Michael Kerrisk
> Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
> Linux/UNIX System Programming Training: http://man7.org/training/