On Sun, Sep 13, 2015 at 09:50:18AM +0200, Michael Kerrisk (man-pages) wrote: > Hi Anna, > > On 09/11/2015 10:30 PM, Anna Schumaker wrote: > > copy_file_range() is a new system call for copying ranges of data > > completely in the kernel. This gives filesystems an opportunity to > > implement some kind of "copy acceleration", such as reflinks or > > server-side-copy (in the case of NFS). > > > > Signed-off-by: Anna Schumaker <Anna.Schumaker@xxxxxxxxxx> > > Thanks for writing such a nice first draft man page! I have a few > comments below. Would you be willing to revise and resubmit? > > > --- > > man2/copy_file_range.2 | 188 +++++++++++++++++++++++++++++++++++++++++++++++++ > > 1 file changed, 188 insertions(+) > > create mode 100644 man2/copy_file_range.2 > > > > diff --git a/man2/copy_file_range.2 b/man2/copy_file_range.2 > > new file mode 100644 > > index 0000000..84912b5 > > --- /dev/null > > +++ b/man2/copy_file_range.2 > > @@ -0,0 +1,188 @@ > > +.\"This manpage is Copyright (C) 2015 Anna Schumaker <Anna.Schumaker@xxxxxxxxxx> > > We need a license for this page. Please see > https://www.kernel.org/doc/man-pages/licenses.html > > > +.TH COPY 2 2015-8-31 "Linux" "Linux Programmer's Manual" > > Make the month 2 digits (leading 0). > > > +.SH NAME > > +copy_file_range \- Copy a range of data from one file to another > > +.SH SYNOPSIS > > +.nf > > +.B #include <linux/copy.h> > > +.B #include <sys/syscall.h> > > +.B #include <unistd.h> > > + > > +.BI "ssize_t syscall(__NR_copy_file_range, int " fd_in ", loff_t * " off_in ", > > +.BI " int " fd_out ", loff_t * " off_out ", size_t " len ", > > +.BI " unsigned int " flags ); > > Remove spaces following "*" in the lines above. (man-pages convention for code) > > I know that the copy_file_range() (obviously) doesn't yet have a wrapper > in glibc, but in the man pages we document all system calls as though there > is a wrapper. See, for example, gettid(2), for an axample of how this is done > (a note in the SYNOPSIS and then some further details in NOTES). > > > +.fi > > +.SH DESCRIPTION > > +The > > +.BR copy_file_range () > > +system call performs an in-kernel copy between two file descriptors > > +without all that tedious mucking about in userspace. > > I'd write that last piece as: > > "without the cost of (a loop) transferring data from the kernel to a > user-space buffer and then back to the kernel again. > > > +It copies up to > > +.I len > > +bytes of data from file descriptor > > +.I fd_in > > +to file descriptor > > +.IR fd_out , > > +overwriting any data that exists within the requested range. > > s/.$/ of the target file./ > > > + > > +The following semantics apply for > > +.IR off_in , > > +and similar statements apply to > > +.IR off_out : > > +.IP * 3 > > +If > > +.I off_in > > +is NULL, then bytes are read from > > +.I fd_in > > +starting from the current file offset and the current > > +file offset is adjusted appropriately. > > +.IP * > > +If > > +.I off_in > > +is not NULL, then > > +.I off_in > > +must point to a buffer that specifies the starting > > +offset where bytes from > > +.I fd_in > > +will be read. The current file offset of > > +.I fd_in > > +is not changed, but > > +.I off_in > > +is adjusted appropriately. > > +.PP > > + > > +The > > +.I flags > > +argument is a bit mask composed by OR-ing together zero > > +or more of the following flags: > > +.TP 1.9i > > +.B COPY_FR_COPY > > +Copy all the file data in the requested range. > > +Some filesystems, like NFS, might be able to accelerate this copy > > +to avoid unnecessary data transfers. > > +.TP > > +.B COPY_FR_REFLINK > > +Create a lightweight "reflink", where data is not copied until > > +one of the files is modified. > > +.PP > > +The default behavior > > +.RI ( flags > > +== 0) is to try creating a reflink, > > +and if reflinking fails > > +.BR copy_file_range () > > +will fall back on performing a full data copy. > > s/back on/back to/ > > > +This is equivalent to setting > > +.I flags > > +equal to > > +.RB ( COPY_FR_COPY | COPY_FR_REFLINK ). > > So, from an API deign perspective, the interoperation of these two > flags appears odd. Bit flags are commonly (not always) orthogonal. > I think here it could be pointed out a little more explicitly that > these two flags are not orthogonal. In particular, perhaps after the > last sentence, you could add another sentence: > > [[ > (This contrasts with specifying > .I flags > as just > .BR COPY_FR_REFLINK , > which causes the call to create a reflink, > and fail if that is not possible, > rather than falling back to a full data copy.) > ]] > > Furthermore, I even wonder if explicitly specifying flags as > COPY_FR_COPY | COPY_FR_REFLINK should just generate an EINVAL > error. 0 already gives us the behavior described above, > and allowing the combination COPY_FR_COPY | COPY_FR_REFLINK > perhaps just contributes to misleading the user that these > flags are orthogonal, when in reality they are not. What do > you think? Personally, I think it's a little weird that one turns on reflink with a flag; turns on regular copy with a different flag; and turns on both by not specifying either flag. :) > What are the semantics with respect to signals, especially if data > copying a very large file range? This info should be included in the > man page, probably under NOTES. > > > +.SH RETURN VALUE > > +Upon successful completion, > > +.BR copy_file_range () > > +will return the number of bytes copied between files. > > +This could be less than the length originally requested. > > + > > +On error, > > +.BR copy_file_range () > > +returns \-1 and > > +.I errno > > +is set to indicate the error. > > +.SH ERRORS > > +.TP > > +.B EBADF > > +One or more file descriptors are not valid, > > +or do not have proper read-write mode; > > I think that last line can go. I mean, isn't this point (again) > covered in the next few lines? Or change the ';' to a ':'? > > +.I fd_in > > +is not open for reading; or > > +.I fd_out > > +is not open for writing. > > +.TP > > +.B EINVAL > > +Requested range extends beyond the end of the file; or the > > s/file/source file/ (right?) > > > +.I flags > > +argument is set to an invalid value. > > +.TP > > +.B EIO > > +A low level I/O error occurred while copying. > > +.TP > > +.B ENOMEM > > +Out of memory. > > +.TP > > +.B ENOSPC > > +There is not enough space to complete the copy. > > Space where? On the filesystem? > => s/space/space on the target filesystem/ > > > +.TP > > +.B EOPNOTSUPP > > +.B COPY_REFLINK > > +was specified in > > +.IR flags , > > +but the target filesystem does not support reflinks. > > +.TP > > +.B EXDEV > > +Target filesystem doesn't support cross-filesystem copies. > > I'm curious. What are some examples of filesystems that produce this > error? btrfs and xfs (and probably most local filesystems) don't support cross-fs copies. --D > > > +.SH VERSIONS > > +The > > +.BR copy_file_range () > > +system call first appeared in Linux 4.4. > > +.SH CONFORMING TO > > +The > > +.BR copy_file_range () > > +system call is a nonstandard Linux extension. > > +.SH EXAMPLE > > +.nf > > + > > +#define _GNU_SOURCE > > +#include <fcntl.h> > > +#include <linux/copy.h> > > +#include <stdio.h> > > +#include <stdlib.h> > > +#include <sys/stat.h> > > +#include <sys/syscall.h> > > +#include <unistd.h> > > + > > + > > +int main(int argc, char **argv) > > +{ > > + int fd_in, fd_out; > > + struct stat stat; > > + loff_t len, ret; > > + > > + if (argc != 3) { > > + fprintf(stderr, "Usage: %s <source> <destination>\\n", argv[0]); > > + exit(EXIT_FAILURE); > > + } > > + > > + fd_in = open(argv[1], O_RDONLY); > > + if (fd_in == -1) { > > Please replace all "-" in code by "\-". (This is a groff > detail.) > > > + perror("open (argv[1])"); > > + exit(EXIT_FAILURE); > > + } > > + > > + if (fstat(fd_in, &stat) == -1) { > > + perror("fstat"); > > + exit(EXIT_FAILURE); > > + } > > + len = stat.st_size; > > + > > + fd_out = creat(argv[2], 0644); > > These days, I think we should discourage the use of creat(). Maybe > better to use open() instead here? > > > + if (fd_out == -1) { > > + perror("creat (argv[2])"); > > + exit(EXIT_FAILURE); > > + } > > + > > + do { > > + ret = syscall(__NR_copy_file_range, fd_in, NULL, > > + fd_out, NULL, len, 0); > > I'd rather see this as: > > int > copy_file_range(int fd_in, loff_t *off_in, > int fd_out, loff_t *off_out, size_t len, > unsigned int flags) > { > return(syscall(__NR_copy_file_range, fd_in, fd_out, off_out, len, flags); > } > > ... > > copy_file_range(fd_in, fd_out, off_out, len, flags); > > > > + if (ret == -1) { > > + perror("copy_file_range"); > > + exit(EXIT_FAILURE); > > + } > > + > > + len -= ret; > > + } while (len > 0); > > + > > + close(fd_in); > > + close(fd_out); > > + exit(EXIT_SUCCESS); > > +} > > +.fi > > +.SH SEE ALSO > > +.BR splice (2) > > > > In the next iteration of this patch, could you include a change to > splice(2) so that copy_file_range(2) is added under SEE ALSO in > that page. Also, are there any other pages that we should like > to/from? (sendfile(2)?) > > Thanks, > > Michael > > > -- > Michael Kerrisk > Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/ > Linux/UNIX System Programming Training: http://man7.org/training/ -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html