On 09/28/2015 02:40 PM, Darrick J. Wong wrote:
> On Fri, Sep 25, 2015 at 04:48:16PM -0400, Anna Schumaker wrote:
>> copy_file_range() is a new system call for copying ranges of data
>> completely in the kernel. This gives filesystems an opportunity to
>> implement some kind of "copy acceleration", such as reflinks or
>> server-side-copy (in the case of NFS).
>>
>> Signed-off-by: Anna Schumaker <Anna.Schumaker@xxxxxxxxxx>
>> ---
>> v3:
>> - Added license information
>> - Updated splice(2)
>> - Various other edits after mailing list discussion
>> ---
>>  man2/copy_file_range.2 | 211 +++++++++++++++++++++++++++++++++++++++++++++++++
>>  man2/splice.2          |   1 +
>>  2 files changed, 212 insertions(+)
>>  create mode 100644 man2/copy_file_range.2
>>
>> diff --git a/man2/copy_file_range.2 b/man2/copy_file_range.2
>> new file mode 100644
>> index 0000000..6d66d4a
>> --- /dev/null
>> +++ b/man2/copy_file_range.2
>> @@ -0,0 +1,211 @@
>> +.\"This manpage is Copyright (C) 2015 Anna Schumaker <Anna.Schumaker@xxxxxxxxxx>
>> +.\"
>> +.\" %%%LICENSE_START(VERBATIM)
>> +.\" Permission is granted to make and distribute verbatim copies of this
>> +.\" manual provided the copyright notice and this permission notice are
>> +.\" preserved on all copies.
>> +.\"
>> +.\" Permission is granted to copy and distribute modified versions of
>> +.\" this manual under the conditions for verbatim copying, provided that
>> +.\" the entire resulting derived work is distributed under the terms of
>> +.\" a permission notice identical to this one.
>> +.\"
>> +.\" Since the Linux kernel and libraries are constantly changing, this
>> +.\" manual page may be incorrect or out-of-date. The author(s) assume.
>> +.\" no responsibility for errors or omissions, or for damages resulting.
>> +.\" from the use of the information contained herein. The author(s) may.
>> +.\" not have taken the same level of care in the production of this.
>> +.\" manual, which is licensed free of charge, as they might when working.
>> +.\" professionally.
>> +.\"
>> +.\" Formatted or processed versions of this manual, if unaccompanied by
>> +.\" the source, must acknowledge the copyright and authors of this work.
>> +.\" %%%LICENSE_END
>> +.\"
>> +.TH COPY 2 2015-08-31 "Linux" "Linux Programmer's Manual"
>> +.SH NAME
>> +copy_file_range \- Copy a range of data from one file to another
>> +.SH SYNOPSIS
>> +.nf
>> +.B #include <linux/copy.h>
>> +.B #include <sys/syscall.h>
>> +.B #include <unistd.h>
>> +
>> +.BI "ssize_t copy_file_range(int " fd_in ", loff_t *" off_in ", int " fd_out ",
>> +.BI " loff_t *" off_out ", size_t " len \
>> +", unsigned int " flags );
>> +.fi
>> +.SH DESCRIPTION
>> +The
>> +.BR copy_file_range ()
>> +system call performs an in-kernel copy between two file descriptors
>> +without the additional cost of transferring data from the kernel to userspace
>> +and then back into the kernel.
>> +It copies up to
>> +.I len
>> +bytes of data from file descriptor
>> +.I fd_in
>> +to file descriptor
>> +.IR fd_out ,
>> +overwriting any data that exists within the requested range of the target file.
>> +
>> +The following semantics apply for
>> +.IR off_in ,
>> +and similar statements apply to
>> +.IR off_out :
>> +.IP * 3
>> +If
>> +.I off_in
>> +is NULL, then bytes are read from
>> +.I fd_in
>> +starting from the current file offset, and the offset is
>> +adjusted by the number of bytes copied.
>> +.IP *
>> +If
>> +.I off_in
>> +is not NULL, then
>> +.I off_in
>> +must point to a buffer that specifies the starting
>> +offset where bytes from
>> +.I fd_in
>> +will be read. The current file offset of
>> +.I fd_in
>> +is not changed, but
>> +.I off_in
>> +is adjusted appropriately.
>> +.PP
>> +
>> +The
>> +.I flags
>> +argument can have one of the following flags set:
>> +.TP 1.9i
>> +.B COPY_FR_COPY
>> +Copy all the file data in the requested range.
>> +Some filesystems might be able to accelerate this copy
>> +to avoid unnecessary data transfers.
>> +.TP
>> +.B COPY_FR_REFLINK
>> +Create a lightweight "reflink", where data is not copied until
>> +one of the files is modified.
>
> .TP
> .B COPY_FR_DEDUPE
> Create a lightweight "reflink" with the same operational behavior as
> COPY_FR_REFLINK, but only perform the reflink if the contents of both files'
> byte ranges are identical. This flag cannot be specified with COPY_FR_COPY or
> COPY_FR_REFLINK. If the ranges do not match, EILSEQ will be returned.
>
>> +.PP
>> +The default behavior
>> +.RI ( flags
>> +== 0) is to try creating a reflink,
>> +and if reflinking fails
>> +.BR copy_file_range ()
>> +will fall back to performing a full data copy.
>> +.SH RETURN VALUE
>> +Upon successful completion,
>> +.BR copy_file_range ()
>> +will return the number of bytes copied between files.
>> +This could be less than the length originally requested.
>> +
>> +On error,
>> +.BR copy_file_range ()
>> +returns \-1 and
>> +.I errno
>> +is set to indicate the error.
>> +.SH ERRORS
>> +.TP
>> +.B EBADF
>> +One or more file descriptors are not valid; or
>> +.I fd_in
>> +is not open for reading; or
>> +.I fd_out
>> +is not open for writing.
>> +.TP
>> +.B EINVAL
>> +Requested range extends beyond the end of the source file; or the
>> +.I flags
>> +argument is set to an invalid value.
>
> .TP
> .B EILSEQ
> The contents of both files' byte ranges did not match.
>
>> +.TP
>> +.B EIO
>> +A low level I/O error occurred while copying.
>> +.TP
>> +.B ENOMEM
>> +Out of memory.
>> +.TP
>> +.B ENOSPC
>> +There is not enough space on the target filesystem to complete the copy.
>> +.TP
>> +.B EOPNOTSUPP
>> +.B COPY_REFLINK
>
> .B COPY_FR_REFLINK
> or
> .B COPY_FR_DEDUPE
>
>> +was specified in
>> +.IR flags ,
>> +but the target filesystem does not support reflinks.
>
> "does not support the given operation."
>
> Otherwise you can add,
> Reviewed-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx>

Thanks, Darrick! I've added in your suggestions (and also renamed COPY_FR_DEDUPE -> COPY_FR_DEDUP).
I'll send out a v4 tomorrow morning, once I see if any other comments trickle in.

Anna

>
> --D
>
>> +.TP
>> +.B EXDEV
>> +Target filesystem doesn't support cross-filesystem copies.
>> +.SH VERSIONS
>> +The
>> +.BR copy_file_range ()
>> +system call first appeared in Linux 4.4.
>> +.SH CONFORMING TO
>> +The
>> +.BR copy_file_range ()
>> +system call is a nonstandard Linux extension.
>> +.SH EXAMPLE
>> +.nf
>> +#define _GNU_SOURCE
>> +#include <fcntl.h>
>> +#include <linux/copy.h>
>> +#include <stdio.h>
>> +#include <stdlib.h>
>> +#include <sys/stat.h>
>> +#include <sys/syscall.h>
>> +#include <unistd.h>
>> +
>> +loff_t copy_file_range(int fd_in, loff_t *off_in, int fd_out,
>> +                       loff_t *off_out, size_t len, unsigned int flags)
>> +{
>> +    return syscall(__NR_copy_file_range, fd_in, off_in, fd_out,
>> +                   off_out, len, flags);
>> +}
>> +
>> +int main(int argc, char **argv)
>> +{
>> +    int fd_in, fd_out;
>> +    struct stat stat;
>> +    loff_t len, ret;
>> +    char buf[2];
>> +
>> +    if (argc != 3) {
>> +        fprintf(stderr, "Usage: %s <source> <destination>\\n", argv[0]);
>> +        exit(EXIT_FAILURE);
>> +    }
>> +
>> +    fd_in = open(argv[1], O_RDONLY);
>> +    if (fd_in == \-1) {
>> +        perror("open (argv[1])");
>> +        exit(EXIT_FAILURE);
>> +    }
>> +
>> +    if (fstat(fd_in, &stat) == \-1) {
>> +        perror("fstat");
>> +        exit(EXIT_FAILURE);
>> +    }
>> +    len = stat.st_size;
>> +
>> +    fd_out = open(argv[2], O_CREAT|O_WRONLY|O_TRUNC, 0644);
>> +    if (fd_out == \-1) {
>> +        perror("open (argv[2])");
>> +        exit(EXIT_FAILURE);
>> +    }
>> +
>> +    do {
>> +        ret = copy_file_range(fd_in, NULL, fd_out, NULL, len, COPY_FR_COPY);
>> +        if (ret == \-1) {
>> +            perror("copy_file_range");
>> +            exit(EXIT_FAILURE);
>> +        }
>> +
>> +        len \-= ret;
>> +    } while (len > 0);
>> +
>> +    close(fd_in);
>> +    close(fd_out);
>> +    exit(EXIT_SUCCESS);
>> +}
>> +.fi
>> +.SH SEE ALSO
>> +.BR splice (2)
>> diff --git a/man2/splice.2 b/man2/splice.2
>> index b9b4f42..5c162e0 100644
>> --- a/man2/splice.2
>> +++ b/man2/splice.2
>> @@ -238,6 +238,7 @@ only pointers are copied, not the pages of the buffer.
>>  See
>>  .BR tee (2).
>>  .SH SEE ALSO
>> +.BR copy_file_range (2),
>>  .BR sendfile (2),
>>  .BR tee (2),
>>  .BR vmsplice (2)
>> --
>> 2.5.3
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-api" in
>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>> More majordomo info at http://vger.kernel.org/majordomo-info.html