1.
sendfile() and splice() use a pipe as a temporary buffer.
do_splice_direct(in.file, &pos, out.file, &out_pos, count, fl) ->
splice_direct_to_actor(struct file *in, struct splice_desc *sd, splice_direct_actor *actor)
http://lxr.free-electrons.com/source/fs/splice.c#L602
default_file_splice_read() allocates a page, reads the data of file1 from disk into that page, then writes the page to a pipe.
default_file_splice_write() reads from the pipe, writes to a page, then writes the page to file2.
So this is not zero-copy inside the kernel. It can be called zero-copy from the userspace point of view, since nothing is copied to userspace, but a copy is still involved because the data is written to a temporary buffer, here the pipe.
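The same file -> pipe -> file shape can be reproduced from userspace with two splice() calls. This is only a minimal sketch of the idea, not the kernel's code: copy_via_pipe() and demo_copy_via_pipe() are hypothetical helper names, and the temporary file paths are assumptions made for the self-check.

```c
#define _GNU_SOURCE             /* needed for splice() */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: copy up to len bytes from in_fd to out_fd,
 * staging the data in a pipe. Returns bytes copied, or -1 on error. */
static ssize_t copy_via_pipe(int in_fd, int out_fd, size_t len)
{
    int p[2];
    ssize_t total = 0;

    assert(in_fd >= 0 && out_fd >= 0);
    if (pipe(p) < 0)
        return -1;

    while (len > 0) {
        /* Stage 1: source file -> pipe (the temporary buffer). */
        ssize_t n = splice(in_fd, NULL, p[1], NULL, len, SPLICE_F_MOVE);
        if (n < 0) { total = -1; break; }
        if (n == 0) break;                  /* end of input */
        len -= (size_t)n;

        /* Stage 2: drain what is in the pipe into the destination. */
        while (n > 0) {
            ssize_t w = splice(p[0], NULL, out_fd, NULL, (size_t)n,
                               SPLICE_F_MOVE);
            if (w <= 0) { total = -1; goto out; }
            n -= w;
            total += w;
        }
    }
out:
    close(p[0]);
    close(p[1]);
    return total;
}

/* Self-check on two temporary files; returns 0 if the copy matches. */
static int demo_copy_via_pipe(void)
{
    char in_path[] = "/tmp/splice_in_XXXXXX";    /* assumed scratch dir */
    char out_path[] = "/tmp/splice_out_XXXXXX";
    const char msg[] = "staged in a pipe";
    char buf[sizeof msg] = {0};
    int ok = -1;

    int in_fd = mkstemp(in_path);
    int out_fd = mkstemp(out_path);
    if (in_fd >= 0 && out_fd >= 0 &&
        write(in_fd, msg, sizeof msg) == (ssize_t)sizeof msg &&
        lseek(in_fd, 0, SEEK_SET) == 0 &&
        copy_via_pipe(in_fd, out_fd, sizeof msg) == (ssize_t)sizeof msg &&
        pread(out_fd, buf, sizeof buf, 0) == (ssize_t)sizeof msg &&
        memcmp(buf, msg, sizeof msg) == 0)
        ok = 0;

    if (in_fd >= 0) { close(in_fd); unlink(in_path); }
    if (out_fd >= 0) { close(out_fd); unlink(out_path); }
    return ok;
}
```

Calling demo_copy_via_pipe() from main() exercises both stages. No data passes through a userspace buffer, but it still goes through the pipe on its way from one file to the other.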
2.
If the filesystem implements the copy_file_range() file operation, splice is not used. Otherwise
copy_file_range() falls back to the splice method, with a pipe as the temporary buffer.
http://lxr.free-electrons.com/source/fs/read_write.c#L1412
ret = file_out->f_op->copy_file_range(file_in, pos_in, file_out,
                                      pos_out, len, flags);
if (ret == -EOPNOTSUPP)
        ret = do_splice_direct(file_in, &pos_in, file_out, &pos_out,
                        len > MAX_RW_COUNT ? MAX_RW_COUNT : len, 0);
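From userspace the choice between the filesystem method and the splice fallback is invisible; the caller just loops on copy_file_range() until the range is done. A minimal sketch, assuming a glibc that provides the copy_file_range() wrapper (2.27 or later); copy_range_all() and demo_copy_range() are hypothetical names, and flags is passed as 0.

```c
#define _GNU_SOURCE             /* needed for copy_file_range() */
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: copy len bytes using the current file offsets.
 * Returns bytes copied, or -1 with errno set. */
static ssize_t copy_range_all(int in_fd, int out_fd, size_t len)
{
    ssize_t total = 0;

    assert(in_fd >= 0 && out_fd >= 0);
    while (len > 0) {
        /* The kernel may copy less than requested, so loop. */
        ssize_t n = copy_file_range(in_fd, NULL, out_fd, NULL, len, 0);
        if (n < 0)
            return -1;
        if (n == 0)
            break;                          /* end of input */
        len -= (size_t)n;
        total += n;
    }
    return total;
}

/* Self-check: 0 = copied and verified, 1 = call not usable on this
 * system, -1 = copy ran but the data did not match. */
static int demo_copy_range(void)
{
    char in_path[] = "/tmp/cfr_in_XXXXXX";       /* assumed scratch dir */
    char out_path[] = "/tmp/cfr_out_XXXXXX";
    const char msg[] = "copied inside the kernel";
    char buf[sizeof msg] = {0};
    int ok = -1;

    int in_fd = mkstemp(in_path);
    int out_fd = mkstemp(out_path);
    if (in_fd >= 0 && out_fd >= 0 &&
        write(in_fd, msg, sizeof msg) == (ssize_t)sizeof msg &&
        lseek(in_fd, 0, SEEK_SET) == 0) {
        ssize_t n = copy_range_all(in_fd, out_fd, sizeof msg);
        if (n < 0 && (errno == ENOSYS || errno == EXDEV ||
                      errno == EOPNOTSUPP))
            ok = 1;
        else if (n == (ssize_t)sizeof msg &&
                 pread(out_fd, buf, sizeof buf, 0) == (ssize_t)sizeof msg &&
                 memcmp(buf, msg, sizeof msg) == 0)
            ok = 0;
    }

    if (in_fd >= 0) { close(in_fd); unlink(in_path); }
    if (out_fd >= 0) { close(out_fd); unlink(out_path); }
    return ok;
}
```

The sketch cannot tell whether the kernel cloned blocks or went through the pipe; it only shows that the same call covers both cases.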
copy_file_range() can do the following, which makes it a much better method than splice. Note that the COPY_FR_* flags below come from the proposal stage of the system call; the merged copy_file_range() documents that flags must currently be 0.
COPY_FR_COPY means to copy the data normally, accelerating the work at the filesystem level if possible.
COPY_FR_REFLINK asks for the destination file to refer to the existing copy of the data without actually copying it. Some filesystems (Btrfs, for example) are able to share references to file blocks in this way.
COPY_FR_DEDUP is like COPY_FR_REFLINK, but it only succeeds if the destination range already contains the same data as the source. The end result is files that look the same as before, but which are now sharing the data on-disk. It is thus a way of removing blocks of duplicated data within the filesystem.
The COPY_FR_COPY operation will, in the absence of filesystem-level acceleration, copy the data directly through the kernel page cache; it is essentially a splice() operation. Copying through the page cache in this way is clearly more efficient than doing the copy in user space, since it avoids the need to copy the data out of the kernel and back in again. If possible, of course, copying with COPY_FR_REFLINK will be the most efficient approach.
On Btrfs, copy_file_range() does not copy the data at all. It clones a range of blocks of the file:
const struct file_operations btrfs_file_operations = {
        ...
        .copy_file_range   = btrfs_copy_file_range,
        .clone_file_range  = btrfs_clone_file_range,
        .dedupe_file_range = btrfs_dedupe_file_range,
};

ssize_t btrfs_copy_file_range(struct file *file_in, loff_t pos_in,
                              struct file *file_out, loff_t pos_out,
                              size_t len, unsigned int flags)
{
        ssize_t ret;

        ret = btrfs_clone_files(file_out, file_in, pos_in, len, pos_out);
        if (ret == 0)
                ret = len;
        return ret;
}
Regards
_______________________________________________ Kernelnewbies mailing list Kernelnewbies@xxxxxxxxxxxxxxxxx http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies