Greetings:

Welcome to the RFC.

Currently, when a user app uses sendfile, the app has no way to know
whether the bytes were transmitted; sendfile simply returns, but a slow
client on the other side may take time to receive and ACK the bytes. In
the meantime, the app which called sendfile has no way to know whether
it can overwrite the data on disk that it just sendfile'd.

One way to fix this is to add zerocopy notifications to sendfile,
similar to how MSG_ZEROCOPY works with sendmsg. This is possible thanks
to the extensive work done by Pavel [1].

To support this, two important user ABI changes are proposed:

  - A new splice flag, SPLICE_F_ZC, which allows users to signal that
    splice should generate zerocopy notifications if possible.

  - A new system call, sendfile2, which is similar to sendfile64 except
    that it takes an additional argument, flags, which allows the user
    to specify either a "regular" sendfile or a sendfile with zerocopy
    notifications enabled.

In either case, user apps can read notifications from the error queue
(like they would with MSG_ZEROCOPY) to determine when their call to
sendfile has completed.

I tested this RFC using the selftest modified in the last patch and also
by using the selftest between two different physical hosts:

  # server
  ./msg_zerocopy -4 -i eth0 -t 2 -v -r tcp

  # client (does the sendfiling)
  dd if=/dev/zero of=sendfile_data bs=1M count=8
  ./msg_zerocopy -4 -i eth0 -D $SERVER_IP -v -l 1 -t 2 -z -f sendfile_data tcp

I would love to get high level feedback from folks on a few things:

  - Is this functionality, at a high level, something that would be
    desirable / useful? I think so, but of course I am biased ;)

  - Is this approach generally headed in the right direction? Are the
    proposed user ABI changes reasonable?

If the above two points are generally agreed upon, then I'd welcome
feedback on the patches themselves :)

This is kind of a net thing, but also kind of a splice thing, so I hope
I am sending this to the right places to get appropriate feedback. I
based my code on the vfs/for-next tree, but am happy to rebase on
another tree if desired. The cc-list got a little out of control, so I
manually trimmed it down quite a bit; sorry if I missed anyone I should
have CC'd in the process.

Thanks,
Joe

[1]: https://lore.kernel.org/netdev/cover.1657643355.git.asml.silence@xxxxxxxxx/

Joe Damato (10):
  splice: Add ubuf_info to prepare for ZC
  splice: Add helper that passes through splice_desc
  splice: Factor splice_socket into a helper
  splice: Add SPLICE_F_ZC and attach ubuf
  fs: Add splice_write_sd to file operations
  fs: Extend do_sendfile to take a flags argument
  fs: Add sendfile2 which accepts a flags argument
  fs: Add sendfile flags for sendfile2
  fs: Add sendfile2 syscall
  selftests: Add sendfile zerocopy notification test

 arch/alpha/kernel/syscalls/syscall.tbl      |  1 +
 arch/arm/tools/syscall.tbl                  |  1 +
 arch/arm64/tools/syscall_32.tbl             |  1 +
 arch/m68k/kernel/syscalls/syscall.tbl       |  1 +
 arch/microblaze/kernel/syscalls/syscall.tbl |  1 +
 arch/mips/kernel/syscalls/syscall_n32.tbl   |  1 +
 arch/mips/kernel/syscalls/syscall_n64.tbl   |  1 +
 arch/mips/kernel/syscalls/syscall_o32.tbl   |  1 +
 arch/parisc/kernel/syscalls/syscall.tbl     |  1 +
 arch/powerpc/kernel/syscalls/syscall.tbl    |  1 +
 arch/s390/kernel/syscalls/syscall.tbl       |  1 +
 arch/sh/kernel/syscalls/syscall.tbl         |  1 +
 arch/sparc/kernel/syscalls/syscall.tbl      |  1 +
 arch/x86/entry/syscalls/syscall_32.tbl      |  1 +
 arch/x86/entry/syscalls/syscall_64.tbl      |  1 +
 arch/xtensa/kernel/syscalls/syscall.tbl     |  1 +
 fs/read_write.c                             | 40 +++++++---
 fs/splice.c                                 | 87 +++++++++++++++++----
 include/linux/fs.h                          |  2 +
 include/linux/sendfile.h                    | 10 +++
 include/linux/splice.h                      |  7 +-
 include/linux/syscalls.h                    |  2 +
 include/uapi/asm-generic/unistd.h           |  4 +-
 net/socket.c                                |  1 +
 scripts/syscall.tbl                         |  1 +
 tools/testing/selftests/net/msg_zerocopy.c  | 54 ++++++++++++-
 tools/testing/selftests/net/msg_zerocopy.sh |  5 ++
 27 files changed, 200 insertions(+), 29 deletions(-)
 create mode 100644 include/linux/sendfile.h

base-commit: 2e72b1e0aac24a12f3bf3eec620efaca7ab7d4de
-- 
2.43.0