On Wed, 19 Jul 2023 at 16:20, Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> If you want "one-copy", what you can do is:
>
>  - mmap() the file data (zero copy, not stable yet)
>
>  - use "write()" to write the data to the network. This will copy it
>    to the skbs before the write() call returns and that copy makes it
>    stable.
>
> Alternatively, if you want to be more than a bit odd, you _can_ do the
> zero-copy on the write side, by doing
>
>  - read the file data (one copy, now it's stable)
>
>  - vmsplice() to the kernel buffer (zero copy)
>
>  - splice() to the network (zero copy at least for the good cases)

Actually, I guess technically there's a third way:

 - mmap the input (zero copy)

 - write() to a pipe (one copy)

 - splice() to the network (zero copy)

which doesn't seem to really have any sane use cases, but who knows...
It avoids the user buffer management of the vmsplice() model, and
while you cannot do anything to the data in user space *before* it is
stable (because it only becomes stable as it is copied to the pipe
buffers by the 'write()' system call), you could use "tee()" to
duplicate the now stable stream and perhaps log it or create a
checksum after-the-fact.

Another use-case would be if you want to send the *same* stable stream
to two different network connections, while still only having one
copy. You can't do that with plain splice() - because the data isn't
guaranteed to be stable, and the two network connections might see
different streams. You can't do that with the 'mmap and then
write-to-socket' approach, because the two writes not only copy twice,
they might copy different data.

And while you *can* do it with the "read+vmsplice()" approach, maybe
the "write to pipe() in order to avoid any user space buffer issues"
model is better. And "tee()" avoids the overhead of doing multiple
vmsplice() calls on the same buffer.

I dunno.
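
To make the "third way" plus tee() concrete, here is a minimal sketch
of the sequence described above: mmap the input, write() it into a
pipe (the one copy, after which the data is stable), tee() the pipe
into a second pipe, then splice() each pipe to its own destination.
The function names send_stable_twice() and drain_pipe() and the
4096-byte chunk size are made up for illustration, and the sketch
assumes blocking descriptors and a file that is at least `len` bytes
long. It is not tuned or hardened code.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: splice() exactly `len` bytes from the read end
 * of a pipe to `out` (a socket, a file, or another pipe). */
static int drain_pipe(int pfd, int out, size_t len)
{
	while (len > 0) {
		ssize_t n = splice(pfd, NULL, out, NULL, len, SPLICE_F_MOVE);
		if (n <= 0)
			return -1;
		len -= (size_t)n;
	}
	return 0;
}

/* Hypothetical helper: send the first `len` bytes of `file_fd` to both
 * `out1` and `out2`, copying the data only once. */
static int send_stable_twice(int file_fd, size_t len, int out1, int out2)
{
	int a[2], b[2], ret = -1;
	size_t off = 0;
	char *map;

	assert(len > 0);

	/* Zero copy, but not stable: the file can still change under us. */
	map = mmap(NULL, len, PROT_READ, MAP_SHARED, file_fd, 0);
	if (map == MAP_FAILED)
		return -1;
	if (pipe(a) < 0)
		goto out_unmap;
	if (pipe(b) < 0)
		goto out_close_a;

	while (off < len) {
		/* Assumption: small chunks so one write() always fits in
		 * the (empty) pipe and cannot block. */
		size_t chunk = len - off > 4096 ? 4096 : len - off;

		/* The one copy: the data becomes stable in pipe buffers. */
		ssize_t w = write(a[1], map + off, chunk);
		if (w <= 0)
			goto out;

		/* Duplicate buffer references into the second pipe;
		 * tee() does not consume anything from pipe `a`. */
		if (tee(a[0], b[1], (size_t)w, 0) != w)
			goto out;	/* sketch: treat a short tee as an error */

		if (drain_pipe(a[0], out1, (size_t)w) < 0 ||
		    drain_pipe(b[0], out2, (size_t)w) < 0)
			goto out;
		off += (size_t)w;
	}
	ret = 0;
out:
	close(b[0]);
	close(b[1]);
out_close_a:
	close(a[0]);
	close(a[1]);
out_unmap:
	munmap(map, len);
	return ret;
}
```

In the two-connection case, out1 and out2 would be two connected
sockets, called as send_stable_twice(file_fd, len, sock1, sock2).
Both receivers then see the same bytes even if the file is modified
mid-transfer, because both pipes reference the same pipe buffers
filled by the single write().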

What I *am* trying to say is that "splice()" is actually kind of
designed for people to do these kinds of combinations. But very very
few people actually do it.

For example, the "tee()" system call exists, but it is crazy hard to
use, I'm not sure it has ever actually been used for anything real.

              Linus