On Thu, 29 Jun 2023 at 11:34, Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
>
> I think David muddied the waters by talking about vmsplice().  The
> problem encountered is with splice() from the page cache.  Reading
> the documentation,
>
>        splice() moves data between two file descriptors without
>        copying between kernel address space and user address space.
>        It transfers up to len bytes of data from the file descriptor
>        fd_in to the file descriptor fd_out, where one of the file
>        descriptors must refer to a pipe.

Well, the original intent really always was that it's about zero-copy.

So I do think that the answer to your test-program is that yes, it
really traditionally *should* output "new".

A splice from a file acts like a scatter-gather mmap() in the kernel.
It's the original intent, and it's the whole reason it's noticeably
faster than doing a write.

Now, do I then agree that splice() has turned out to be a nasty morass
of problems? Yes.

And I even agree that while I actually *think* that your test program
should output "new" (because that is the whole point of the exercise),
it also means that people who use splice() need to *understand* that,
and it's much too easy to get things wrong if you don't understand
that the whole point of splice is to act as a kind of ad-hoc in-kernel
mmap thing.

And to make matters worse, for mmap() we actually do have some
coherency helpers. For splice(), the page ref stays around.

It's kind of like GUP and page pinning - another area where we have
had lots of problems and lots of nasty semantics and complications
with other VM operations over the years.

So I really *really* don't want to complicate splice() even more to
give it some new semantics that it has never ever really had, because
people didn't understand it and used it wrong.

Quite the reverse. I'd be willing to *simplify* splice() by just
saying "it was all a mistake", and just turning it into wrappers
around read/write.

But those patches would have to be radical simplifications, not adding
yet more crud on top of the pain that is splice().

Because it will hurt performance. And I'm ok with that as long as it
comes with huge simplifications. What I'm *not* ok with is "I mis-used
splice, now I want splice to act differently, so let's make it even
more complicated".

              Linus