On Thu, 29 Jun 2023 at 11:19, Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> Now, we also have SPLICE_F_GIFT. [..]
>
> Now, I would actually not disagree with removing that part. It's
> scary. But I think we don't really have any users (ok, fuse and some
> random console driver?)

Side note: maybe I should clarify. I have grown to pretty much hate
splice() over the years, just because it's been a constant source of
sorrow in so many ways.

So I'd personally be perfectly ok with just making vmsplice() be
exactly the same as write, and turn all of vmsplice() into just "it's
a read() if the pipe is open for read, and a write if it's open for
writing".

IOW, effectively get rid of vmsplice() entirely, just leaving it as a
legacy name for an interface.

What I *absolutely* don't want to see is to make vmsplice() even more
complicated, and actively slower in the process. Unmapping it from the
source, removing it from the VM, is all just crazy talk.

If you want to be really crazy, I can tell you how to make for some
truly stupendously great benchmarks: make a plain "write()" system
call look up the physical page, check if it's COW'able, and if so,
mark it read-only in the source and steal the page. Now write() has
taken a snapshot of the source, and can use that page for the pipe
buffer as-is. It won't change, because if the user writes to it, the
user will just take a page fault and force a COW.

Then, to complete the thing, make 'read()' of a pipe able to just take
the page, and insert it into the destination VM (it's ok to make it
writable at that point).

You can get *wonderful* performance numbers from benchmarks with that.

I know, because I did exactly that long long ago. So long ago that I
think I had a i486 that had memory throughput measured in megabytes.
And my pipe throughput benchmark got gigabytes per second!

Of course, that benchmark relied entirely on the source of the write()
never actually writing to the page, and the reader never actually
bothering to touch the page. So it was gigabytes on a pretty bad
benchmark. But it was quite impressive.

I don't think those patches ever got posted publicly, because while
very impressive on benchmarks, it obviously was absolutely horrendous
in real life, because in real life the source of the pipe data would

 (a) not usually be page-aligned anyway, and

 (b) even if it was and triggered this wonderful case, it would then
     re-use the buffer and take a COW fault, and now the overhead of
     faulting, allocating a new page, copying said page, was obviously
     higher than just doing all that in the pipe write() code without
     any faulting overhead.

But splice() (and vmsplice()) does conceptually come from that kind of
background. It's just that it was never as lovely and as useful as it
promised to be.

So I'd actually be more than happy to just say "let's decommission
splice entirely, just keeping the interfaces alive for backwards
compatibility"

                Linus
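
The page-stealing write() described in the message can be illustrated with a small userspace toy model. This is only a sketch under stated assumptions: it is not kernel code and not the unposted patches mentioned above. Every name in it (struct page, struct mapping, pipe_write_steal(), pipe_write_copy(), user_store(), simulate()) is hypothetical, a "page" is just a heap allocation with a reference count, and a "COW fault" is just a counter increment. It counts faults and page copies for a writer that either leaves its buffer alone (the benchmark case) or refills it while the previous page still sits in the pipe (the real-life case).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE 4096

/* Toy stand-in for a physical page with a reference count. */
struct page {
    unsigned char data[PAGE_SIZE];
    int refs;
};

/* Toy stand-in for one user mapping: a page plus a write-protect bit. */
struct mapping {
    struct page *pg;
    int read_only;
};

struct stats {
    int faults;   /* simulated COW faults */
    int copies;   /* full-page copies */
};

static struct page *page_alloc(void)
{
    struct page *p = calloc(1, sizeof *p);
    if (!p)
        abort();
    p->refs = 1;
    return p;
}

static void page_put(struct page *p)
{
    assert(p->refs > 0);
    if (--p->refs == 0)
        free(p);
}

/* "write()" by stealing: share the page with the pipe and
 * write-protect the source mapping. No data is copied here. */
static struct page *pipe_write_steal(struct mapping *src)
{
    src->pg->refs++;
    src->read_only = 1;
    return src->pg;
}

/* "write()" by copying into a fresh pipe page, with no faults involved. */
static struct page *pipe_write_copy(const struct mapping *src, struct stats *st)
{
    struct page *p = page_alloc();
    memcpy(p->data, src->pg->data, PAGE_SIZE);
    st->copies++;
    return p;
}

/* A user store into the buffer. If the mapping is write-protected this
 * takes a simulated fault, and copies the page if it is still shared. */
static void user_store(struct mapping *m, size_t off, unsigned char v,
                       struct stats *st)
{
    if (m->read_only) {
        st->faults++;
        if (m->pg->refs > 1) {
            struct page *fresh = page_alloc();
            memcpy(fresh->data, m->pg->data, PAGE_SIZE);
            st->copies++;
            page_put(m->pg);
            m->pg = fresh;
        }
        m->read_only = 0;
    }
    m->pg->data[off] = v;
}

/* Do `rounds` one-page pipe writes. If `touch` is nonzero the writer
 * refills its buffer before each write, while the previously written
 * page is still queued in the (one-slot) pipe. */
struct stats simulate(int steal, int rounds, int touch)
{
    struct stats st = { 0, 0 };
    struct mapping buf = { page_alloc(), 0 };
    struct page *in_pipe = NULL;

    for (int i = 0; i < rounds; i++) {
        if (touch)
            user_store(&buf, 0, (unsigned char)i, &st);
        if (in_pipe)
            page_put(in_pipe);          /* reader drains the old page */
        in_pipe = steal ? pipe_write_steal(&buf)
                        : pipe_write_copy(&buf, &st);
    }
    if (in_pipe)
        page_put(in_pipe);
    page_put(buf.pg);
    return st;
}
```

In this model, simulate(1, 4, 0) reports no faults and no copies, which is the benchmark-friendly case. simulate(1, 4, 1) reports three faults and three copies, while the plain copying path simulate(0, 4, 1) reports four copies and no faults. The copies are not avoided once the buffer is reused, they are only moved behind a fault. The model says nothing about actual timings.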