On Sat, Nov 2, 2019 at 4:10 PM Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Sat, Nov 2, 2019 at 4:02 PM Linus Torvalds
> <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > But I don't think anybody actually _did_ any of that. But that's
> > basically the argument for the three splice operations:
> > write/vmsplice/splice(). Which one you use depends on the lifetime and
> > the source of your data. write() is obviously for the copy case (the
> > source data might not be stable), while splice() is for the "data from
> > another source", and vmsplice() is "data is from stable data in my
> > vm".
>
> Btw, it's really worth noting that "splice()" and friends are from a
> more happy-go-lucky time when we were experimenting with new
> interfaces, and in a day and age when people thought that interfaces
> like "sendpage()" and zero-copy and playing games with the VM was a
> great thing to do.

I suppose a nicer interface might be:

    madvise(buf, len, MADV_STABILIZE);

(MADV_STABILIZE is an imaginary operation that write protects the
memory a la fork() but without the copying part.)

    vmsplice_safer(fd, ...);

Where vmsplice_safer() is like vmsplice, except that it only works on
write-protected pages.  If you vmsplice_safer() some memory and then
write to the memory, the pipe keeps the old copy.

But this can all be done with memfd and splice, too, I think.

>
> It turns out that VM games are almost always more expensive than just
> copying the data in the first place, but hey, people didn't know that,
> and zero-copy was seen as a big deal.
>
> The reality is that almost nobody uses splice and vmsplice at all, and
> they have been a much bigger headache than they are worth. If I could
> go back in time and not do them, I would. But there have been a few
> very special uses that seem to actually like the interfaces.
>
> But it's entirely possible that we should kill vmsplice() (likely by
> just implementing the semantics as "write()") because it's not common
> enough to have the complexity.

I think this is the right choice.

FWIW, the openssl vmsplice() call looks dubious, but I suspect it's
okay because it's vmsplicing to a netlink socket, and the kernel code
on the other end won't read the data after it returns a response.

--Andy