Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote: > > Quite the reverse. I'd be willing to *simplify* splice() by just > saying "it was all a mistake", and just turning it into wrappers > around read/write. But those patches would have to be radical > simplifications, not adding yet more crud on top of the pain that is > splice(). > > Because it will hurt performance. And I'm ok with that as long as it > comes with huge simplifications. What I'm *not* ok with is "I mis-used > splice, now I want splice to act differently, so let's make it even > more complicated". If we want to go down the simplification route, then the patches I posted might be a good start. The idea I tried to work towards is that the pipe only ever contains private pages in it that only the pipe has a ref on and that no one else can access until they come out the other end again. I got rid of the ->confirm() pipe buf op and would like to kill off all of the others too. I simplified splice() by: - Making sure any candidate pages are uptodate right up front. - Allowing automatic stealing of pages from the pagecache if no one else is using them. This should avoid losing a chunk of the performance that splice is supposed to gain - but if you're serving pages repeatedly in a webserver with this, it's going to be a problem. Possibly this should be contingent on SPLICE_F_MOVE - though the manpage says "*from* the pipe" implying it's only usable on the output side. - Copying in every other circumstance. I simplified vmsplice() by: - If SPLICE_F_GIFT is set, attempting to steal whole pages in the buffer up front if not in use by anyone else. - Copying in every other circumstance. That said, there are still sources that I didn't touch yet that attempt to insert pages into a pipe: relayfs (which does some accounting stuff based on the final consumption of the pages it inserted), sockets (which don't allow inserted pages to be stolen) and notifications (which don't want to allocate at notification time - but I can deal with that). And there's tee() (which would need to copy the data). And pipe-to-pipe splice (which could steal whole pages, but would otherwise have to copy). If you would prefer to go for utter simplification, we could make sendfile() from a buffered file just call sendmsg() directly with MSG_SPLICE_PAGES set and ignore the pipe entirely (I'm tempted to do this anyway) and then make splice() to a pipe just do copy_splice_read() and vmsplice() to a pipe do writev(). I wonder how much splice() is used compared to sendfile(). I would prefer to leave splice() and vmsplice() as they are now and adjust the documentation. As you say, they can be considered a type of zerocopy. David