Dave Chinner <david@xxxxxxxxxxxxx> wrote:

> On Wed, Jun 28, 2023 at 07:30:50AM +0100, David Howells wrote:
> > Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
> >
> > > > > Expected behavior:
> > > > > Punching holes in a file after splicing pages out of that file into
> > > > > a pipe should not corrupt the spliced-out pages in the pipe buffer.
> > >
> > > I think this bit is the key.  Why would this be the expected behaviour?
> >
> > As you say, splice is allowed to stuff parts of the pagecache into a pipe
> > and these may get transferred, say, to a network card at the end to
> > transmit directly from.  It's a form of direct I/O.

Actually, it's a form of zerocopy, not direct I/O.

> > If someone has the pages mmapped, they can change the data that will be
> > transmitted; if someone does a write(), they can change that data too.
> > The point of splice is to avoid the copy - but it comes with a tradeoff.
>
> I wouldn't call "post-splice filesystem modifications randomly
> corrupts pipe data" a tradeoff.  I call that a bug.

Would you consider it a kernel bug, then, if you use sendmsg(MSG_ZEROCOPY)
to send some data from a file mmapping that some other userspace then
corrupts by altering the file before the kernel has managed to send it?

Anyway, if you think the splice thing is a bug, we have to fix splice from a
buffered file that is shared-writably mmapped as well as fixing
fallocate()-driven mangling.  There are a number of options:

 (0) Document the bug as a feature: "If this is a problem, don't use splice".

 (1) Always copy the data into the pipe.

 (2) Always unmap and steal the pages from the pagecache, copying if we
     can't.

 (3) R/O-protect any PTEs mapping those pages and implement CoW.

 (4) Disallow splice() from any region that's mmapped; disallow mmap() on,
     or make page_mkwrite wait for, any region that's currently spliced;
     disallow fallocate() on, or make fallocate() wait for, any pages that
     are spliced.

With recent changes, I think there are only two places that need fixing:
filemap_splice_read() and shmem_splice_read().  However, I wonder what the
performance effect of having to do a PTE hunt in splice() will be.

And then there's vmsplice()...

Also, I do wonder what happens if you do MSG_ZEROCOPY to a loopback network
address and then splice out of the other end.  I'm guessing you'll get the
zerocopied pages out into your pipe, as I think it just moves the sent
skbuffs to the receive queue on the other end.

David
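
For concreteness, here is a minimal userspace sketch of the scenario in the
quoted report: splice a pagecache page from a file into a pipe, punch a hole
over the same range, then read the pipe.  The file name and size are
illustrative, error checking is omitted, and whether the pipe read actually
comes back as zeroes depends on the filesystem and kernel version.

/* Sketch: splice a page into a pipe, punch a hole over it, read the pipe. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	char buf[4096];
	int fd, pfd[2];
	off64_t off = 0;
	ssize_t n;

	/* Create a one-page file full of 'A's (name is illustrative). */
	fd = open("testfile", O_RDWR | O_CREAT | O_TRUNC, 0644);
	memset(buf, 'A', sizeof(buf));
	write(fd, buf, sizeof(buf));

	pipe(pfd);

	/* Attach the pagecache page to the pipe without copying it. */
	splice(fd, &off, pfd[1], NULL, sizeof(buf), 0);

	/* Punch a hole over the range that was just spliced. */
	fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
		  0, sizeof(buf));

	/* If the pipe buffer still refers to the pagecache page, this may
	 * now read back zeroes rather than the 'A's that were spliced.
	 */
	n = read(pfd[0], buf, sizeof(buf));
	printf("read %zd bytes, first byte %#x\n", n, buf[0]);
	return 0;
}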