On Thu, Feb 9, 2023 at 8:06 PM Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>
> So while I was pondering the complexity of this and watching a great
> big shiny rocket create lots of heat, light and noise, it occurred
> to me that we already have a mechanism for preventing page cache
> data from being changed while the folios are under IO:
> SB_I_STABLE_WRITES and folio_wait_stable().

No, Dave. Not at all.

Stop and think.

splice() is not some "while under IO" thing. It's *UNBOUNDED*.

Let me just give an example: random user A does

    fd = open("/dev/passwd", O_RDONLY);
    splice(fd, NULL, pipefd, NULL, ..);
    sleep(10000);

and you now want to block all writes to the page in that file as long
as it's being held on to, do you?

So no.

The above is also why something like IOMAP_F_SHARED is not relevant.
The whole point of splice is to act as a way to communicate pages
between *DIFFERENT* subsystems. The only thing they have in common is
the buffer (typically a page reference, but it can be other things)
that is getting transferred.

A spliced page - by definition - is not in some controlled state where
one filesystem (or one subsystem like networking) owns it, because the
whole and only point of splice is to act as that "take data from one
random source and feed it in to another random destination", and avoid
the N*M complexity matrix of N sources and M destinations.

So no. We cannot block writes, because there is no bounded time for
them. And no, we cannot use some kind of "mark this IO as shared",
because there is no "this IO".

It is also worth noting that the shared behavior (ie "splice acts as a
temporary shared buffer") might even be something that people actually
expect and depend on for semantics.

I hope not, but it's not entirely impossible that people change the
source (could be file data for the above "splice from file" case, but
could be your own virtual memory image for "vmsplice()") _after_
splicing the source, and before splicing it to the destination.

(It sounds very unlikely that people would do that for file contents,
but vmsplice() might intentionally use buffers that may be "live").

Now, to be honest, I hope nobody plays games like that. In fact, I'm a
bit sorry that I was pushing splice() in the first place. Playing
games with zero-copy tends to cause problems, and we've had some nasty
security issues in this area too.

Now, splice() is a *lot* cleaner conceptually than "sendfile()" ever
was, exactly because it allows that whole "many different sources,
many different destinations" model. But this is also very much an
example of how "generic" may be something that is revered in computer
science, but is often a *horrible* thing in reality.

Because if this was just "sendfile()", and would be one operation that
moves file contents from the page cache to the network buffers, then
your idea of "prevent data from being changed while in transit" would
actually be valid.

Special cases are often much simpler and easier, and sometimes the
special cases are all you actually want.

Splice() is not a special case. It's sadly a very interesting and very
generic model for sharing buffers, and that does cause very real
problems.

                Linus
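
The "change the source after splicing it, before it reaches the
destination" case in the mail above can be tried out with a small
program. What follows is only a rough sketch under stated assumptions,
not code from the thread: the helper name splice_then_modify(), the
scratch file under /tmp, and the 4096-byte limit are all made up for
illustration. It splices a few bytes of a file into a pipe, overwrites
the file, and only then drains the pipe. Which bytes come out depends on
the kernel and the filesystem, so the sketch reports what it saw rather
than assuming an answer.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the mail): splice `len` bytes of a scratch
 * file into a pipe, overwrite the file, then read the pipe into `out`.
 * Returns the number of bytes read from the pipe, or -1 on error.
 */
ssize_t splice_then_modify(char *out, size_t len)
{
	char path[] = "/tmp/splice-demo-XXXXXX";	/* assumed scratch location */
	int pipefd[2] = { -1, -1 };
	char *buf = NULL;
	ssize_t ret = -1;
	int fd;

	assert(out != NULL);
	if (len == 0 || len > 4096)	/* keep it well inside one pipe buffer */
		return -1;

	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path);			/* the open fd keeps the file alive */

	buf = malloc(len);
	if (!buf || pipe(pipefd) < 0)
		goto out;

	/* Original file contents: all 'A'. pwrite() leaves the file position at 0. */
	memset(buf, 'A', len);
	if (pwrite(fd, buf, len, 0) != (ssize_t)len)
		goto out;

	/* Move the data into the pipe; NULL offsets mean "use the file position". */
	if (splice(fd, NULL, pipefd[1], NULL, len, 0) != (ssize_t)len)
		goto out;

	/* Overwrite the source while the pipe still holds the earlier splice. */
	memset(buf, 'B', len);
	if (pwrite(fd, buf, len, 0) != (ssize_t)len)
		goto out;

	/* Drain the pipe: 'A' would be a snapshot, 'B' a still-shared buffer. */
	ret = read(pipefd[0], out, len);
out:
	free(buf);
	if (pipefd[0] >= 0)
		close(pipefd[0]);
	if (pipefd[1] >= 0)
		close(pipefd[1]);
	close(fd);
	return ret;
}
```

Calling splice_then_modify(out, 4) from a small main() and printing the
four bytes shows which behavior a given system has. Getting the later
'B' bytes back is the "temporary shared buffer" behavior discussed
above; getting 'A' would mean the data was copied at splice time.
Neither outcome should be read as guaranteed by this sketch.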