On Thu, Feb 9, 2023 at 10:58 PM Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>
> On Fri, Feb 10, 2023 at 04:44:41AM +0000, Matthew Wilcox wrote:
> > On Fri, Feb 10, 2023 at 03:06:26PM +1100, Dave Chinner wrote:
> > > So while I was pondering the complexity of this and watching a great
> > > big shiny rocket create lots of heat, light and noise, it occurred
> >
> > That was kind of fun
>
> :)
>
> > > to me that we already have a mechanism for preventing page cache
> > > data from being changed while the folios are under IO:
> > > SB_I_STABLE_WRITES and folio_wait_stable().
> >
> > I thought about bringing that up, but it's not quite right.  That works
> > great for writeback, but it only works for writeback.  We'd need to track
> > another per-folio counter ... it'd be like the page pinning kerfuffle,
> > only worse.
>
> Hmmm - I didn't think of that. It needs the counter because the
> "stable request" is per folio reference state, not per folio state,
> right? And the single flag works for writeback because we can only
> have one writeback context in progress at a time?
>
> Yeah, not sure how to deal with that easily.
>
> > And for such a rare thing it seems like a poor use of 32
> > bits of per-page state.
>
> Maybe, but zero copy file data -> network send is a pretty common
> workload. Web servers, file servers, remote backup programs, etc.
> Having files being written whilst others are reading them is not as
> common, but that does happen in a wide variety of shared file server
> environments.
>
> Regardless, I just had a couple of ideas - if they don't work so be
> it.
>
> > Not to mention that you can effectively block
> > all writes to a file for an indefinite time by splicing pages to a pipe
> > that you then never read from.
>
> No, I wasn't suggesting that we make pages in transit stable - they
> only need to be stable while the network stack requires them to be
> stable....

This is exactly where the existing splice API is problematic.
You can't splice from a file to a network socket right now.  First
you splice to a pipe, and now that pipe contains some magical stuff.
And it stays there potentially forever.  Then you splice it again to a
socket.

Would this be better if user code could splice straight to a socket?
At least in principle, there could be a _limited_ amount of time
during which anything needs to wait, and it's fundamentally entirely
reasonable if a concurrent write to a file affects data being
zero-copied to a socket _during the time after the zero-copy send is
requested and before it reports completion_.

Frankly, I really don't like having non-immutable data in a pipe.  A
pipe is supposed to be a thing into which bytes are written and out
from which the *same* bytes emerge, at least to the extent that anyone
can observe it.  Do we really want:

$ some_program | tee file1 > file2

to potentially write different data to file1 and file2?

--Andy