On Tue, May 03, 2022 at 07:39:58AM +0100, Matthew Wilcox (Oracle) wrote: > This is very much in development and basically untested, but Damian > started describing to me something that he wanted, and I told him he > was asking for the wrong thing, and I already had this patch series > in progress. If someone wants to pick it up and make it mergable, > that'd be grand. That've very non-descriptive. Saying "someone wanted something, I said it's wrong, so here's a patch series about something else" doesn't tell me anything about the problem that Damien was trying to solve. > The idea is that an O_SYNC write is always going to want to write, and > we know that at the time we're storing into the page cache. So for an > otherwise clean folio, we can skip the part where we dirty the folio, > find the dirty folios and wait for their writeback. What exactly is this shortcut trying to optimise away? A bit of CPU time? O_SYNC is already a write-through operation - we just call filemap_write_and_wait_range() once we've copied the data into the page cache and dirtied the page. What does skipping the dirty page step gain us? > We can just mark the > folio as writeback-in-progress and start the IO there and then (where we > know exactly which blocks need to be written, so possibly a smaller I/O > than writing the entire page). The existing "find dirty pages, start > I/O and wait on them" code will end up waiting on this pre-started I/O > to complete, even though it didn't start any of its own I/O. > > The important part is patch 9. Everything before it is boring prep work. > I'm in two minds about whether to keep the 'write_through' bool, or > remove it. So feel to read patches 9+10 squashed together, or as if > patch 10 doesn't exist. Whichever feels better. > > The biggest problem with all this is that iomap doesn't have the necessary > information to cause extent allocation, so if you do an O_SYNC write > to an extent which is HOLE or DELALLOC, we can't do this optimisation. > Maybe that doesn't really matter for interesting applications. I suspect > it doesn't matter for ZoneFS. This seems like a lot of complexity for only partial support. It introduces races with page dirtying and cleaning, it likely has interesting issues with all the VM dirty/writeback accounting (because this series is using a completion path that expects the submission path has done it's side of the accounting) and it only works in certain preconditions are met. And I still don't know what problem this code actually trying to solve.... Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx