On Fri, Jul 26, 2024 at 07:13:58PM +0200, Christoph Hellwig wrote:
> On Fri, Jul 26, 2024 at 03:29:48PM +0100, John Garry wrote:
> > I have been considering another approach to solve this problem.
> >
> > In this patch - as you know - we zero unwritten parts of a newly allocated
> > extent. This is so that when we later issue an atomic write, we would not
> > have the problem of unwritten extents and how the iomap iterator will
> > create multiple BIOs (which is not permitted).
> >
> > How about an alternate approach like this:
> > - no sub-extent zeroing
> > - iomap iter is changed to allocate a single BIO for an atomic write in
> >   first iteration
> > - each iomap extent iteration appends data to that same BIO
> > - when finished iterating, we submit the BIO
> >
> > Obviously that will mean many changes to the iomap bio iterator, but is
> > quite self-contained.
>
> Yes, I also suggested that during the zeroing fix discussion.  There
> is generally no good reason to start a new direct I/O bio if the
> write is contiguous on disk and only the state of the srcmap is different.
> This will also be a big win for COW / out of place overwrites.

But what happens if the pre-write state is:

WUWUWUWU

You can write all 8 blocks with a single bio, but the directio write
completion has to run four separate transactions to convert the four
unwritten mappings.  For COW it's ok if we crash midway through the
ioend such that a read after recovery sees this:

WWWWW0W0

because we've never guaranteed what happens if the system crashes before
fsync completes.  For untorn writes this is not allowed (even if the
actual disk contents landed successfully) because we said we wouldn't
tear the write.

--D
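
To make the crash scenario above concrete, here is a toy model in Python.
It is only a sketch, not kernel code: the helper names are made up for
illustration, and it assumes one conversion transaction per contiguous
unwritten run, committed in ascending file offset order, which may not
match what the filesystem really does.  'W' is a written block, 'U' is an
unwritten block, and '0' is a block that still reads back as zeroes after
recovery.

```python
def unwritten_runs(state):
    """Return (start, end) index pairs for each maximal run of 'U' blocks."""
    runs = []
    start = None
    for i, ch in enumerate(state):
        if ch == "U" and start is None:
            start = i
        elif ch != "U" and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(state)))
    return runs


def read_after_crash(state, conversions_done):
    """Model what a read sees after recovery.

    Assumption (hypothetical model): the data for every block reached the
    disk, but only the first `conversions_done` unwritten runs had their
    conversion transaction committed before the crash.  Runs that were
    never converted still read back as zeroes.
    """
    blocks = list(state)
    for n, (start, end) in enumerate(unwritten_runs(state)):
        fill = "W" if n < conversions_done else "0"
        for i in range(start, end):
            blocks[i] = fill
    return "".join(blocks)


pre_write = "WUWUWUWU"
print(len(unwritten_runs(pre_write)))   # conversion transactions needed
print(read_after_crash(pre_write, 2))   # crash after two of them committed
print(read_after_crash(pre_write, 4))   # all conversions committed
```

In this model, any value of conversions_done strictly between 0 and the
number of unwritten runs leaves a mix of new data and zeroes, which is
the torn result that an untorn write must never expose.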