On Mon, Nov 04, 2024 at 12:52:42PM +1100, Dave Chinner wrote: > On Thu, Oct 31, 2024 at 02:36:40PM -0700, Darrick J. Wong wrote: > > On Sat, Oct 26, 2024 at 10:05:44AM +0530, Ritesh Harjani wrote: > > > > This gets me to the third and much less general solution -- only allow > > > > untorn writes if we know that the ioend only ever has to run a single > > > > transaction. That's why untorn writes are limited to a single fsblock > > > > for now -- it's a simple solution so that we can get our downstream > > > > customers to kick the tires and start on the next iteration instead of > > > > spending years on waterfalling. > > > > > > > > Did you notice that in all of these cases, the capabilities of the > > > > filesystem's ioend processing determines the restrictions on the number > > > > and type of mappings that ->iomap_begin can give to iomap? > > > > > > > > Now that we have a second system trying to hook up to the iomap support, > > > > it's clear to me that the restrictions on mappings are specific to each > > > > filesystem. Therefore, the iomap directio code should not impose > > > > restrictions on the mappings it receives unless they would prevent the > > > > creation of the single aligned bio. > > > > > > > > Instead, xfs_direct_write_iomap_begin and ext4_iomap_begin should return > > > > EINVAL or something if they look at the file mappings and discover that > > > > they cannot perform the ioend without risking torn mapping updates. In > > > > the long run, ->iomap_begin is where this iomap->len <= iter->len check > > > > really belongs, but hold that thought. > > > > > > > > For the multi fsblock case, the ->iomap_begin functions would have to > > > > check that only one metadata update would be necessary in the ioend. > > > > That's where things get murky, since ext4/xfs drop their mapping locks > > > > between calls to ->iomap_begin. So you'd have to check all the mappings > > > > for unsupported mixed state every time. Yuck. > > > > > > > > > > Thanks Darrick for taking time summarizing what all has been done > > > and your thoughts here. > > > > > > > It might be less gross to retain the restriction that iomap accepts only > > > > one mapping for the entire file range, like Ritesh has here. > > > > > > less gross :) sure. > > > > > > I would like to think of this as, being less restrictive (compared to > > > only allowing a single fsblock) by adding a constraint on the atomic > > > write I/O request i.e. > > > > > > "Atomic write I/O request to a region in a file is only allowed if that > > > region has no partially allocated extents. Otherwise, the file system > > > can fail the I/O operation by returning -EINVAL." > > > > > > Essentially by adding this constraint to the I/O request, we are > > > helping the user to prevent atomic writes from accidentally getting > > > torned and also allowing multi-fsblock writes. So I still think that > > > might be the right thing to do here or at least a better start. FS can > > > later work on adding such support where we don't even need above > > > such constraint on a given atomic write I/O request. > > > > On today's ext4 call, Ted and Ritesh and I realized that there's a bit > > more to it than this -- it's not possible to support untorn writes to a > > mix of written/(cow,unwritten) mappings even if they all point to the > > same physical space. If the system fails after the storage device > > commits the write but before any of the ioend processing is scheduled, a > > subsequent read of the previously written blocks will produce the new > > data, but reads to the other areas will produce the old contents (or > > zeroes, or whatever). That's a torn write. > > I'm *really* surprised that people are only realising that IO > completion processing for atomic writes *must be atomic*. I've been saying for a while; it's just that I didn't realize until last week that there were more rules than "can't do an untorn write if you need to do more than 1 mapping update": Untorn writes are not possible if: 1. More than 1 mapping update is needed 2. 1 mapping update is needed but there's a written block (1) can be worked around with a log intent item for ioend processing, but I don't think (2) can at all. That's why I went back to saying that untorn writes require that there can only be one mapping. > This was the foundational constraint that the forced alignment > proposal for XFS was intended to address. i.e. to prevent fs > operations from violating atomic write IO constraints (e.g. punching > sub-atomic write size holes in the file) so that the physical IO can > be done without tearing and the IO completion processing that > exposes the new data can be done atomically. > > > Therefore, iomap ought to stick to requiring that ->iomap_begin returns > > a single iomap to cover the entire file range for the untorn write. For > > an unwritten extent, the post-recovery read will see either zeroes or > > the new contents; for a single-mapping COW it'll see old or new contents > > but not both. > > I'm pretty sure we enforced that in the XFS mapping implemention for > atomic writes using forced alignment. i.e. we have to return a > correctly aligned, contiguous mapping to iomap or we have to return > -EINVAL to indicate atomic write mapping failed. I think you guys did, it's just the ext4 bigalloc thing that started this back up again. :/ > Yes, we can check this in iomap, but it's really the filesystem that > has to implement and enforce it... I think we ought to keep the "1 mapping per untorn write" check in iomap until someone decides that it's worth the trouble to make a filesystem handle mixed states correctly. Mostly as a guard against the implementations. --D > -Dave. > -- > Dave Chinner > david@xxxxxxxxxxxxx >