On Tue, Oct 24, 2023 at 01:35:33PM +0100, John Garry wrote: > On 09/10/2023 22:05, Darrick J. Wong wrote: > > > > If the file range is a sparse hole, the directio setup will allocate > > > > space and create an unwritten mapping before issuing the write bio. The > > > > rest of the process works the same as preallocations and has the same > > > > behaviors. > > > > > > > > If the file range is allocated and was previously written, the write is > > > > issued and that's all that's needed from the fs. After a crash, reads > > > > of the storage device produce the old contents or the new contents. > > > This is exactly what I explained when reviewing the code that > > > rejected RWF_ATOMIC without O_DSYNC on metadata dirty inodes. > > I'm glad we agree. 😄 > > > > John, when you're back from vacation, can we get rid of this language > > and all those checks under _is_dsync() in the iomap patch? > > > > (That code is 100% the result of me handwaving and bellyaching 6 months > > ago when the team was trying to get all the atomic writes bits working > > prior to LSF and I was too burned out to think the xfs part through. > > As a result, I decided that we'd only support strict overwrites for the > > first iteration.) > > So this following additive code in iomap_dio_bio_iter() should be dropped: > > ----8<----- > > --- a/fs/iomap/direct-io.c > +++ b/fs/iomap/direct-io.c > @@ -275,10 +275,11 @@ static inline blk_opf_t iomap_dio_bio_opflags(struct > iomap_dio *dio, > static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter, > struct iomap_dio *dio) > { > > ... > > @@ -292,6 +293,13 @@ static loff_t iomap_dio_bio_iter(const struct > iomap_iter *iter, > !bdev_iter_is_aligned(iomap->bdev, dio->submit.iter)) > return -EINVAL; > > + if (atomic_write && !iocb_is_dsync(dio->iocb)) { > + if (iomap->flags & IOMAP_F_DIRTY) > + return -EIO; > + if (iomap->type != IOMAP_MAPPED) > + return -EIO; > + } > + > > ---->8----- > > ok? Yes. > > > > > > Summarizing: > > > > > > > > An (ATOMIC|SYNC) request provides the strongest guarantees (data > > > > will not be torn, and all file metadata updates are persisted before > > > > the write is returned to userspace. Programs see either the old data or > > > > the new data, even if there's a crash. > > > > > > > > (ATOMIC|DSYNC) is less strong -- data will not be torn, and any file > > > > updates for just that region are persisted before the write is returned. > > > > > > > > (ATOMIC) is the least strong -- data will not be torn. Neither the > > > > filesystem nor the device make guarantees that anything ended up on > > > > stable storage, but if it does, programs see either the old data or the > > > > new data. > > > Yup, that makes sense to me. > > Perhaps this ^^ is what we should be documenting here. > > > > > > Maybe we should rename the whole UAPI s/atomic/untorn/... > > > Perhaps, though "torn writes" is nomenclature that nobody outside > > > storage and filesystem developers really knows about. All I ever > > > hear from userspace developers is "we want atomic/all-or-nothing > > > data writes"... How about O_NOTEARS -> PWF_NOTEARS -> REQ_NOTEARS. <obligatory "There's no crying in baseball" link, etc.> --D > > Fair 'enuf. > > > Thanks, > John