On Mon, Apr 29, 2024 at 05:47:42PM +0000, John Garry wrote: > Support atomic writes by producing a single BIO with REQ_ATOMIC flag set. > > We rely on the FS to guarantee extent alignment, such that an atomic write > should never straddle two or more extents. The FS should also check for > validity of an atomic write length/alignment. > > Signed-off-by: John Garry <john.g.garry@xxxxxxxxxx> > --- > fs/iomap/direct-io.c | 10 ++++++++++ > 1 file changed, 10 insertions(+) > > diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c > index a3ed7cfa95bc..d7bdeb675068 100644 > --- a/fs/iomap/direct-io.c > +++ b/fs/iomap/direct-io.c > @@ -275,6 +275,7 @@ static inline blk_opf_t iomap_dio_bio_opflags(struct iomap_dio *dio, > static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter, > struct iomap_dio *dio) > { > + bool is_atomic = dio->iocb->ki_flags & IOCB_ATOMIC; > const struct iomap *iomap = &iter->iomap; > struct inode *inode = iter->inode; > unsigned int zeroing_size, pad; > @@ -387,6 +388,9 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter, > bio->bi_iter.bi_sector = iomap_sector(iomap, pos); > bio->bi_write_hint = inode->i_write_hint; > bio->bi_ioprio = dio->iocb->ki_ioprio; > + if (is_atomic) > + bio->bi_opf |= REQ_ATOMIC; REQ_ATOMIC is only valid for write IO, isn't it? This should be added in iomap_dio_bio_opflags() after it is determined we are doing a write operation. Regardless, it should be added in iomap_dio_bio_opflags(), not here. That also allows us to get rid of the is_atomic variable. > + > bio->bi_private = dio; > bio->bi_end_io = iomap_dio_bio_end_io; > > @@ -403,6 +407,12 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter, > } > > n = bio->bi_iter.bi_size; > + if (is_atomic && n != orig_count) { > + /* This bio should have covered the complete length */ > + ret = -EINVAL; > + bio_put(bio); > + goto out; > + } What happens now if we've done zeroing IO before this? I suspect we might expose stale data if the partial block zeroing converts the unwritten extent in full... > if (dio->flags & IOMAP_DIO_WRITE) { > task_io_account_write(n); > } else { Ignoring the error handling issues, this code might be better as: if (dio->flags & IOMAP_DIO_WRITE) { if ((opflags & REQ_ATOMIC) && n != orig_count) { /* atomic writes are all or nothing */ ret = -EIO bio_put(bio); goto out; } } so that we are not putting atomic write error checks in the read IO submission path. -Dave. -- Dave Chinner david@xxxxxxxxxxxxx