On Thu, Oct 26, 2023 at 04:08:32PM +0200, Pankaj Raghav wrote:
> From: Pankaj Raghav <p.raghav@xxxxxxxxxxx>
>
> iomap_dio_zero() will pad a fs block with zeroes if the direct IO size
> < fs block size. iomap_dio_zero() has an implicit assumption that fs block
> size < page_size. This is true for most filesystems at the moment.
>
> If the block size > page size (Large block sizes)[1], this will send the
> contents of the page next to zero page(as len > PAGE_SIZE) to the
> underlying block device, causing FS corruption.
>
> iomap is a generic infrastructure and it should not make any assumptions
> about the fs block size and the page size of the system.
>
> Fixes: db074436f421 ("iomap: move the direct IO code into a separate file")
> Signed-off-by: Pankaj Raghav <p.raghav@xxxxxxxxxxx>
>
> [1] https://lore.kernel.org/lkml/20230915183848.1018717-1-kernel@xxxxxxxxxxxxxxxx/
> ---
> I had initially planned on sending this as a part of LBS patches but I
> think this can go as a standalone patch as it is a generic fix to iomap.
>
> @Dave chinner This fixes the corruption issue you were seeing in
> generic/091 for bs=64k in [2]
>
> [2] https://lore.kernel.org/lkml/ZQfbHloBUpDh+zCg@xxxxxxxxxxxxxxxxxxx/
>
>  fs/iomap/direct-io.c | 13 +++++++++++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> index bcd3f8cf5ea4..04f6c5548136 100644
> --- a/fs/iomap/direct-io.c
> +++ b/fs/iomap/direct-io.c
> @@ -239,14 +239,23 @@ static void iomap_dio_zero(const struct iomap_iter *iter, struct iomap_dio *dio,
>  	struct page *page = ZERO_PAGE(0);
>  	struct bio *bio;
>
> -	bio = iomap_dio_alloc_bio(iter, dio, 1, REQ_OP_WRITE | REQ_SYNC | REQ_IDLE);
> +	WARN_ON_ONCE(len > (BIO_MAX_VECS * PAGE_SIZE));

How can that happen here? Max fsb size will be 64kB for the
foreseeable future, the bio can hold 256 pages so it will have a
minimum size capability of 1MB.
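To spell out the arithmetic behind that 1MB figure, here is a small
user-space sketch. The SKETCH_* constants and bio_capacity() are
made-up names for illustration, not kernel symbols; it assumes 256
vecs per bio (as stated above) and a 4kB page as the smallest page
size of interest.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative constants only; these are not kernel definitions. */
#define SKETCH_BIO_MAX_VECS 256u           /* vecs per bio, as stated above */
#define SKETCH_MIN_PAGE_SIZE 4096u         /* assumed smallest page size */
#define SKETCH_MAX_FSB_SIZE (64u * 1024u)  /* 64kB max fs block size */

/* Bytes one bio can carry when every vec holds exactly one full page. */
static size_t bio_capacity(size_t max_vecs, size_t page_size)
{
	assert(page_size > 0);
	return max_vecs * page_size;
}
```

With these assumed values, bio_capacity(SKETCH_BIO_MAX_VECS,
SKETCH_MIN_PAGE_SIZE) is 1048576 bytes, i.e. 16 times
SKETCH_MAX_FSB_SIZE, so the condition in the WARN_ON_ONCE() cannot
trigger for a 64kB block.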
FWIW, as a general observation, I think this is the wrong place to
be checking that a filesystem block is larger than can be fit in a
single bio. There's going to be problems all over the place if we
can't do fsb sized IO in a single bio. i.e. I think this sort of
validation should be performed during filesystem mount, not
sporadically checked with WARN_ON() checks in random places in the
IO path...

> +
> +	bio = iomap_dio_alloc_bio(iter, dio, BIO_MAX_VECS,
> +				  REQ_OP_WRITE | REQ_SYNC | REQ_IDLE);
>  	fscrypt_set_bio_crypt_ctx(bio, inode, pos >> inode->i_blkbits,
>  				  GFP_KERNEL);
> +
>  	bio->bi_iter.bi_sector = iomap_sector(&iter->iomap, pos);
>  	bio->bi_private = dio;
>  	bio->bi_end_io = iomap_dio_bio_end_io;
>
> -	__bio_add_page(bio, page, len, 0);
> +	while (len) {
> +		unsigned int io_len = min_t(unsigned int, len, PAGE_SIZE);
> +
> +		__bio_add_page(bio, page, io_len, 0);
> +		len -= io_len;
> +	}
>  	iomap_dio_submit_bio(iter, dio, bio, pos);

/me wonders if we should have a set of ZERO_FOLIO()s that contain a
folio of each possible size. Then we just pull the ZERO_FOLIO of the
correct size and use __bio_add_folio(). i.e. no need for looping
over the bio to repeatedly add the ZERO_PAGE, etc, and the code is
then identical for all cases of page size vs FSB size.

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
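As a rough illustration of the two suggestions above, the following
user-space model shows what a mount-time check and the segment count
of the padding loop might look like. Both helpers, fsb_fits_in_bio()
and zero_segments(), are hypothetical names invented for this sketch;
they are not kernel APIs, and ZERO_FOLIO() itself is only an idea
floated in this mail.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names exist in the kernel. */

/* Mount-time style check: can a single bio carry one whole fs block? */
static int fsb_fits_in_bio(size_t fsb_size, size_t max_vecs, size_t page_size)
{
	return fsb_size <= max_vecs * page_size;
}

/*
 * Count the segments needed to zero `len` bytes when each segment
 * re-adds one zeroed buffer of `zero_buf_size` bytes. This mirrors
 * the while (len) loop in the quoted patch, with the zero page
 * generalised to a zeroed buffer of arbitrary size.
 */
static size_t zero_segments(size_t len, size_t zero_buf_size)
{
	size_t segs = 0;

	assert(zero_buf_size > 0);
	while (len) {
		size_t io_len = len < zero_buf_size ? len : zero_buf_size;

		len -= io_len;
		segs++;
	}
	return segs;
}
```

For a 64kB block on 4kB pages, zero_segments(65536, 4096) is 16,
while a 64kB zeroed buffer gives zero_segments(65536, 65536) == 1,
which is the "no looping" case described above.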