Re: [PATCH] iomap: fix iomap_dio_zero() for fs bs > system page size

On Thu, Oct 26, 2023 at 04:08:32PM +0200, Pankaj Raghav wrote:
> From: Pankaj Raghav <p.raghav@xxxxxxxxxxx>
> 
> iomap_dio_zero() will pad a fs block with zeroes if the direct IO size
> < fs block size. iomap_dio_zero() has an implicit assumption that fs block
> size <= page size. This is true for most filesystems at the moment.
> 
> If the block size > page size (large block sizes)[1], this will send the
> contents of the page next to the zero page (as len > PAGE_SIZE) to the
> underlying block device, causing FS corruption.
> 
> iomap is a generic infrastructure and it should not make any assumptions
> about the fs block size and the page size of the system.
> 
> Fixes: db074436f421 ("iomap: move the direct IO code into a separate file")
> Signed-off-by: Pankaj Raghav <p.raghav@xxxxxxxxxxx>
> 
> [1] https://lore.kernel.org/lkml/20230915183848.1018717-1-kernel@xxxxxxxxxxxxxxxx/
> ---
> I had initially planned on sending this as a part of LBS patches but I
> think this can go as a standalone patch as it is a generic fix to iomap.
>
> @Dave Chinner This fixes the corruption issue you were seeing in
> generic/091 for bs=64k in [2].
>
> [2] https://lore.kernel.org/lkml/ZQfbHloBUpDh+zCg@xxxxxxxxxxxxxxxxxxx/
> 
>  fs/iomap/direct-io.c | 13 +++++++++++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> index bcd3f8cf5ea4..04f6c5548136 100644
> --- a/fs/iomap/direct-io.c
> +++ b/fs/iomap/direct-io.c
> @@ -239,14 +239,23 @@ static void iomap_dio_zero(const struct iomap_iter *iter, struct iomap_dio *dio,
>  	struct page *page = ZERO_PAGE(0);
>  	struct bio *bio;
>  
> -	bio = iomap_dio_alloc_bio(iter, dio, 1, REQ_OP_WRITE | REQ_SYNC | REQ_IDLE);
> +	WARN_ON_ONCE(len > (BIO_MAX_VECS * PAGE_SIZE));

How can that happen here? Max fsb size will be 64kB for the
foreseeable future, and the bio can hold 256 pages, so it will have a
minimum size capability of 1MB.
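
For concreteness, the arithmetic can be checked with a small userspace
sketch. It assumes 4kB is the smallest page size and that each bio vector
carries exactly one page. The SKETCH_ and sketch_ names are made up for
illustration and are not kernel interfaces; only the value 256 mirrors
BIO_MAX_VECS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative constants, not kernel definitions. */
#define SKETCH_BIO_MAX_VECS	256u		/* same value as BIO_MAX_VECS */
#define SKETCH_MIN_PAGE_SIZE	4096u		/* assumption: smallest page size */
#define SKETCH_MAX_FSB_SIZE	(64u * 1024u)	/* 64kB max fs block size */

/* Smallest payload a full bio can carry: one page per vector. */
static inline size_t sketch_bio_min_capacity(size_t page_size)
{
	return (size_t)SKETCH_BIO_MAX_VECS * page_size;
}

/* True if a zeroing request of @len bytes fits in a single bio. */
static inline bool sketch_fits_single_bio(size_t len, size_t page_size)
{
	return len <= sketch_bio_min_capacity(page_size);
}
```

Under those assumptions a 64kB fs block fits with a factor of 16 to
spare, so the WARN_ON_ONCE() above should never fire.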

FWIW, as a general observation, I think this is the wrong place to
be checking that a filesystem block is larger than can fit in a
single bio. There are going to be problems all over the place if we
can't do fsb sized IO in a single bio. i.e. I think this sort of
validation should be performed during filesystem mount, not
sporadically checked with WARN_ON() checks in random places in the
IO path...
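
A minimal userspace sketch of what such a one-time check could look
like. The function name, its signature and where it would be called from
are all hypothetical; nothing like it exists in the tree as far as this
thread shows. It only encodes the rule "one fs block must fit in one
bio of page-sized vectors".

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SKETCH_BIO_MAX_VECS	256u	/* same value as BIO_MAX_VECS */

/*
 * Hypothetical mount-time check: reject a filesystem whose block size
 * (1 << blkbits) cannot be covered by a single bio of page-sized vectors.
 * Returns 0 on success or -EINVAL, kernel style.
 */
static inline int sketch_validate_fsb_size(unsigned int blkbits,
					   size_t page_size)
{
	size_t bio_capacity;
	size_t fsb_size;

	assert(page_size > 0);
	if (blkbits >= sizeof(size_t) * 8)
		return -EINVAL;	/* the shift below would overflow */

	fsb_size = (size_t)1 << blkbits;
	bio_capacity = (size_t)SKETCH_BIO_MAX_VECS * page_size;

	return fsb_size <= bio_capacity ? 0 : -EINVAL;
}
```

Doing this once at mount means the IO path can rely on the invariant
instead of re-checking it with a WARN_ON() at each call site.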

> +
> +	bio = iomap_dio_alloc_bio(iter, dio, BIO_MAX_VECS,
> +				  REQ_OP_WRITE | REQ_SYNC | REQ_IDLE);
>  	fscrypt_set_bio_crypt_ctx(bio, inode, pos >> inode->i_blkbits,
>  				  GFP_KERNEL);
> +
>  	bio->bi_iter.bi_sector = iomap_sector(&iter->iomap, pos);
>  	bio->bi_private = dio;
>  	bio->bi_end_io = iomap_dio_bio_end_io;
>  
> -	__bio_add_page(bio, page, len, 0);
> +	while (len) {
> +		unsigned int io_len = min_t(unsigned int, len, PAGE_SIZE);
> +
> +		__bio_add_page(bio, page, io_len, 0);
> +		len -= io_len;
> +	}
>  	iomap_dio_submit_bio(iter, dio, bio, pos);

/me wonders if we should have a set of ZERO_FOLIO()s that contain a
folio of each possible size. Then we just pull the ZERO_FOLIO of the
correct size and use __bio_add_folio(). i.e. no need for
looping over the bio to repeatedly add the ZERO_PAGE, etc, and the
code is then identical for all cases of page size vs FSB size.
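
A rough userspace sketch of the lookup side of that idea, assuming one
pre-zeroed folio per order up to a 64kB fs block on 4kB pages. No
ZERO_FOLIO() set exists today; sketch_zero_folio_order() and
SKETCH_MAX_ORDER are hypothetical names used only to show how the right
size would be picked.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_ORDER	4u	/* assumption: 64kB fsb on 4kB pages */

/*
 * Pick the smallest order such that (page_size << order) >= len.
 * Returns -1 if len does not fit in the largest pre-zeroed folio.
 */
static inline int sketch_zero_folio_order(size_t len, size_t page_size)
{
	unsigned int order;

	assert(page_size > 0);
	for (order = 0; order <= SKETCH_MAX_ORDER; order++) {
		if (len <= (page_size << order))
			return (int)order;
	}
	return -1;
}
```

The caller would then add the one folio of that order to the bio in a
single call, whatever the page size vs FSB size combination is.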

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx


