Re: [PATCH v2 1/2] block: Read max write zeroes once for __blkdev_issue_write_zeroes()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Aug 15, 2024 / 16:32, John Garry wrote:
> As reported in [0], we may get a hang when formatting a XFS FS on a RAID0
> drive.
> 
> Commit 73a768d5f955 ("block: factor out a blk_write_zeroes_limit helper")
> changed __blkdev_issue_write_zeroes() to read the max write zeroes
> value in the loop. This is not safe as max write zeroes may change in
> value. Specifically for the case of [0], the value goes to 0, and we get
> an infinite loop.
> 
> Lift the limit reading out of the loop.
> 
> [0] https://lore.kernel.org/linux-xfs/4d31268f-310b-4220-88a2-e191c3932a82@xxxxxxxxxx/T/#t
> 
> Fixes: 73a768d5f955 ("block: factor out a blk_write_zeroes_limit helper")
> Reviewed-by: Christoph Hellwig <hch@xxxxxx>
> Signed-off-by: John Garry <john.g.garry@xxxxxxxxxx>

Hello John,

I ran may test set for the kernel v6.11-rc5 and found that this commit triggers
an error. When I set up xfs with a dm-zoned device on a TCMU zoned device and
filled the filesystem, the kernel reported this error message:

  device-mapper: zoned reclaim: (sdg): Align zone 19 wp 0 to 32736 (wp+32736) blocks failed -121

When dm-zoned calls blkdev_issue_zeorout(), EREMOTEIO 121 was returned, then
the error was reported. I think a change in this commit is the cause. Please
find my comment below.

> ---
>  block/blk-lib.c | 25 ++++++++++++++++++-------
>  1 file changed, 18 insertions(+), 7 deletions(-)
> 
> diff --git a/block/blk-lib.c b/block/blk-lib.c
> index 9f735efa6c94..83eb7761c2bf 100644
> --- a/block/blk-lib.c
> +++ b/block/blk-lib.c
> @@ -111,13 +111,20 @@ static sector_t bio_write_zeroes_limit(struct block_device *bdev)
>  		(UINT_MAX >> SECTOR_SHIFT) & ~bs_mask);
>  }
>  
> +/*
> + * There is no reliable way for the SCSI subsystem to determine whether a
> + * device supports a WRITE SAME operation without actually performing a write
> + * to media. As a result, write_zeroes is enabled by default and will be
> + * disabled if a zeroing operation subsequently fails. This means that this
> + * queue limit is likely to change at runtime.
> + */
>  static void __blkdev_issue_write_zeroes(struct block_device *bdev,
>  		sector_t sector, sector_t nr_sects, gfp_t gfp_mask,
> -		struct bio **biop, unsigned flags)
> +		struct bio **biop, unsigned flags, sector_t limit)
>  {
> +
>  	while (nr_sects) {
> -		unsigned int len = min_t(sector_t, nr_sects,
> -				bio_write_zeroes_limit(bdev));
> +		unsigned int len = min(nr_sects, limit);
>  		struct bio *bio;
>  
>  		if ((flags & BLKDEV_ZERO_KILLABLE) &&
> @@ -141,12 +148,14 @@ static void __blkdev_issue_write_zeroes(struct block_device *bdev,
>  static int blkdev_issue_write_zeroes(struct block_device *bdev, sector_t sector,
>  		sector_t nr_sects, gfp_t gfp, unsigned flags)
>  {
> +	sector_t limit = bio_write_zeroes_limit(bdev);
>  	struct bio *bio = NULL;
>  	struct blk_plug plug;
>  	int ret = 0;
>  
>  	blk_start_plug(&plug);
> -	__blkdev_issue_write_zeroes(bdev, sector, nr_sects, gfp, &bio, flags);
> +	__blkdev_issue_write_zeroes(bdev, sector, nr_sects, gfp, &bio,
> +			flags, limit);
>  	if (bio) {
>  		if ((flags & BLKDEV_ZERO_KILLABLE) &&
>  		    fatal_signal_pending(current)) {
> @@ -165,7 +174,7 @@ static int blkdev_issue_write_zeroes(struct block_device *bdev, sector_t sector,
>  	 * on an I/O error, in which case we'll turn any error into
>  	 * "not supported" here.
>  	 */
> -	if (ret && !bdev_write_zeroes_sectors(bdev))
> +	if (ret && !limit)
>  		return -EOPNOTSUPP;
>  	return ret;
>  }

The hunk above changed the timing to check WRITE ZEROES capability. Before the
change, bdev_write_zeroes_sectors(bdev) was called after WRITE ZEROES bio
completion. Then it was able to detect the q->limits.max_write_zeroes_sector
value change by drivers at WRITE ZEROES failure (scsi sd driver does it).
However, after applying the hunk, the check is done for the local variable
'limit' which is cached before the WRITE ZEROES bio issue. So it can not detect
the q->limits.max_write_zeroes_sector value change at the WRITE ZEROES failure.
Hence, it does not replace the ERMOTEIO with EOPNOTSUPP.

I think the hunk above should be reverted. I created a patch below for it, and
confirmed it avoids the error. Now I'm running my test set to confirm no
regression. If you have comments on my findings and the patch, please share.

diff --git a/block/blk-lib.c b/block/blk-lib.c
index 83eb7761c2bf..b336d57673d3 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -148,14 +148,13 @@ static void __blkdev_issue_write_zeroes(struct block_device *bdev,
 static int blkdev_issue_write_zeroes(struct block_device *bdev, sector_t sector,
 		sector_t nr_sects, gfp_t gfp, unsigned flags)
 {
-	sector_t limit = bio_write_zeroes_limit(bdev);
 	struct bio *bio = NULL;
 	struct blk_plug plug;
 	int ret = 0;
 
 	blk_start_plug(&plug);
 	__blkdev_issue_write_zeroes(bdev, sector, nr_sects, gfp, &bio,
-			flags, limit);
+				    flags, bio_write_zeroes_limit(bdev));
 	if (bio) {
 		if ((flags & BLKDEV_ZERO_KILLABLE) &&
 		    fatal_signal_pending(current)) {
@@ -174,7 +173,7 @@ static int blkdev_issue_write_zeroes(struct block_device *bdev, sector_t sector,
 	 * on an I/O error, in which case we'll turn any error into
 	 * "not supported" here.
 	 */
-	if (ret && !limit)
+	if (ret && !bdev_write_zeroes_sectors(bdev))
 		return -EOPNOTSUPP;
 	return ret;
 }




[Index of Archives]     [Linux RAID]     [Linux SCSI]     [Linux ATA RAID]     [IDE]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Device Mapper]

  Powered by Linux