Re: [PATCH] block: Discard page cache of zone reset target range

Damien Le Moal <Damien.LeMoal@xxxxxxx> · Tue, 9 Mar 2021 11:16:23 +0000

On 2021/03/08 12:32, Shin'ichiro Kawasaki wrote:
> When a zone reset ioctl and a data read race for the same zone on a zoned
> block device, the read leaves stale pages in the page cache even though the
> zone reset ioctl zero-clears all the zone data on the device. To avoid
> reading non-zero data from the stale page cache after a zone reset, discard
> the page cache of the reset target zones. In the same manner as fallocate,
> call truncate_bdev_range() in blkdev_zone_mgmt_ioctl() before and after the
> zone reset to ensure the page cache is discarded.
> 
> This patch can be applied back to the stable kernel version v5.10.y.
> Rework is needed for older stable kernels.
> 
> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@xxxxxxx>
> Fixes: 3ed05a987e0f ("blk-zoned: implement ioctls")
> Cc: <stable@xxxxxxxxxxxxxxx> # 5.10+
> ---
>  block/blk-zoned.c | 30 ++++++++++++++++++++++++++++--
>  1 file changed, 28 insertions(+), 2 deletions(-)
> 
> diff --git a/block/blk-zoned.c b/block/blk-zoned.c
> index 833978c02e60..990a36be2927 100644
> --- a/block/blk-zoned.c
> +++ b/block/blk-zoned.c
> @@ -329,6 +329,9 @@ int blkdev_zone_mgmt_ioctl(struct block_device *bdev, fmode_t mode,
>  	struct request_queue *q;
>  	struct blk_zone_range zrange;
>  	enum req_opf op;
> +	sector_t capacity;
> +	loff_t start, end;
> +	int ret;
>  
>  	if (!argp)
>  		return -EINVAL;
> @@ -349,9 +352,22 @@ int blkdev_zone_mgmt_ioctl(struct block_device *bdev, fmode_t mode,
>  	if (copy_from_user(&zrange, argp, sizeof(struct blk_zone_range)))
>  		return -EFAULT;
>  
> +	capacity = get_capacity(bdev->bd_disk);
> +	if (zrange.sector + zrange.nr_sectors <= zrange.sector ||
> +	    zrange.sector + zrange.nr_sectors > capacity)
> +		/* Out of range */
> +		return -EINVAL;
> +
> +	start = zrange.sector << SECTOR_SHIFT;
> +	end = ((zrange.sector + zrange.nr_sectors) << SECTOR_SHIFT) - 1;

Move the range check and the start/end calculation under the BLKRESETZONE
case, as Kanchan suggested.
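
For reference, the arithmetic in the hunk above can be modelled in userspace
roughly as follows. This is only a sketch for checking the boundary cases, not
kernel code: the helper zone_range_to_bytes() and struct byte_range are
hypothetical names, plain 64-bit integers stand in for sector_t and loff_t, and
SECTOR_SHIFT is assumed to be 9 (512-byte sectors).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 512-byte sectors, as with the kernel's SECTOR_SHIFT. */
#define SECTOR_SHIFT 9

/* Hypothetical stand-in for the start/end pair passed to truncate_bdev_range(). */
struct byte_range {
	int64_t start;	/* first byte of the range */
	int64_t end;	/* last byte of the range, inclusive */
};

/*
 * Userspace model of the check in the patch: reject empty ranges, ranges
 * whose sector arithmetic wraps around, and ranges past the capacity, then
 * convert the sector range to an inclusive byte range.
 */
static bool zone_range_to_bytes(uint64_t sector, uint64_t nr_sectors,
				uint64_t capacity, struct byte_range *out)
{
	/* Unsigned addition wraps on overflow, which the check below catches. */
	uint64_t last = sector + nr_sectors;

	assert(out != NULL);

	if (last <= sector || last > capacity)
		return false;	/* empty, wrapped, or out of range */

	/* Assumes capacity is small enough that the shifts do not overflow. */
	out->start = (int64_t)(sector << SECTOR_SHIFT);
	out->end = (int64_t)(last << SECTOR_SHIFT) - 1;
	return true;
}
```

Note that the "<=" comparison rejects nr_sectors == 0 as well as wraparound,
and that end is inclusive, hence the "- 1".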

> +
>  	switch (cmd) {
>  	case BLKRESETZONE:
>  		op = REQ_OP_ZONE_RESET;
> +		/* Invalidate the page cache, including dirty pages. */
> +		ret = truncate_bdev_range(bdev, mode, start, end);
> +		if (ret)
> +			return ret;
>  		break;
>  	case BLKOPENZONE:
>  		op = REQ_OP_ZONE_OPEN;
> @@ -366,8 +382,18 @@ int blkdev_zone_mgmt_ioctl(struct block_device *bdev, fmode_t mode,
>  		return -ENOTTY;
>  	}
>  
> -	return blkdev_zone_mgmt(bdev, op, zrange.sector, zrange.nr_sectors,
> -				GFP_KERNEL);
> +	ret = blkdev_zone_mgmt(bdev, op, zrange.sector, zrange.nr_sectors,
> +			       GFP_KERNEL);
> +
> +	/*
> +	 * Invalidate the page cache again for zone reset; if someone wandered
> +	 * in and dirtied a page, we just discard it - userspace has no way of
> +	 * knowing whether the write happened before or after reset completing.

I think you can simplify this comment: writes to zoned devices can only be
direct, so concurrent writes cannot add any page to the page cache during or
after the reset. Concurrent reads may fill the page cache again, though, and
dropping those pages is fine.

> +	 */
> +	if (!ret && cmd == BLKRESETZONE)
> +		ret = truncate_bdev_range(bdev, mode, start, end);
> +
> +	return ret;
>  }
>  
>  static inline unsigned long *blk_alloc_zone_bitmap(int node,
> 

With these fixed, looks good to me.

Reviewed-by: Damien Le Moal <damien.lemoal@xxxxxxx>

-- 
Damien Le Moal
Western Digital Research