On 2021/03/08 12:32, Shin'ichiro Kawasaki wrote:
> When a zone reset ioctl and a data read race on the same zone of a zoned
> block device, the read leaves stale page cache pages even though the
> zone reset ioctl zero-clears all of the zone data on the device. To
> avoid reading non-zero data from the stale page cache after a zone
> reset, discard the page cache of the reset target zones. In the same
> manner as fallocate, call truncate_bdev_range() in
> blkdev_zone_mgmt_ioctl() before and after the zone reset to ensure the
> page cache is discarded.
>
> This patch can be applied as-is to stable kernels back to v5.10.y.
> Older stable kernels need a reworked version.
>
> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@xxxxxxx>
> Fixes: 3ed05a987e0f ("blk-zoned: implement ioctls")
> Cc: <stable@xxxxxxxxxxxxxxx> # 5.10+
> ---
>  block/blk-zoned.c | 30 ++++++++++++++++++++++++++++--
>  1 file changed, 28 insertions(+), 2 deletions(-)
>
> diff --git a/block/blk-zoned.c b/block/blk-zoned.c
> index 833978c02e60..990a36be2927 100644
> --- a/block/blk-zoned.c
> +++ b/block/blk-zoned.c
> @@ -329,6 +329,9 @@ int blkdev_zone_mgmt_ioctl(struct block_device *bdev, fmode_t mode,
>  	struct request_queue *q;
>  	struct blk_zone_range zrange;
>  	enum req_opf op;
> +	sector_t capacity;
> +	loff_t start, end;
> +	int ret;
>  
>  	if (!argp)
>  		return -EINVAL;
> @@ -349,9 +352,22 @@ int blkdev_zone_mgmt_ioctl(struct block_device *bdev, fmode_t mode,
>  	if (copy_from_user(&zrange, argp, sizeof(struct blk_zone_range)))
>  		return -EFAULT;
>  
> +	capacity = get_capacity(bdev->bd_disk);
> +	if (zrange.sector + zrange.nr_sectors <= zrange.sector ||
> +	    zrange.sector + zrange.nr_sectors > capacity)
> +		/* Out of range */
> +		return -EINVAL;
> +
> +	start = zrange.sector << SECTOR_SHIFT;
> +	end = ((zrange.sector + zrange.nr_sectors) << SECTOR_SHIFT) - 1;

Move these under the BLKRESETZONE case, as Kanchan suggested (a rough
sketch of what I mean is at the end of this mail).

> +
>  	switch (cmd) {
>  	case BLKRESETZONE:
>  		op = REQ_OP_ZONE_RESET;
> +		/* Invalidate the page cache, including dirty pages. */
> +		ret = truncate_bdev_range(bdev, mode, start, end);
> +		if (ret)
> +			return ret;
>  		break;
>  	case BLKOPENZONE:
>  		op = REQ_OP_ZONE_OPEN;
> @@ -366,8 +382,18 @@ int blkdev_zone_mgmt_ioctl(struct block_device *bdev, fmode_t mode,
>  		return -ENOTTY;
>  	}
>  
> -	return blkdev_zone_mgmt(bdev, op, zrange.sector, zrange.nr_sectors,
> -				GFP_KERNEL);
> +	ret = blkdev_zone_mgmt(bdev, op, zrange.sector, zrange.nr_sectors,
> +			       GFP_KERNEL);
> +
> +	/*
> +	 * Invalidate the page cache again for zone reset; if someone wandered
> +	 * in and dirtied a page, we just discard it - userspace has no way of
> +	 * knowing whether the write happened before or after reset completing.

I think you can simplify this comment: writes can only be direct I/O on
zoned block devices, so a concurrent write cannot add any page to the
page cache during or after the reset. The page cache may be repopulated
by concurrent reads, though, and dropping those pages is fine (a
possible wording is at the end of this mail).

> +	 */
> +	if (!ret && cmd == BLKRESETZONE)
> +		ret = truncate_bdev_range(bdev, mode, start, end);
> +
> +	return ret;
>  }
>  
>  static inline unsigned long *blk_alloc_zone_bitmap(int node,
> 

With these two points fixed, this looks good to me.

Reviewed-by: Damien Le Moal <damien.lemoal@xxxxxxx>

-- 
Damien Le Moal
Western Digital Research
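
For reference, a rough sketch of the relocation suggested above: the
range check stays before the switch, but the start/end computation and
the first page cache invalidation move under BLKRESETZONE so that the
other zone management commands are not affected. Untested and
illustrative only; the start, end and ret declarations stay at the top
of the function as in the patch.

	switch (cmd) {
	case BLKRESETZONE:
		op = REQ_OP_ZONE_RESET;

		/* Invalidate the page cache, including dirty pages. */
		start = zrange.sector << SECTOR_SHIFT;
		end = ((zrange.sector + zrange.nr_sectors) << SECTOR_SHIFT) - 1;
		ret = truncate_bdev_range(bdev, mode, start, end);
		if (ret)
			return ret;
		break;
	case BLKOPENZONE:
		op = REQ_OP_ZONE_OPEN;
		break;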
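
And one possible simplified wording for the comment ahead of the second
invalidation, along the lines of my remark above:

	/*
	 * Invalidate the page cache again for zone reset: writes can only be
	 * direct for zoned devices so concurrent writes would not add any page
	 * to the page cache after/during reset. The page cache may be filled
	 * again due to concurrent reads though and dropping the pages for
	 * these is fine.
	 */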