From: Zhang Yi <yi.zhang@xxxxxxxxxx>

Currently, disks primarily implement the write zeroes command (aka
REQ_OP_WRITE_ZEROES) through two mechanisms: the first involves
physically writing zeros to the disk media (e.g., HDDs), while the
second performs an unmap operation on the logical blocks, effectively
putting them into a deallocated state (e.g., SSDs). The first method is
generally slow, while the second method is typically very fast.

For example, on certain NVMe SSDs that support NVME_NS_DEAC, submitting
REQ_OP_WRITE_ZEROES requests with the NVME_WZ_DEAC bit can accelerate
the write zeroes operation by placing disk blocks into a deallocated
state.

However, it is difficult to ascertain whether the storage device
supports unmap write zeroes. We cannot determine this solely by
querying bdev_limits(bdev)->max_write_zeroes_sectors.

Therefore, add a new queue limit feature, BLK_FEAT_WRITE_ZEROES_UNMAP,
and the corresponding sysfs entry, to indicate whether the block device
explicitly supports the unmap write zeroes command. Each device driver
should set this bit if it is certain that the attached disk supports
this command. If the bit is not set, the disk either does not support
it, or its support status is unknown.

For stacked devices, BLK_FEAT_WRITE_ZEROES_UNMAP must be supported by
both the stacking driver and all underlying devices.

Signed-off-by: Zhang Yi <yi.zhang@xxxxxxxxxx>
Reviewed-by: Christoph Hellwig <hch@xxxxxx>
---
 Documentation/ABI/stable/sysfs-block | 14 ++++++++++++++
 block/blk-settings.c                 |  6 ++++++
 block/blk-sysfs.c                    |  3 +++
 include/linux/blkdev.h               |  3 +++
 4 files changed, 26 insertions(+)

diff --git a/Documentation/ABI/stable/sysfs-block b/Documentation/ABI/stable/sysfs-block
index 890cde28bf90..67513c0d9233 100644
--- a/Documentation/ABI/stable/sysfs-block
+++ b/Documentation/ABI/stable/sysfs-block
@@ -742,6 +742,20 @@ Description:
 		0, write zeroes is not supported by the device.
 
 
+What:		/sys/block/<disk>/queue/write_zeroes_unmap
+Date:		January 2025
+Contact:	Zhang Yi <yi.zhang@xxxxxxxxxx>
+Description:
+		[RO] Indicates whether the device explicitly supports the unmap
+		write zeroes operation, in which a single write zeroes request
+		with the unmap bit set zeroes a range of contiguous blocks on
+		storage by freeing the blocks rather than writing physical
+		zeroes to the media. If write_zeroes_unmap is 1, the device
+		explicitly supports the unmap write zeroes command. Otherwise,
+		the device either does not support it, or its support status is
+		unknown.
+
+
 What:		/sys/block/<disk>/queue/zone_append_max_bytes
 Date:		May 2020
 Contact:	linux-block@xxxxxxxxxxxxxxx
diff --git a/block/blk-settings.c b/block/blk-settings.c
index 6b2dbe645d23..3331d07bd5d9 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -697,6 +697,8 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
 		t->features &= ~BLK_FEAT_NOWAIT;
 	if (!(b->features & BLK_FEAT_POLL))
 		t->features &= ~BLK_FEAT_POLL;
+	if (!(b->features & BLK_FEAT_WRITE_ZEROES_UNMAP))
+		t->features &= ~BLK_FEAT_WRITE_ZEROES_UNMAP;
 
 	t->flags |= (b->flags & BLK_FLAG_MISALIGNED);
 
@@ -819,6 +821,10 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
 		t->zone_write_granularity = 0;
 		t->max_zone_append_sectors = 0;
 	}
+
+	if (!t->max_write_zeroes_sectors)
+		t->features &= ~BLK_FEAT_WRITE_ZEROES_UNMAP;
+
 	blk_stack_atomic_writes_limits(t, b, start);
 
 	return ret;
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index d584461a1d84..6f00e9a8f8b6 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -261,6 +261,7 @@ static ssize_t queue_##_name##_show(struct gendisk *disk, char *page)	\
 
 QUEUE_SYSFS_FEATURE_SHOW(fua, BLK_FEAT_FUA);
 QUEUE_SYSFS_FEATURE_SHOW(dax, BLK_FEAT_DAX);
+QUEUE_SYSFS_FEATURE_SHOW(write_zeroes_unmap, BLK_FEAT_WRITE_ZEROES_UNMAP);
 
 static ssize_t queue_poll_show(struct gendisk *disk, char *page)
 {
@@ -510,6 +511,7 @@ QUEUE_LIM_RO_ENTRY(queue_atomic_write_unit_min, "atomic_write_unit_min_bytes");
 
 QUEUE_RO_ENTRY(queue_write_same_max, "write_same_max_bytes");
 QUEUE_LIM_RO_ENTRY(queue_max_write_zeroes_sectors, "write_zeroes_max_bytes");
+QUEUE_LIM_RO_ENTRY(queue_write_zeroes_unmap, "write_zeroes_unmap");
 QUEUE_LIM_RO_ENTRY(queue_max_zone_append_sectors, "zone_append_max_bytes");
 QUEUE_LIM_RO_ENTRY(queue_zone_write_granularity, "zone_write_granularity");
 
@@ -656,6 +658,7 @@ static struct attribute *queue_attrs[] = {
 	&queue_atomic_write_unit_min_entry.attr,
 	&queue_atomic_write_unit_max_entry.attr,
 	&queue_max_write_zeroes_sectors_entry.attr,
+	&queue_write_zeroes_unmap_entry.attr,
 	&queue_max_zone_append_sectors_entry.attr,
 	&queue_zone_write_granularity_entry.attr,
 	&queue_rotational_entry.attr,
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index e39c45bc0a97..5d280c7fba65 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -342,6 +342,9 @@ typedef unsigned int __bitwise blk_features_t;
 #define BLK_FEAT_ATOMIC_WRITES \
 	((__force blk_features_t)(1u << 16))
 
+/* supports unmap write zeroes command */
+#define BLK_FEAT_WRITE_ZEROES_UNMAP	((__force blk_features_t)(1u << 17))
+
 /*
  * Flags automatically inherited when stacking limits.
  */
-- 
2.46.1
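
Illustration only, not part of the patch: a minimal userspace sketch of
how a program might consult the new write_zeroes_unmap attribute before
relying on fast zeroing. It assumes a kernel with this patch applied and
that the attribute reads as "0" or "1" followed by a newline, as other
queue feature attributes do. The helper names parse_feature_flag() and
disk_write_zeroes_unmap() are hypothetical and exist only for this
example.

```c
#include <stdio.h>

/*
 * Interpret the text read from queue/write_zeroes_unmap.
 * Returns 1 if the bit is set, 0 if it is clear, and -1 if the text is
 * missing or not in the assumed "0"/"1" form.
 */
int parse_feature_flag(const char *text)
{
	if (!text)
		return -1;
	if ((text[0] == '0' || text[0] == '1') &&
	    (text[1] == '\n' || text[1] == '\0'))
		return text[0] - '0';
	return -1;
}

/*
 * Read /sys/block/<disk>/queue/write_zeroes_unmap for the given disk
 * name (e.g. "nvme0n1"). Returns 1, 0, or -1 when the attribute cannot
 * be read, for instance on a kernel without this patch.
 */
int disk_write_zeroes_unmap(const char *disk)
{
	char path[256];
	char buf[8] = "";
	FILE *f;
	int n;

	n = snprintf(path, sizeof(path),
		     "/sys/block/%s/queue/write_zeroes_unmap", disk);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	f = fopen(path, "r");
	if (!f)
		return -1;

	if (!fgets(buf, sizeof(buf), f))
		buf[0] = '\0';
	fclose(f);

	return parse_feature_flag(buf);
}
```

Per the ABI description above, a caller should treat both 0 and -1 as
"no guarantee": only a return value of 1 says the device explicitly
supports unmap write zeroes.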