Discard and zeroout code has been significantly rewritten recently, and
as part of the rewrite we got rid of the discard_zeroes_data flag.

With commit 48920ff2a5a9 ("block: remove the discard_zeroes_data flag")
the discard_zeroes_data sysfs file and the BLKDISCARDZEROES ioctl now
always return zero, regardless of what the device actually supports.
This has broken userspace utilities: they will not take advantage of
this functionality even if the device actually supports it.

Now, to figure out whether the device supports deterministic read zeroes
after discard without actually running fallocate, the user has to check
for discard support (discard_max_bytes) and zeroout hw offload
(write_zeroes_max_bytes).

However, we still have the discard_zeroes_data sysfs file and the
BLKDISCARDZEROES ioctl, so I do not see any reason not to do this check
in the kernel and provide a convenient and compatible way to continue to
export this information to user space.

With this patch both the BLKDISCARDZEROES ioctl and discard_zeroes_data
will return 1 if both discard and hw offload for write zeroes are
supported. Otherwise they will return 0.

Signed-off-by: Lukas Czerner <lczerner@xxxxxxxxxx>
---
 Documentation/ABI/testing/sysfs-block | 11 +++++++++--
 Documentation/block/queue-sysfs.txt   |  5 +++++
 block/blk-sysfs.c                     |  5 ++++-
 block/ioctl.c                         |  6 +++++-
 4 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-block b/Documentation/ABI/testing/sysfs-block
index dea212d..6ea0d03 100644
--- a/Documentation/ABI/testing/sysfs-block
+++ b/Documentation/ABI/testing/sysfs-block
@@ -213,8 +213,15 @@ What:		/sys/block/<disk>/queue/discard_zeroes_data
 Date:		May 2011
 Contact:	Martin K. Petersen <martin.petersen@xxxxxxxxxx>
 Description:
-		Will always return 0.  Don't rely on any specific behavior
-		for discards, and don't read this file.
+		Devices that support discard functionality may return
+		stale or random data when a previously discarded block
+		is read back. This can cause problems if the filesystem
+		expects discarded blocks to be explicitly cleared. If a
+		device reports that it deterministically returns zeroes
+		when a discarded area is read the discard_zeroes_data
+		parameter will be set to one. Otherwise it will be 0 and
+		the result of reading a discarded area is undefined.
+
 
 What:		/sys/block/<disk>/queue/write_same_max_bytes
 Date:		January 2012
diff --git a/Documentation/block/queue-sysfs.txt b/Documentation/block/queue-sysfs.txt
index 2c1e670..b7f6bdc 100644
--- a/Documentation/block/queue-sysfs.txt
+++ b/Documentation/block/queue-sysfs.txt
@@ -43,6 +43,11 @@ large discards are issued, setting this value lower will make Linux issue
 smaller discards and potentially help reduce latencies induced by large
 discard operations.
 
+discard_zeroes_data (RO)
+------------------------
+When read, this file will show if the discarded blocks are zeroed by the
+device or not. If its value is '1' the blocks are zeroed, otherwise not.
+
 hw_sector_size (RO)
 -------------------
 This is the hardware sector size of the device, in bytes.
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 27aceab..5b41ad0 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -209,7 +209,10 @@ static ssize_t queue_discard_max_store(struct request_queue *q,
 
 static ssize_t queue_discard_zeroes_data_show(struct request_queue *q, char *page)
 {
-	return queue_var_show(0, page);
+	if (blk_queue_discard(q) && q->limits.max_write_zeroes_sectors)
+		return queue_var_show(1, page);
+	else
+		return queue_var_show(0, page);
 }
 
 static ssize_t queue_write_same_max_show(struct request_queue *q, char *page)
diff --git a/block/ioctl.c b/block/ioctl.c
index 0de02ee..faecd44 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -508,6 +508,7 @@ int blkdev_ioctl(struct block_device *bdev, fmode_t mode, unsigned cmd,
 	void __user *argp = (void __user *)arg;
 	loff_t size;
 	unsigned int max_sectors;
+	struct request_queue *q = bdev_get_queue(bdev);
 
 	switch (cmd) {
 	case BLKFLSBUF:
@@ -547,7 +548,10 @@ int blkdev_ioctl(struct block_device *bdev, fmode_t mode, unsigned cmd,
 	case BLKALIGNOFF:
 		return put_int(arg, bdev_alignment_offset(bdev));
 	case BLKDISCARDZEROES:
-		return put_uint(arg, 0);
+		if (blk_queue_discard(q) && q->limits.max_write_zeroes_sectors)
+			return put_uint(arg, 1);
+		else
+			return put_uint(arg, 0);
 	case BLKSECTGET:
 		max_sectors = min_t(unsigned int, USHRT_MAX,
 				    queue_max_sectors(bdev_get_queue(bdev)));
-- 
2.7.5
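
For reference, the userspace check described in the commit message (look at
discard_max_bytes and write_zeroes_max_bytes instead of discard_zeroes_data)
could look roughly like the sketch below. It is not part of the patch. The
helper names discard_zeroes_from_limits(), read_queue_limit() and
disk_discard_zeroes() are made up for illustration, and the sketch assumes
that a non-zero discard_max_bytes is a good enough stand-in for the
blk_queue_discard() test done in the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: the rule from the patch, applied to the two sysfs
 * values. Returns 1 only when both limits are non-zero.
 */
int discard_zeroes_from_limits(unsigned long long discard_max_bytes,
			       unsigned long long write_zeroes_max_bytes)
{
	return discard_max_bytes != 0 && write_zeroes_max_bytes != 0;
}

/*
 * Hypothetical helper: read one unsigned decimal value from
 * /sys/block/<disk>/queue/<attr>. Returns 0 on success, -1 on failure.
 */
int read_queue_limit(const char *disk, const char *attr,
		     unsigned long long *val)
{
	char path[256];
	FILE *f;
	int n, ok;

	assert(disk && attr && val);

	n = snprintf(path, sizeof(path), "/sys/block/%s/queue/%s", disk, attr);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;	/* path did not fit in the buffer */

	f = fopen(path, "r");
	if (!f)
		return -1;	/* no such disk, or attribute not exported */

	ok = fscanf(f, "%llu", val) == 1;
	fclose(f);
	return ok ? 0 : -1;
}

/*
 * Hypothetical helper: returns 1 or 0 as described above, or -1 if either
 * attribute could not be read.
 */
int disk_discard_zeroes(const char *disk)
{
	unsigned long long discard, wzeroes;

	if (read_queue_limit(disk, "discard_max_bytes", &discard) ||
	    read_queue_limit(disk, "write_zeroes_max_bytes", &wzeroes))
		return -1;

	return discard_zeroes_from_limits(discard, wzeroes);
}
```

A caller would pass the disk name as it appears under /sys/block, for
example disk_discard_zeroes("sda"), and treat -1 as "unknown". With this
patch applied the same answer is available from discard_zeroes_data or the
BLKDISCARDZEROES ioctl, so existing utilities need no such code.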