[PATCH] block: reintroduce discard_zeroes_data sysfs file and BLKDISCARDZEROES

The discard and zeroout code has been significantly rewritten recently and
as a part of the rewrite we got rid of the discard_zeroes_data flag.

With commit 48920ff2a5a9 ("block: remove the discard_zeroes_data flag")
the discard_zeroes_data sysfs file and the BLKDISCARDZEROES ioctl now
always return zero, regardless of what the device actually supports.
This has broken userspace utilities: they no longer take advantage of
this functionality even if the device actually supports it.

Now the only way for a user to figure out whether the device supports
deterministic read zeroes after discard, without actually running
fallocate, is to check for discard support (discard_max_bytes) and
zeroout hw offload (write_zeroes_max_bytes).
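
For illustration only, the userspace check described above could look
roughly like the sketch below. The helper names read_queue_limit() and
queue_discard_zeroes() are hypothetical and not part of this patch, and
treating a nonzero discard_max_bytes as "discard supported" is an
assumption taken from the description above rather than the exact test
the kernel performs.

```c
#include <stdio.h>

/*
 * Read one unsigned decimal value from <queue_dir>/<name>.
 * Returns 0 on success, -1 if the file is missing or unparsable.
 * (Hypothetical helper, for illustration only.)
 */
static int read_queue_limit(const char *queue_dir, const char *name,
			    unsigned long long *val)
{
	char path[512];
	FILE *f;
	int ok;

	if (snprintf(path, sizeof(path), "%s/%s", queue_dir, name) >=
	    (int)sizeof(path))
		return -1;

	f = fopen(path, "r");
	if (!f)
		return -1;
	ok = (fscanf(f, "%llu", val) == 1);
	fclose(f);
	return ok ? 0 : -1;
}

/*
 * Returns 1 if both discard_max_bytes and write_zeroes_max_bytes are
 * nonzero, 0 if either is zero, and -1 if either file cannot be read
 * (for example on a kernel that lacks write_zeroes_max_bytes).
 */
static int queue_discard_zeroes(const char *queue_dir)
{
	unsigned long long discard_max, write_zeroes_max;

	if (read_queue_limit(queue_dir, "discard_max_bytes", &discard_max) ||
	    read_queue_limit(queue_dir, "write_zeroes_max_bytes",
			     &write_zeroes_max))
		return -1;

	return discard_max != 0 && write_zeroes_max != 0;
}
```

A caller would pass the queue directory of the disk in question, e.g.
queue_discard_zeroes("/sys/block/sda/queue"), where "sda" is only a
placeholder device name.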

However, we still have the discard_zeroes_data sysfs file and the
BLKDISCARDZEROES ioctl, so I do not see any reason not to do this
check in the kernel and provide a convenient and compatible way to
continue exporting this information to user space.

With this patch both the BLKDISCARDZEROES ioctl and discard_zeroes_data
will return 1 if both discard and hw offload for write zeroes are
supported. Otherwise they will return 0.
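
As a rough sketch of how userspace might consume the ioctl side of this,
assuming a Linux system where <linux/fs.h> defines BLKDISCARDZEROES, the
query could look like the code below. The wrapper name
blkdev_discard_zeroes() is hypothetical and only serves the example.

```c
#include <fcntl.h>
#include <linux/fs.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Ask the kernel whether discarded blocks on dev_path read back as zeroes.
 * Returns the value reported by BLKDISCARDZEROES (0 or 1), or -1 if the
 * device cannot be opened or the ioctl fails.
 * (Hypothetical wrapper, for illustration only.)
 */
static int blkdev_discard_zeroes(const char *dev_path)
{
	unsigned int zeroes = 0;
	int fd, ret;

	fd = open(dev_path, O_RDONLY);
	if (fd < 0)
		return -1;

	/* The kernel stores an unsigned int at the supplied address. */
	ret = ioctl(fd, BLKDISCARDZEROES, &zeroes);
	close(fd);

	return ret < 0 ? -1 : (int)zeroes;
}
```

Opening a block device node usually requires elevated privileges, so a
caller should treat -1 as "unknown" rather than as "not supported".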

Signed-off-by: Lukas Czerner <lczerner@xxxxxxxxxx>
---
 Documentation/ABI/testing/sysfs-block | 11 +++++++++--
 Documentation/block/queue-sysfs.txt   |  5 +++++
 block/blk-sysfs.c                     |  5 ++++-
 block/ioctl.c                         |  6 +++++-
 4 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-block b/Documentation/ABI/testing/sysfs-block
index dea212d..6ea0d03 100644
--- a/Documentation/ABI/testing/sysfs-block
+++ b/Documentation/ABI/testing/sysfs-block
@@ -213,8 +213,15 @@ What:		/sys/block/<disk>/queue/discard_zeroes_data
 Date:		May 2011
 Contact:	Martin K. Petersen <martin.petersen@xxxxxxxxxx>
 Description:
-		Will always return 0.  Don't rely on any specific behavior
-		for discards, and don't read this file.
+		Devices that support discard functionality may return
+		stale or random data when a previously discarded block
+		is read back. This can cause problems if the filesystem
+		expects discarded blocks to be explicitly cleared. If a
+		device reports that it deterministically returns zeroes
+		when a discarded area is read the discard_zeroes_data
+		parameter will be set to one. Otherwise it will be 0 and
+		the result of reading a discarded area is undefined.
+
 
 What:		/sys/block/<disk>/queue/write_same_max_bytes
 Date:		January 2012
diff --git a/Documentation/block/queue-sysfs.txt b/Documentation/block/queue-sysfs.txt
index 2c1e670..b7f6bdc 100644
--- a/Documentation/block/queue-sysfs.txt
+++ b/Documentation/block/queue-sysfs.txt
@@ -43,6 +43,11 @@ large discards are issued, setting this value lower will make Linux issue
 smaller discards and potentially help reduce latencies induced by large
 discard operations.
 
+discard_zeroes_data (RO)
+------------------------
+When read, this file will show whether the discarded blocks are zeroed by
+the device or not. If its value is '1' the blocks are zeroed, otherwise not.
+
 hw_sector_size (RO)
 -------------------
 This is the hardware sector size of the device, in bytes.
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 27aceab..5b41ad0 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -209,7 +209,10 @@ static ssize_t queue_discard_max_store(struct request_queue *q,
 
 static ssize_t queue_discard_zeroes_data_show(struct request_queue *q, char *page)
 {
-	return queue_var_show(0, page);
+	if (blk_queue_discard(q) && q->limits.max_write_zeroes_sectors)
+		return queue_var_show(1, page);
+	else
+		return queue_var_show(0, page);
 }
 
 static ssize_t queue_write_same_max_show(struct request_queue *q, char *page)
diff --git a/block/ioctl.c b/block/ioctl.c
index 0de02ee..faecd44 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -508,6 +508,7 @@ int blkdev_ioctl(struct block_device *bdev, fmode_t mode, unsigned cmd,
 	void __user *argp = (void __user *)arg;
 	loff_t size;
 	unsigned int max_sectors;
+	struct request_queue *q = bdev_get_queue(bdev);
 
 	switch (cmd) {
 	case BLKFLSBUF:
@@ -547,7 +548,10 @@ int blkdev_ioctl(struct block_device *bdev, fmode_t mode, unsigned cmd,
 	case BLKALIGNOFF:
 		return put_int(arg, bdev_alignment_offset(bdev));
 	case BLKDISCARDZEROES:
-		return put_uint(arg, 0);
+		if (blk_queue_discard(q) && q->limits.max_write_zeroes_sectors)
+			return put_uint(arg, 1);
+		else
+			return put_uint(arg, 0);
 	case BLKSECTGET:
 		max_sectors = min_t(unsigned int, USHRT_MAX,
 				    queue_max_sectors(bdev_get_queue(bdev)));
-- 
2.7.5



