Re: RFC: 32-bit __data_len and REQ_DISCARD+REQ_SECURE

Jeff Moyer <jmoyer@xxxxxxxxxx> · Tue, 20 Oct 2015 14:57:28 -0400

Hi Grant,

Grant Grundler <grundler@xxxxxxxxxxxx> writes:

> Ping? Does no one care how long BLK_SECDISCARD takes?
>
> ChromeOS has landed this change as a compromise between "fast" (<10
> seconds) and "minimize risk" (~90 seconds) for a 23GB partition on
> eMMC:
>     https://chromium-review.googlesource.com/#/c/302413/

Including the patch would be helpful.  I believe this is it.  My
comments are inline.

diff --git a/block/blk-lib.c b/block/blk-lib.c
index 8411be3..43943c7 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c

@@ -60,21 +60,37 @@
 	granularity = max(q->limits.discard_granularity >> 9, 1U);
 	alignment = (bdev_discard_alignment(bdev) >> 9) % granularity;
 
-	/*
-	 * Ensure that max_discard_sectors is of the proper
-	 * granularity, so that requests stay aligned after a split.
-	 */
-	max_discard_sectors = min(q->limits.max_discard_sectors, UINT_MAX >> 9);
-	max_discard_sectors -= max_discard_sectors % granularity;
-	if (unlikely(!max_discard_sectors)) {
-		/* Avoid infinite loop below. Being cautious never hurts. */
-		return -EOPNOTSUPP;
-	}
+	max_discard_sectors = min(q->limits.max_discard_sectors,
+						UINT_MAX >> 9);

Unnecessary reformatting.
 
 	if (flags & BLKDEV_DISCARD_SECURE) {
 		if (!blk_queue_secdiscard(q))
 			return -EOPNOTSUPP;
 		type |= REQ_SECURE;
+		/*
+		 * Secure erase performs better by telling the device
+		 * about the largest range possible.  Secure erase
+		 * piecemeal will likely result in mapped sectors
+		 * getting evacuated from one range and parked in
+		 * another range that will get erased by a future
+		 * erase command.  This does NOT happen for normal
+		 * TRIM or DISCARD operations.
+		 *
+		 * 32GB was a compromise to avoid blocking the device
+		 * for potentially minute(s) at a time.
+		 */
+		if (max_discard_sectors < (1 << (25-9)))	/* 32GiB */
+			max_discard_sectors = 1 << (25-9);

And here you're ignoring q->limits.max_discard_sectors: whenever the
queue limit is below the threshold, it gets raised past what the device
advertised.  I'm surprised this worked!

+	}
+
+	/*
+	 * Ensure that max_discard_sectors is of the proper
+	 * granularity, so that requests stay aligned after a split.
+	 */
+	max_discard_sectors -= max_discard_sectors % granularity;
+	if (unlikely(!max_discard_sectors)) {
+		/* Avoid infinite loop below. Being cautious never hurts. */
+		return -EOPNOTSUPP;
 	}
 
 	atomic_set(&bb.done, 1);
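
To make the second inline comment concrete, here is a small userspace
sketch that mirrors the patched sizing logic. It is not kernel code: the
function and macro names are made up for illustration, and queue_max and
granularity stand in for q->limits.max_discard_sectors and the computed
granularity, both in 512-byte sectors. Note that 1 << (25-9) sectors
works out to 2^25 bytes, which is 32 MiB rather than the 32GiB named in
the patch comment.

```c
#include <limits.h>
#include <stdbool.h>

/* 1 << (25 - 9) sectors of 512 bytes is 2^25 bytes, i.e. 32 MiB. */
#define SECURE_FLOOR_SECTORS (1U << (25 - 9))

static unsigned int min_u(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/*
 * Hypothetical helper mirroring the patched logic. Returns the
 * per-request discard size in sectors, or 0 where the kernel code
 * would return -EOPNOTSUPP.
 */
static unsigned int patched_max_discard(unsigned int queue_max,
					unsigned int granularity,
					bool secure)
{
	unsigned int max = min_u(queue_max, UINT_MAX >> 9);

	/* The patch raises small limits to the floor for secure erase. */
	if (secure && max < SECURE_FLOOR_SECTORS)
		max = SECURE_FLOOR_SECTORS;

	/* Keep requests aligned after a split; treat 0 as "no constraint". */
	if (granularity == 0)
		granularity = 1;
	max -= max % granularity;

	return max;
}

/* True when the computed size is larger than the queue advertised. */
static bool exceeds_queue_limit(unsigned int queue_max,
				unsigned int granularity,
				bool secure)
{
	return patched_max_discard(queue_max, granularity, secure) > queue_max;
}
```

With a queue limit of one 512KB chunk (1024 sectors), the secure path
returns 65536 sectors, so exceeds_queue_limit() reports true, while the
non-secure path stays at 1024.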

Grant, can we start over with the problem description? (Sorry, I didn't
see the previous posts.)  I'd like to know the values of discard_granularity
and discard_max_bytes for your device.  Additionally, it would be
interesting to know how the discards are being initiated.  Is it via a
userspace utility such as mkfs, online discard via some file system
mounted with -o discard, or something else?  Finally, can you post
binary blktrace data somewhere for the slow case?
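
In case it helps with collecting the two values, both are exported in
sysfs under /sys/block/<disk>/queue/. Below is a minimal sketch of
reading them from C; the helper names are hypothetical, and the disk
name you pass in (for example an mmcblk device) is an assumption about
your setup.

```c
#include <stdio.h>

/*
 * Reads one unsigned decimal value from a sysfs-style file.
 * Returns 0 on success, -1 if the file cannot be opened or parsed.
 */
static int read_u64_attr(const char *path, unsigned long long *value)
{
	FILE *f = fopen(path, "r");
	int parsed;

	if (!f)
		return -1;
	parsed = (fscanf(f, "%llu", value) == 1);
	fclose(f);
	return parsed ? 0 : -1;
}

/*
 * Builds /sys/block/<disk>/queue/<attr> and reads it, where attr is
 * "discard_granularity" or "discard_max_bytes".
 */
static int read_queue_attr(const char *disk, const char *attr,
			   unsigned long long *value)
{
	char path[256];
	int n = snprintf(path, sizeof(path), "/sys/block/%s/queue/%s",
			 disk, attr);

	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;
	return read_u64_attr(path, value);
}
```

Calling read_queue_attr() once for "discard_granularity" and once for
"discard_max_bytes" gives both numbers in bytes.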

Thanks!
Jeff




> On Mon, Sep 28, 2015 at 2:45 PM, Grant Grundler <grundler@xxxxxxxxxxxx> wrote:
>> [resending...I forgot to switch gmail back to text-only mode. grrrh..]
>>
>> ---------- Forwarded message ----------
>> From: Grant Grundler <grundler@xxxxxxxxxxxx>
>> Date: Mon, Sep 28, 2015 at 2:42 PM
>> Subject: Re: RFC: 32-bit __data_len and REQ_DISCARD+REQ_SECURE
>> To: Grant Grundler <grundler@xxxxxxxxxxxx>
>> Cc: Jens Axboe <axboe@xxxxxxxxx>, Ulf Hansson
>> <ulf.hansson@xxxxxxxxxx>, LKML <linux-kernel@xxxxxxxxxxxxxxx>,
>> "linux-mmc@xxxxxxxxxxxxxxx" <linux-mmc@xxxxxxxxxxxxxxx>
>>
>>
>> On Thu, Sep 24, 2015 at 10:39 AM, Grant Grundler <grundler@xxxxxxxxxxxx> wrote:
>>>
>>> Some followup.
>> ...
>>>
>>> 2) I've been able to test this hack on an eMMC device:
>>> [   13.147747] mmc..._secdiscard_rq(mmc1) ERASE from 14116864 cnt
>>> 0x2c00000 (size 22528 MiB)
>>> [   13.155964] sdhci cmd: 35/0x1a arg 0xd76800
>>> [   13.160266] sdhci cmd: 36/0x1a arg 0x39767ff
>>> [   13.164593] sdhci cmd: 38/0x1b arg 0x80000000
>>> [   13.803360] random: nonblocking pool is initialized
>>> [   14.567735] sdhci cmd: 13/0x1a arg 0x10000
>>> [   14.573324] mmc..._secdiscard_rq(mmc1) err 0
>>>
>>> This was with ~15K files and about 5GB written to the device. 1.4
>>> seconds compared to about 20 minutes to secure erase the same region
>>> with original v3.18 code.
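
The arguments in the log above are consistent with sector addressing:
the cmd 35 argument equals the "from" sector, and the cmd 36 argument is
the inclusive last sector of the range. A minimal sketch of that
arithmetic follows; the struct and function names are hypothetical, and
512-byte sectors are assumed.

```c
#include <stdint.h>

/* Hypothetical container for the values seen in the log. */
struct erase_range {
	uint32_t start_arg;	/* cmd 35 argument: first sector */
	uint32_t end_arg;	/* cmd 36 argument: last sector, inclusive */
	uint64_t size_mib;	/* range size in MiB */
};

/* count is in 512-byte sectors and must be nonzero. */
static struct erase_range make_erase_range(uint32_t from, uint32_t count)
{
	struct erase_range r;

	r.start_arg = from;
	r.end_arg = from + count - 1;
	r.size_mib = ((uint64_t)count * 512) >> 20;
	return r;
}
```

For from = 14116864 and cnt = 0x2c00000 this yields 0xd76800, 0x39767ff
and 22528 MiB, matching the logged values.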
>>
>>
>> To put a few more numbers on the "chunk size vs perf":
>>  1EG (512KB) -> 44K commands -> ~20 minutes
>> 32EG (16MB) -> 1375 commands -> ~1 minute
>> 128EG (64MB) -> 344 commands -> ~30 seconds
>> 8191EG (~4GB) -> 6 commands -> 2 seconds + ~8 seconds mkfs
>> (I'm assuming times above include about 6-10 seconds of mkfs as part
>> of writing a new file system)
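
The command counts in the list above are a ceiling division of the
region size by the chunk size. Here is a short sketch, assuming the
region is the 22528 MiB range from the earlier log (45056 erase groups
of 512KB); the function name is hypothetical. The exact results differ
slightly from the rounded figures quoted.

```c
#include <stdint.h>

/*
 * Number of erase commands needed to cover total_eg erase groups when
 * each command covers at most chunk_eg groups. Returns 0 if chunk_eg
 * is 0, since no progress would be possible.
 */
static uint64_t commands_needed(uint64_t total_eg, uint64_t chunk_eg)
{
	if (chunk_eg == 0)
		return 0;
	return (total_eg + chunk_eg - 1) / chunk_eg;
}
```

For 45056 groups this gives 45056 commands at 1EG, 1408 at 32EG, 352 at
128EG and 6 at 8191EG.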
>>
>> This is with only ~300MB of data written to the partition. I'm fully
>> aware that times will vary depending on how much data needs to be
>> migrated (and in this case very little or none). I'm certain the
>> difference will only get worse for the smaller the "chunk size" used
>> to Secure Erase due to repeated data migration.
>>
>> Given the different use model for secure erase (legal/contractually
>> required behavior), is using 4GB chunk size acceptable?
>>
>> Would anyone be terribly offended if I used the recently added
>> "MMC_IOC_MULTI_CMD" to send the cmd 35/36/38 sequence to the eMMC
>> device to securely erase the offending partition?
>>
>> thanks,
>> grant