Re: [PATCH mdadm v2 0/2] Discard Option for Creating Arrays

"Martin K. Petersen" <martin.petersen@xxxxxxxxxx> · Mon, 12 Sep 2022 13:40:36 -0400

Hi Logan!

> When specified, mdadm will send block discard (aka. trim or
> deallocate) requests to all of the specified block devices. It will
> then read back parts of the device to double check that the disks are
> now all zeros. If they are all zero, the array is in a known state and
> does not need to generate the parity seeing everything is zero and
> correct.

Unfortunately that's a dangerous assertion. The drive is free to ignore
any or all parts of a discard request. And typically the results vary
depending on what else the drive has going on at the moment the request
is executed. I.e. you could see completely different results on the
same drive depending on whether it was busy garbage collecting or doing
other I/O when the various portions of a discard request were
processed.
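
To make the failure mode concrete, here is a small in-memory sketch (not
mdadm code). The "device" is just a byte array, and the helper names,
stride and sample size are all made up for illustration. It shows that a
sampled readback can pass while a region the drive skipped still holds
stale data:

```c
#include <stdbool.h>
#include <stddef.h>

/* True if every byte of buf is zero. */
bool buf_is_zero(const unsigned char *buf, size_t len)
{
	for (size_t i = 0; i < len; i++)
		if (buf[i] != 0)
			return false;
	return true;
}

/*
 * Hypothetical sampled check: inspect only the first `sample` bytes of
 * every `stride` bytes of the simulated device. Anything outside the
 * sampled windows is never looked at.
 */
bool sampled_zero_check(const unsigned char *dev, size_t size,
			size_t stride, size_t sample)
{
	if (stride == 0)
		return false;

	for (size_t off = 0; off + sample <= size; off += stride)
		if (!buf_is_zero(dev + off, sample))
			return false;
	return true;
}
```

With a 4096-byte stride and a 512-byte sample, a single stale byte at
offset 5000 goes unnoticed by sampled_zero_check() even though
buf_is_zero() over the whole array reports the problem.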

> Another option for this work is to use a write zero request. This can
> be done in linux currently with fallocate and the FALLOC_FL_PUNCH_HOLE
> | FALLOC_FL_KEEP_SIZE flags. This will send optimized write-zero requests
> to the devices, without falling back to regular writes to zero the disk.
> The benefit of this is that the disk will explicitly read back as zeros,
> so a zero check is not necessary. The down side is that not all devices
> implement this in as optimal a way as the discard request does and on
> some of these devices zeroing can take multiple seconds per GB.

REQ_OP_WRITE_ZEROES was explicitly designed for this use case. It will
use discards if it is safe to do so, that is, if the device supports
deterministic zeroing, either explicitly through the storage protocol
or through ATA quirks (thanks to the drive being vendor-qualified for
RAID usage).

> Because write-zero requests may be slow and most (but not all) discard
> requests read back as zeros, this work uses only discard requests.

REQ_OP_WRITE_ZEROES will pick the most efficient method that guarantees
all blocks in the requested range will return zeroes on subsequent
reads.
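
For reference, a minimal userspace sketch of the fallocate() call
described in the quoted text above, using the FALLOC_FL_PUNCH_HOLE |
FALLOC_FL_KEEP_SIZE flags. zero_range() is a hypothetical helper name,
not something in mdadm, and a real tool would likely want to split a
large device into chunks and handle errors more carefully:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for fallocate() and FALLOC_FL_* */
#endif
#include <errno.h>
#include <fcntl.h>
#include <linux/falloc.h>
#include <sys/types.h>

/*
 * Hypothetical helper: ask the kernel to zero [offset, offset + len)
 * on the open file descriptor fd.
 *
 * Returns 0 on success or a negative errno value. A failure such as
 * -EOPNOTSUPP means the range was not zeroed this way, and the caller
 * has to decide on a fallback itself.
 */
int zero_range(int fd, off_t offset, off_t len)
{
	if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
		      offset, len) == 0)
		return 0;

	return -errno;
}
```

If the call succeeds, the range reads back as zeroes, so no sampled
readback check is needed afterwards.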

-- 
Martin K. Petersen	Oracle Linux Engineering