Hi Logan!

> When specified, mdadm will send block discard (aka. trim or
> deallocate) requests to all of the specified block devices. It will
> then read back parts of the device to double check that the disks are
> now all zeros. If they are all zero, the array is in a known state and
> does not need to generate the parity seeing everything is zero and
> correct.

Unfortunately that's a dangerous assertion. The drive is free to ignore
any or all parts of a discard request. And typically the results vary
depending on what else the drive has going on at the moment the request
was executed. I.e., you could experience completely different results
on the same drive depending on whether it was busy garbage collecting or
doing other I/O when the various portions of a discard request were
processed.

> Another option for this work is to use a write zero request. This can
> be done in linux currently with fallocate and the
> FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE flags. This will send
> optimized write-zero requests to the devices, without falling back to
> regular writes to zero the disk. The benefit of this is that the disk
> will explicitly read back as zeros, so a zero check is not necessary.
> The down side is that not all devices implement this in as optimal a
> way as the discard request does and on some of these devices zeroing
> can take multiple seconds per GB.

REQ_OP_WRITE_ZEROES was explicitly designed for this use case. It will
use discards if it is safe to do so. That is, if the device supports
deterministic zeroing, either explicitly through the storage protocol or
through ATA quirks (thanks to the drive being vendor-qualified for RAID
usage).

> Because write-zero requests may be slow and most (but not all) discard
> requests read back as zeros, this work uses only discard requests.

REQ_OP_WRITE_ZEROES will pick the most efficient way to guarantee that
all blocks in the requested range will return zeroes for subsequent
reads.

-- 
Martin K. Petersen	Oracle Linux Engineering
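For illustration only, a minimal C sketch of the two userspace approaches discussed in this thread, assuming Linux with glibc. zero_range_nofallback() is a hypothetical helper that calls fallocate() with FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE; on a block device the kernel services that as a write-zeroes request and fails (typically with EOPNOTSUPP) rather than falling back to regular writes. range_reads_zero() is a hypothetical helper showing the kind of read-back check a discard-only approach depends on; as Martin points out, a passing check after a plain discard only reflects what the drive returned at that moment. Neither helper is taken from mdadm.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>      /* fallocate(), open(), FALLOC_FL_* with _GNU_SOURCE */
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: zero [off, off + len) of fd with no fallback to
 * ordinary writes. On a block device, PUNCH_HOLE | KEEP_SIZE is handled
 * as a write-zeroes operation, and off/len must be multiples of the
 * logical block size. On a regular file it punches a hole instead.
 * Returns 0 on success or a negative errno value (e.g. -EOPNOTSUPP).
 */
static int zero_range_nofallback(int fd, off_t off, off_t len)
{
    if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, off, len) < 0)
        return -errno;
    return 0;
}

/*
 * Hypothetical helper: read back [off, off + len) and report whether
 * every byte is zero. After a plain discard this is only a snapshot of
 * what the drive returned, not a guarantee for later reads.
 * Returns 1 if all zero, 0 if a non-zero byte was seen, -errno on error
 * (-EIO if the range runs past the end of the device or file).
 */
static int range_reads_zero(int fd, off_t off, off_t len)
{
    unsigned char buf[4096];

    assert(off >= 0 && len >= 0);
    while (len > 0) {
        size_t want = len < (off_t)sizeof(buf) ? (size_t)len : sizeof(buf);
        ssize_t got = pread(fd, buf, want, off);

        if (got < 0) {
            if (errno == EINTR)
                continue;
            return -errno;
        }
        if (got == 0)
            return -EIO;
        /* All zero iff the first byte is zero and each byte equals the next. */
        if (buf[0] != 0 || memcmp(buf, buf + 1, (size_t)got - 1) != 0)
            return 0;
        off += got;
        len -= got;
    }
    return 1;
}
```

One possible policy under these assumptions: try zero_range_nofallback() first and treat -EOPNOTSUPP as "no efficient zeroing available" (and do a normal parity resync), rather than trusting a discard followed by range_reads_zero(). Because the same fallocate() call punches a hole in a regular file, the helpers can be exercised on a scratch file without a spare disk.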