Re: [PATCH mdadm v4 0/7] Write Zeroes option for Creating Arrays

Logan Gunthorpe <logang@xxxxxxxxxxxx> · Wed, 16 Nov 2022 10:11:17 -0700

Sorry for the slightly late response to this.

On 2022-10-12 19:33, Martin K. Petersen wrote:
> 
> Logan,
> 
>> 2) We could split up the fallocate call into multiple calls to zero
>> the entire disk. This would allow a quicker ctrl-c to occur; however,
>> it's not clear what the best size would be to split it into. Even
>> zeroing 1GB can take a few seconds,
> 
> FWIW, we default to 32MB per request in SCSI unless the device
> explicitly advertises wanting something larger.
> 
>> (with NVMe, discard only requires a single command to handle the
>> entire disk
> 
> In NVMe there's a limit of 64K blocks per range and 256 ranges per
> request. So 8GB or 64GB per request for discard depending on the block
> size. So presumably it will take several operations to deallocate an
> entire drive.
> 
>> whereas write-zeroes requires a minimum of one command per 2MB of
>> data to zero).
> 
> 32MB for 512-byte blocks and 256MB for 4096-byte blocks. Which matches
> how it currently works for SCSI devices.

The 2MB I was referring to is the typical maximum we see on real
devices. We tested NVMe drives from several different vendors and
found that most had a maximum of 2MB and some only 512KB, which is
unfortunate.
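To put the per-request limits in this exchange side by side, here is a small C sketch of the arithmetic. The constants are the limits Martin quotes (64K blocks per range, 256 ranges per discard request) and are assumptions here, not values read from a device; MB and GB are treated as binary units, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Per-command limits as quoted in this thread (assumed, not queried). */
#define MAX_BLOCKS_PER_RANGE 65536ULL /* 64K logical blocks */
#define MAX_DISCARD_RANGES   256ULL   /* ranges per discard request */

/* Largest region a single Write Zeroes command can cover. */
static uint64_t max_write_zeroes_bytes(uint64_t block_size)
{
	return MAX_BLOCKS_PER_RANGE * block_size;
}

/* Largest region a single discard (Deallocate) request can cover. */
static uint64_t max_discard_bytes(uint64_t block_size)
{
	return MAX_BLOCKS_PER_RANGE * MAX_DISCARD_RANGES * block_size;
}

/* Commands needed to cover `total` bytes at `per_cmd` bytes each, rounded up. */
static uint64_t commands_needed(uint64_t total, uint64_t per_cmd)
{
	assert(per_cmd > 0);
	return (total + per_cmd - 1) / per_cmd;
}
```

For a hypothetical 1TiB drive, commands_needed(1ULL << 40, 2ULL << 20) gives 524288 write-zeroes commands at a 2MB device cap, whereas commands_needed(1ULL << 40, max_discard_bytes(4096)) gives 16 discard requests.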

>> I was hoping write-zeroes could be made faster in the future, at least
>> for NVMe.
> 
> Deallocate had a bit of a head start and vendors are still catching up
> in the zeroing department. Some drives do support using Deallocate for
> zeroing and we quirk those in the driver so they should perform OK with
> your change.

Yeah, my hope is that devices will support larger zeroing requests and
handle them quickly by deallocating. So I don't want mdadm to slow this
down by splitting the request to the kernel into a number of smaller
requests. But splitting seems to be the only way forward, because a
single request is uninterruptible and we don't want to hang the user
for several minutes.
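A minimal sketch of the split-request approach, assuming zeroing is done with fallocate(2) and FALLOC_FL_ZERO_RANGE on the member device. The flags, the helper names (zero_range_chunked, zero_with_fallocate, install_sigint_handler) and the zero_fn hook are hypothetical and not mdadm's actual code. A SIGINT handler sets a flag, and the loop checks it between chunks, so ctrl-c takes effect at most one chunk later.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/falloc.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>

/* Set by the SIGINT handler; polled between chunks. */
static volatile sig_atomic_t zero_interrupted;

static void on_sigint(int sig)
{
	(void)sig;
	zero_interrupted = 1;
}

/* Hypothetical hook so the loop can be exercised without a real device. */
typedef int (*zero_fn)(int fd, off_t offset, off_t len);

/* Zero one chunk with fallocate(2); the flags are an assumption. On a block
 * device the kernel uses Write Zeroes where supported and otherwise falls
 * back to writing zeroed pages. Returns 0 or -errno. */
static int zero_with_fallocate(int fd, off_t offset, off_t len)
{
	if (fallocate(fd, FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE,
		      offset, len) < 0)
		return -errno;
	return 0;
}

/* Zero [offset, offset + len) in chunks of at most `chunk` bytes. A single
 * request is treated as uninterruptible, so the flag is only checked between
 * chunks. Returns 0, -EINTR if interrupted, or the error from zero(). */
static int zero_range_chunked(int fd, off_t offset, off_t len, off_t chunk,
			      zero_fn zero)
{
	assert(chunk > 0);
	while (len > 0) {
		if (zero_interrupted)
			return -EINTR;
		off_t n = len < chunk ? len : chunk;
		int ret = zero(fd, offset, n);
		if (ret < 0)
			return ret;
		offset += n;
		len -= n;
	}
	return 0;
}

/* Route SIGINT to on_sigint so ctrl-c sets the flag instead of killing
 * the process. */
static int install_sigint_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_sigint;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGINT, &sa, NULL);
}
```

The chunk size is the trade-off discussed in this thread: smaller chunks (for example a hypothetical 1 << 30 for 1GB, which can still take a few seconds) make ctrl-c more responsive, while larger ones leave more room for devices that zero quickly by deallocating.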

Logan