Sorry, I'm a little late responding to this.

On 2022-10-12 19:33, Martin K. Petersen wrote:
>
> Logan,
>
>> 2) We could split up the fallocate call into multiple calls to zero
>> the entire disk. This would allow a quicker ctrl-c to occur, however
>> it's not clear what the best size would be to split it into. Even
>> zeroing 1GB can take a few seconds,
>
> FWIW, we default to 32MB per request in SCSI unless the device
> explicitly advertises wanting something larger.
>
>> (with NVMe, discard only requires a single command to handle the
>> entire disk
>
> In NVMe there's a limit of 64K blocks per range and 256 ranges per
> request. So 8GB or 64GB per request for discard depending on the block
> size. So presumably it will take several operations to deallocate an
> entire drive.
>
>> where as write-zeroes requires a minimum of one command per 2MB of
>> data to zero).
>
> 32MB for 512-byte blocks and 256MB for 4096-byte blocks. Which matches
> how it currently works for SCSI devices.

The 2MB I was referring to was the typical maximum we see on real
devices. We tested a number of NVMe drives from a number of different
vendors and found that most had a maximum of 2MB; some devices had
512KB, which is unfortunate.

>> I was hoping write-zeroes could be made faster in the future, at least
>> for NVMe.
>
> Deallocate had a bit of a head start and vendors are still catching up
> in the zeroing department. Some drives do support using Deallocate for
> zeroing and we quirk those in the driver so they should perform OK with
> your change.

Yeah, my hope is that larger zeroing requests can eventually be
supported and handled performantly by deallocating the device. So I
would rather mdadm not slow this down by splitting the request to the
kernel into a number of smaller requests. But splitting seems to be the
only way forward, because the request is uninterruptible and we don't
want to hang the user for several minutes.

Logan
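
For reference, a minimal sketch of what option 2 (splitting the fallocate call) could look like. This is not the actual mdadm code: the names stop_requested, request_stop(), next_chunk() and zero_range_chunked() are hypothetical, and the choice of FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE is an assumption about which flags would be used. The idea is only that a ctrl-c is noticed between chunks instead of after the whole device has been zeroed.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdint.h>
#include <sys/types.h>
#include <linux/falloc.h>

/* Set from a signal handler; checked between chunks (hypothetical name). */
static volatile sig_atomic_t stop_requested;

/* Intended to be installed as the SIGINT handler. */
void request_stop(int sig)
{
	(void)sig;
	stop_requested = 1;
}

/* Length of the next chunk: at most `step`, clipped to what is left. */
static uint64_t next_chunk(uint64_t offset, uint64_t end, uint64_t step)
{
	uint64_t left = end - offset;

	return left < step ? left : step;
}

/*
 * Zero [start, start + len) on fd in requests of at most `step` bytes.
 * Returns 0 on success, -EINTR if a stop was requested between chunks,
 * -EINVAL for a zero step, or -errno from a failed fallocate() call.
 */
int zero_range_chunked(int fd, uint64_t start, uint64_t len, uint64_t step)
{
	uint64_t end = start + len;
	uint64_t off = start;

	if (step == 0)
		return -EINVAL;

	while (off < end) {
		uint64_t n;

		/* Each fallocate() is uninterruptible, so poll the flag here. */
		if (stop_requested)
			return -EINTR;

		n = next_chunk(off, end, step);
		if (fallocate(fd, FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE,
			      (off_t)off, (off_t)n) != 0)
			return -errno;

		off += n;
	}
	return 0;
}
```

A caller would install the handler with signal(SIGINT, request_stop) and then call something like zero_range_chunked(fd, 0, dev_size, 32ULL << 20). The 32MB step there just borrows the SCSI default mentioned above for illustration; the right size is still the open question, since a smaller step gives a quicker ctrl-c but more requests to the kernel.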