Hi Logan,

I tested the patchset and hit the following problem:

mdadm -CR /dev/md0 -l5 -n3 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme0n1 --write-zero
mdadm: zeroing data from 135266304 to 960061505536 on: /dev/nvme1n1
mdadm: zeroing data from 135266304 to 960061505536 on: /dev/nvme2n1
mdadm: zeroing data from 135266304 to 960061505536 on: /dev/nvme0n1

I pressed Ctrl+C while it was waiting, and after that the RAID can no
longer be created, because the processes that write zeros to the NVMe
devices are stuck (state D in the output below):

ps auxf | grep mdadm
root 68764 0.0 0.0  9216 1104 pts/0 S+ 21:09 0:00 \_ grep --color=auto mdadm
root 68633 0.1 0.0 27808  336 pts/0 D  21:04 0:00 mdadm -CR /dev/md0 -l5 -n3 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme0n1 --write-zero
root 68634 0.2 0.0 27808  336 pts/0 D  21:04 0:00 mdadm -CR /dev/md0 -l5 -n3 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme0n1 --write-zero
root 68635 0.0 0.0 27808  336 pts/0 D  21:04 0:00 mdadm -CR /dev/md0 -l5 -n3 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme0n1 --write-zero

Regards
Xiao

On Sat, Oct 8, 2022 at 4:10 AM Logan Gunthorpe <logang@xxxxxxxxxxxx> wrote:
>
> Hi,
>
> This is the next iteration of the patchset that added the discard
> option to mdadm. Per feedback from Martin, it's more desirable
> to use the write-zeroes functionality than rely on devices to zero
> the data on a discard request. This is because standards typically
> only require the device to do the best effort to discard data and
> may not actually discard (and thus zero) it all in some circumstances.
>
> This version of the patch set adds the --write-zeroes option which
> will imply --assume-clean and write zeros to the data region in
> each disk before starting the array. This can take some time so
> each disk is done in parallel in its own fork. To make the forking
> code easier to understand this patch set also starts with some
> cleanup of the existing Create code.
>
> We tested write-zeroes requests on a number of modern nvme drives of
> various manufacturers and found most are not as optimized as the
> discard path.
> A couple drives that were tested did not support
> write-zeroes at all but still performed similarly with the kernel
> falling back to writing zero pages. Typically we see it take on the
> order of one minute per 100GB of data zeroed.
>
> One reason write-zeroes is slower than discard is that today's NVMe
> devices only allow about 2MB to be zeroed in one command whereas
> the entire drive can typically be discarded in one command. Partly,
> this is a limitation of the spec as there are only 16 bits available
> in the write-zeroes command size but drives still don't max this out.
> Hopefully, in the future this will all be optimized a bit more
> and this work will be able to take advantage of that.
>
> Logan
>
> --
>
> Changes since v3:
> * Store the pid in a local variable instead of the mdinfo struct
>   (per Mariusz and Xiao)
>
> Changes since v2:
>
> * Use write-zeroes instead of discard to zero the disks (per
>   Martin)
> * Due to the time required to zero the disks, each disk is
>   now done in parallel with separate forks of the process.
> * In order to add the forking some refactoring was done on the
>   Create() function to make it easier to understand
> * Added a pr_info() call so that some prints can be done
>   to stdout instead of stderr (per Mariusz)
> * Added KIB_TO_BYTES and SEC_TO_BYTES helpers (per Mariusz)
> * Added a test to the mdadm test suite to check that the option
>   works.
> * Fixed up how the size and offset are calculated with some
>   great information from Xiao.
>
> Changes since v1:
>
> * Discard the data in the devices later in the create process
>   while they are already open. This requires treating the
>   s.discard option the same as the s.assume_clean option.
>   Per Mariusz.
> * A couple other minor cleanup changes from Mariusz.
>
>
> *** BLURB HERE ***
>
> Logan Gunthorpe (7):
>   Create: goto abort_locked instead of return 1 in error path
>   Create: remove safe_mode_delay local variable
>   Create: Factor out add_disks() helpers
>   mdadm: Introduce pr_info()
>   mdadm: Add --write-zeros option for Create
>   tests/00raid5-zero: Introduce test to exercise --write-zeros.
>   manpage: Add --write-zeroes option to manpage
>
>  Create.c           | 479 ++++++++++++++++++++++++++++-----------------
>  ReadMe.c           |   2 +
>  mdadm.8.in         |  16 ++
>  mdadm.c            |   9 +
>  mdadm.h            |   7 +
>  tests/00raid5-zero |  12 ++
>  6 files changed, 350 insertions(+), 175 deletions(-)
>  create mode 100644 tests/00raid5-zero
>
>
> base-commit: 8b668d4aa3305af5963162b7499b128bd71f8f29
> --
> 2.30.2
>
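
For reference, here is a back-of-the-envelope check of the figures quoted in this thread. It is a hedged Python sketch and is not taken from the patchset. It makes these assumptions:

- logical blocks are 512 bytes;
- the two numbers in the mdadm output are start and end byte offsets;
- "100GB" means 10^11 bytes and "2MB" means 2 MiB;
- the 16-bit block count is zero-based, so it can address 65536 blocks.

All function and constant names are made up for illustration.

```python
# Rough arithmetic for the numbers quoted in the thread.
# Everything marked "assumed" is an assumption, not a fact from the patchset.

ZERO_START = 135_266_304          # first number in the mdadm output (assumed: start offset, bytes)
ZERO_END = 960_061_505_536        # second number in the mdadm output (assumed: end offset, bytes)

NLB_BITS = 16                     # width of the block-count field, per the cover letter
LBA_SIZE = 512                    # assumed logical block size in bytes
PER_CMD_OBSERVED = 2 * 1024 * 1024  # "about 2MB" per command, read here as 2 MiB


def ceil_div(a, b):
    """Integer division rounded up."""
    return -(-a // b)


def spec_max_bytes_per_command(lba_size=LBA_SIZE, nlb_bits=NLB_BITS):
    """Upper bound from the field width alone (count assumed zero-based)."""
    return (1 << nlb_bits) * lba_size


def commands_needed(length, per_cmd):
    """Number of write-zeroes commands needed to cover `length` bytes."""
    return ceil_div(length, per_cmd)


def estimated_minutes(length, minutes_per_100gb=1.0):
    """Scale the 'one minute per 100GB' rule of thumb to `length` bytes."""
    return length / 100e9 * minutes_per_100gb


if __name__ == "__main__":
    length = ZERO_END - ZERO_START
    print("bytes to zero per device:", length)
    print("commands at ~2 MiB each:", commands_needed(length, PER_CMD_OBSERVED))
    print("commands at field-width maximum:",
          commands_needed(length, spec_max_bytes_per_command()))
    print("estimated minutes per device:", round(estimated_minutes(length), 1))
```

Under those assumptions, each device in the report needs roughly 9.6 minutes of zeroing, so a Ctrl+C can easily arrive while the forked writers are still running.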