Re: [PATCH mdadm v4 0/7] Write Zeroes option for Creating Arrays

Hi Logan

I tested the patchset and hit the following problem:

mdadm -CR /dev/md0 -l5 -n3 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme0n1 --write-zero
mdadm: zeroing data from 135266304 to 960061505536 on: /dev/nvme1n1
mdadm: zeroing data from 135266304 to 960061505536 on: /dev/nvme2n1
mdadm: zeroing data from 135266304 to 960061505536 on: /dev/nvme0n1

I pressed Ctrl+C while it was zeroing, and after that the RAID can't be created
anymore, because the processes that write zeroes to the NVMe devices are stuck
(state D in the ps output below).

ps auxf | grep mdadm
root       68764  0.0  0.0   9216  1104 pts/0    S+   21:09   0:00          \_ grep --color=auto mdadm
root       68633  0.1  0.0  27808   336 pts/0    D    21:04   0:00 mdadm -CR /dev/md0 -l5 -n3 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme0n1 --write-zero
root       68634  0.2  0.0  27808   336 pts/0    D    21:04   0:00 mdadm -CR /dev/md0 -l5 -n3 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme0n1 --write-zero
root       68635  0.0  0.0  27808   336 pts/0    D    21:04   0:00 mdadm -CR /dev/md0 -l5 -n3 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme0n1 --write-zero

Regards
Xiao

On Sat, Oct 8, 2022 at 4:10 AM Logan Gunthorpe <logang@xxxxxxxxxxxx> wrote:
>
> Hi,
>
> This is the next iteration of the patchset that added the discard
> option to mdadm. Per feedback from Martin, it's more desirable
> to use the write-zeroes functionality than rely on devices to zero
> the data on a discard request. This is because standards typically
> only require the device to make a best effort to discard data, so it
> may not actually discard (and thus zero) all of it in some circumstances.
>
> This version of the patch set adds the --write-zeroes option which
> will imply --assume-clean and write zeros to the data region in
> each disk before starting the array. This can take some time so
> each disk is done in parallel in its own fork. To make the forking
> code easier to understand this patch set also starts with some
> cleanup of the existing Create code.
>
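
The one-fork-per-disk pattern described above can be sketched roughly as
follows. This is a minimal illustration under assumptions, not the code from
Create.c: run_workers_in_parallel(), disk_worker and the two demo workers are
hypothetical names, and the worker stands in for the per-disk zeroing.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_WORKERS 64

/* Hypothetical per-disk job: returns 0 on success, non-zero on failure. */
typedef int (*disk_worker)(int disk_index);

/*
 * Fork one child per disk, run the worker in each child, then wait for
 * all of them. Returns the number of disks whose worker failed.
 */
int run_workers_in_parallel(int ndisks, disk_worker work)
{
    pid_t pids[MAX_WORKERS];
    int started = 0, failures = 0;

    assert(ndisks > 0 && ndisks <= MAX_WORKERS);

    for (int i = 0; i < ndisks; i++) {
        pid_t pid = fork();

        if (pid == 0)
            _exit(work(i) == 0 ? 0 : 1);   /* child: never returns */
        if (pid < 0) {
            failures++;                    /* fork failed for this disk */
            continue;
        }
        pids[started++] = pid;             /* parent keeps pids locally */
    }

    for (int i = 0; i < started; i++) {
        int status;
        pid_t r;

        do {
            r = waitpid(pids[i], &status, 0);
        } while (r < 0 && errno == EINTR); /* retry if a signal interrupts */

        if (r < 0 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
            failures++;
    }
    return failures;
}

/* Demo workers for trying the helper out. */
int worker_ok(int disk_index)
{
    (void)disk_index;
    return 0;
}

int worker_fail_second(int disk_index)
{
    return disk_index == 1 ? -1 : 0;
}
```

The total wall time is then bounded by the slowest disk instead of the sum
over all disks.
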
> We tested write-zeroes requests on a number of modern nvme drives of
> various manufacturers and found most are not as optimized as the
> discard path. A couple drives that were tested did not support
> write-zeroes at all but still performed similarly with the kernel
> falling back to writing zero pages. Typically we see it take on the
> order of one minute per 100GB of data zeroed.
>
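
As a back-of-the-envelope example of that figure (an estimate, not a
measurement): the drives in the report at the top of this mail are about
960 GB each, so at roughly one minute per 100 GB each drive would need on the
order of ten minutes, and since the drives are zeroed in parallel the whole
create should take about that long. The helper name estimate_zero_minutes()
is hypothetical.

```c
#include <stdint.h>

/*
 * Rough estimate only: scales the "minutes per 100 GB" rate quoted above
 * linearly with the amount of data to zero. GB here means 10^9 bytes.
 */
double estimate_zero_minutes(uint64_t bytes, double minutes_per_100gb)
{
    const double bytes_per_100gb = 100.0 * 1e9;

    return (double)bytes / bytes_per_100gb * minutes_per_100gb;
}
```

For example, estimate_zero_minutes(960000000000ULL, 1.0) comes out at roughly
9.6 minutes.
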
> One reason write-zeroes is slower than discard is that today's NVMe
> devices only allow about 2MB to be zeroed in one command whereas
> the entire drive can typically be discarded in one command. Partly,
> this is a limitation of the spec as there are only 16 bits available
> in the write-zeroes command size but drives still don't max this out.
> Hopefully, in the future this will all be optimized a bit more
> and this work will be able to take advantage of that.
>
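
To put numbers on the 16-bit limit: the block count in the NVMe Write Zeroes
command is a 16-bit, zero-based field, so a single command can cover at most
65536 logical blocks. The sketch below works out what that means in bytes and
how many commands a drive needs. The helper names are hypothetical, and the
2 MiB per-command figure is an assumption taken from the "about 2MB" above.

```c
#include <assert.h>
#include <stdint.h>

/* 16-bit zero-based block count: values 0..65535 mean 1..65536 blocks. */
#define NVME_WZ_MAX_BLOCKS 65536ULL

/* Largest range one Write Zeroes command could cover per the field width. */
uint64_t wz_spec_max_bytes(uint32_t lba_size)
{
    return NVME_WZ_MAX_BLOCKS * lba_size;
}

/* Number of commands needed to zero total_bytes, rounding up. */
uint64_t wz_commands_needed(uint64_t total_bytes, uint64_t bytes_per_cmd)
{
    assert(bytes_per_cmd > 0);
    return (total_bytes + bytes_per_cmd - 1) / bytes_per_cmd;
}
```

With 512-byte blocks the field width allows 32 MiB per command, and with
4096-byte blocks 256 MiB. At an assumed 2 MiB per command, a 960 GB drive
needs roughly 458,000 commands, against a single command for a full-drive
discard.
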
> Logan
>
> --
>
> Changes since v3:
>    * Store the pid in a local variable instead of the mdinfo struct
>     (per Mariusz and Xiao)
>
> Changes since v2:
>
>    * Use write-zeroes instead of discard to zero the disks (per
>      Martin)
>    * Due to the time required to zero the disks, each disk is
>      now done in parallel with separate forks of the process.
>    * In order to add the forking some refactoring was done on the
>      Create() function to make it easier to understand
>    * Added a pr_info() call so that some prints can be done
>      to stdout instead of stderr (per Mariusz)
>    * Added KIB_TO_BYTES and SEC_TO_BYTES helpers (per Mariusz)
>    * Added a test to the mdadm test suite to check that the
>      option works.
>    * Fixed up how the size and offset are calculated with some
>      great information from Xiao.
>
> Changes since v1:
>
>    * Discard the data in the devices later in the create process
>      while they are already open. This requires treating the
>      s.discard option the same as the s.assume_clean option.
>      Per Mariusz.
>    * A couple other minor cleanup changes from Mariusz.
>
>
> Logan Gunthorpe (7):
>   Create: goto abort_locked instead of return 1 in error path
>   Create: remove safe_mode_delay local variable
>   Create: Factor out add_disks() helpers
>   mdadm: Introduce pr_info()
>   mdadm: Add --write-zeros option for Create
>   tests/00raid5-zero: Introduce test to exercise --write-zeros.
>   manpage: Add --write-zeroes option to manpage
>
>  Create.c           | 479 ++++++++++++++++++++++++++++-----------------
>  ReadMe.c           |   2 +
>  mdadm.8.in         |  16 ++
>  mdadm.c            |   9 +
>  mdadm.h            |   7 +
>  tests/00raid5-zero |  12 ++
>  6 files changed, 350 insertions(+), 175 deletions(-)
>  create mode 100644 tests/00raid5-zero
>
>
> base-commit: 8b668d4aa3305af5963162b7499b128bd71f8f29
> --
> 2.30.2
>



