Re: [PATCH mdadm v6 0/7] Write Zeroes option for Creating Arrays


 



On Thu, Nov 24, 2022 at 3:10 AM Logan Gunthorpe <logang@xxxxxxxxxxxx> wrote:
>
> Hi,
>
> This is the next iteration of the patchset to add a zeroing option
> which bypasses the initial sync for arrays. This version of the patch
> set fixes an uninitialized variable bug in v5.
>
> This patch set adds the --write-zeroes option which will imply
> --assume-clean and write zeros to the data region in each disk before
> starting the array. This can take some time so each disk is done in
> parallel in its own fork. To make the forking code easier to
> understand this patch set also starts with some cleanup of the
> existing Create code.
>
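
The fork-per-disk pattern described above can be sketched roughly as
follows. This is only an illustration, not the code from the series:
run_jobs_in_forks(), disk_job_fn and demo_job() are hypothetical names,
and the per-disk job is a stand-in for whatever actually zeroes one
device.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical per-disk job: returns 0 on success, non-zero on failure. */
typedef int (*disk_job_fn)(int disk_index);

/*
 * Fork one child per disk, run job(i) in each child, then wait for all
 * of them. Returns the number of children that failed, or -1 if memory
 * allocation or one of the forks failed.
 */
int run_jobs_in_forks(int ndisks, disk_job_fn job)
{
	pid_t *pids;
	int started = 0, failed = 0, fork_error = 0;

	if (ndisks <= 0)
		return 0;

	pids = calloc(ndisks, sizeof(*pids));
	if (!pids)
		return -1;

	for (int i = 0; i < ndisks; i++) {
		pid_t pid = fork();

		if (pid < 0) {
			fork_error = 1;
			break;
		}
		if (pid == 0)
			_exit(job(i) ? 1 : 0);	/* child: do the work, then exit */
		pids[started++] = pid;		/* parent: remember the child */
	}

	/* Reap every child that was started, even after a fork error. */
	for (int i = 0; i < started; i++) {
		int status;

		if (waitpid(pids[i], &status, 0) < 0 ||
		    !WIFEXITED(status) || WEXITSTATUS(status) != 0)
			failed++;
	}

	free(pids);
	return fork_error ? -1 : failed;
}

/* Stand-in job for illustration only: odd-numbered disks "fail". */
int demo_job(int disk_index)
{
	return disk_index % 2;
}
```

With the stand-in job, run_jobs_in_forks(4, demo_job) reports two
failed children. Keeping the pids in a local array mirrors the v3
change of not storing them in the mdinfo struct.
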
> We tested write-zeroes requests on a number of modern NVMe drives from
> various manufacturers and found most are not as optimized as the
> discard path. A couple of drives that were tested did not support
> write-zeroes at all but still performed similarly, with the kernel
> falling back to writing zero pages. Typically we see it take on the
> order of one minute per 100GB of data zeroed.
>
> One reason write-zeroes is slower than discard is that today's NVMe
> devices only allow about 2MB to be zeroed in one command, whereas
> the entire drive can typically be discarded in one command. Partly,
> this is a limitation of the spec as there are only 16 bits available
> in the write-zeroes command size but drives still don't max this out.
> Hopefully, in the future this will all be optimized a bit more
> and this work will be able to take advantage of that.
>
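
To put rough numbers on the 16-bit limit: as far as I read the spec,
the block count in the write-zeroes command is a zero-based 16-bit
field, so one command can cover at most 65536 logical blocks. The
helper names below are made up for this calculation and the LBA sizes
are just the common 512 and 4096 byte cases.

```c
#include <stdint.h>

/*
 * Largest byte count one command can cover when the block count is a
 * 16-bit, zero-based field (assumption: 0xFFFF means 65536 blocks).
 */
uint64_t max_write_zeroes_bytes(uint32_t lba_size)
{
	return (uint64_t)65536 * lba_size;
}

/*
 * Number of commands needed to cover total_bytes when each command
 * handles at most per_command_bytes (must be non-zero). Rounds up.
 */
uint64_t commands_needed(uint64_t total_bytes, uint64_t per_command_bytes)
{
	return (total_bytes + per_command_bytes - 1) / per_command_bytes;
}
```

That gives a ceiling of 32 MiB per command with 512-byte blocks and
256 MiB with 4096-byte blocks. Zeroing 100 GiB at 2 MiB per command
takes 51200 commands, versus 3200 at the 32 MiB ceiling.
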
> Logan
>
> --
>
> Changes since v5:
>    * Ensure 'interrupted' is initialized in wait_for_zero_forks().
>      (as noticed by Xiao)
>    * Print a message indicating that the zeroing was interrupted.
>
> Changes since v4:
>    * Handle SIGINT better. Previous versions would leave the zeroing
>      processes behind after the main process exited, and they would
>      continue zeroing in the background (possibly for some time).
>      This version splits the zero fallocate commands up so they can be
>      interrupted more quickly, and intercepts SIGINT in the main
>      process to print an appropriate message and wait for the child
>      processes to finish up. (as noticed by Xiao)
>
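
The idea of splitting the zeroing into smaller steps so that an
interrupt is noticed quickly can be sketched like this. It is not the
code from the patch: zero_in_steps(), zero_range_fn, on_sigint() and
count_calls() are hypothetical, and the callback stands in for the
real zeroing call (for example an fallocate() on the device).

```c
#include <signal.h>
#include <stdint.h>

/* Set from a SIGINT handler and checked between steps. */
volatile sig_atomic_t interrupted;

/* Handler to install with sigaction() or signal(); only sets the flag. */
void on_sigint(int signo)
{
	(void)signo;
	interrupted = 1;
}

/*
 * Hypothetical callback that zeroes [offset, offset + len) on one
 * device and returns 0 on success.
 */
typedef int (*zero_range_fn)(uint64_t offset, uint64_t len, void *ctx);

/*
 * Zero `size` bytes starting at `offset`, at most `step` bytes per call.
 * Returns 0 when finished, 1 if interrupted, -1 on error.
 * *done is set to the number of bytes zeroed so far.
 */
int zero_in_steps(uint64_t offset, uint64_t size, uint64_t step,
		  zero_range_fn zero, void *ctx, uint64_t *done)
{
	*done = 0;
	if (step == 0)
		return -1;

	while (*done < size) {
		uint64_t len = size - *done;

		if (interrupted)
			return 1;	/* stop before issuing the next step */
		if (len > step)
			len = step;
		if (zero(offset + *done, len, ctx))
			return -1;
		*done += len;
	}
	return 0;
}

/* Stand-in callback for illustration: only counts how often it runs. */
int count_calls(uint64_t offset, uint64_t len, void *ctx)
{
	(void)offset;
	(void)len;
	++*(int *)ctx;
	return 0;
}
```

With a small step the worst-case delay after Ctrl-C is one step rather
than the whole device, at the cost of issuing more calls.
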
> Changes since v3:
>    * Store the pid in a local variable instead of the mdinfo struct
>      (per Mariusz and Xiao)
>
> Changes since v2:
>
>    * Use write-zeroes instead of discard to zero the disks (per
>      Martin)
>    * Due to the time required to zero the disks, each disk is
>      now done in parallel with separate forks of the process.
>    * In order to add the forking some refactoring was done on the
>      Create() function to make it easier to understand
>    * Added a pr_info() call so that some prints can be done
>      to stdout instead of stderr (per Mariusz)
>    * Added KIB_TO_BYTES and SEC_TO_BYTES helpers (per Mariusz)
>    * Added a test to the mdadm test suite to check that the option
>      works.
>    * Fixed up how the size and offset are calculated with some
>      great information from Xiao.
>
> Changes since v1:
>
>    * Discard the data in the devices later in the create process
>      while they are already open. This requires treating the
>      s.discard option the same as the s.assume_clean option.
>      Per Mariusz.
>    * A couple other minor cleanup changes from Mariusz.
>
> --
>
> Logan Gunthorpe (7):
>   Create: goto abort_locked instead of return 1 in error path
>   Create: remove safe_mode_delay local variable
>   Create: Factor out add_disks() helpers
>   mdadm: Introduce pr_info()
>   mdadm: Add --write-zeros option for Create
>   tests/00raid5-zero: Introduce test to exercise --write-zeros.
>   manpage: Add --write-zeroes option to manpage
>
>  Create.c           | 564 +++++++++++++++++++++++++++++++--------------
>  ReadMe.c           |   2 +
>  mdadm.8.in         |  16 ++
>  mdadm.c            |   9 +
>  mdadm.h            |   7 +
>  tests/00raid5-zero |  12 +
>  6 files changed, 435 insertions(+), 175 deletions(-)
>  create mode 100644 tests/00raid5-zero
>
>
> base-commit: 8b668d4aa3305af5963162b7499b128bd71f8f29
> --
> 2.30.2
>

For the series:
Reviewed-by: Xiao Ni <xni@xxxxxxxxxx>



