Re: [PATCH mdadm v7 0/7] Write Zeroes option for Creating Arrays

On 3/1/23 15:41, Logan Gunthorpe wrote:
> Hi,
> 
> This is the next iteration of the patchset to add a zeroing option
> which bypasses the initial sync for arrays. This version of the patch
> has some minor cleanup and collected a number of review and ack tags.
> 
> This patch set adds the --write-zeroes option which will imply
> --assume-clean and write zeros to the data region in each disk before
> starting the array. This can take some time so each disk is done in
> parallel in its own fork. To make the forking code easier to
> understand this patch set also starts with some cleanup of the
> existing Create code.
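
A rough sketch of the fork-per-disk pattern described above, for readers who want to see the shape of it. This is not the code from the patch set: run_jobs_in_parallel(), disk_job_fn, MAX_DISKS and the two demo jobs are hypothetical names made up for illustration, and the per-disk job is left abstract.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_DISKS 64	/* arbitrary bound for this sketch */

/* Hypothetical per-disk job: returns 0 on success, non-zero on failure. */
typedef int (*disk_job_fn)(int disk_index);

/* Demo jobs, only here so the sketch can be exercised. */
static int demo_job_ok(int disk_index)
{
	(void)disk_index;
	return 0;
}

static int demo_job_fail_odd(int disk_index)
{
	return (disk_index % 2) ? -1 : 0;
}

/*
 * Fork one child per disk and run the job in each child.
 * Returns the number of children that failed, or -1 if a fork() failed
 * (children that were already started are still waited for).
 */
static int run_jobs_in_parallel(int ndisks, disk_job_fn job)
{
	pid_t pids[MAX_DISKS];
	int started = 0, failed = 0, fork_error = 0;

	assert(ndisks > 0 && ndisks <= MAX_DISKS);

	for (int i = 0; i < ndisks; i++) {
		pid_t pid = fork();

		if (pid < 0) {
			fork_error = 1;
			break;
		}
		if (pid == 0)	/* child: do the work, report via exit status */
			_exit(job(i) == 0 ? 0 : 1);
		pids[started++] = pid;
	}

	/* Parent: collect every child so none is left running behind. */
	for (int i = 0; i < started; i++) {
		int status = 0;
		pid_t r;

		do {
			r = waitpid(pids[i], &status, 0);
		} while (r < 0 && errno == EINTR);

		if (r < 0 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
			failed++;
	}

	return fork_error ? -1 : failed;
}
```

Calling run_jobs_in_parallel(4, demo_job_fail_odd) starts four children and reports two failures (indexes 1 and 3). The exit status is the only channel back to the parent here, which keeps the children fully independent of each other.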
> 
> We tested write-zeroes requests on a number of modern nvme drives of
> various manufacturers and found most are not as optimized as the
> discard path. A couple drives that were tested did not support
> write-zeroes at all but still performed similarly with the kernel
> falling back to writing zero pages. Typically we see it take on the
> order of one minute per 100GB of data zeroed.
> 
> One reason write-zeroes is slower than discard is that today's NVMe
> devices only allow about 2MB to be zeroed in one command, whereas
> the entire drive can typically be discarded in one command. Partly,
> this is a limitation of the spec as there are only 16 bits available
> in the write-zeroes command size, but drives still don't max this out.
> Hopefully, in the future this will all be optimized a bit more
> and this work will be able to take advantage of that.
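
To put rough numbers on the 16-bit limit mentioned above: the block count in the NVMe Write Zeroes command is a zero-based 16-bit field, so one command can cover at most 65536 logical blocks. The helper names below (wz_spec_max_bytes(), wz_commands_needed()) are hypothetical and only illustrate the arithmetic; the 2MB per-command figure and the 100GB size are the approximate values quoted in the text.

```c
#include <assert.h>
#include <stdint.h>

/* A zero-based 16-bit block count allows 1..65536 blocks per command. */
#define NVME_WZ_MAX_BLOCKS 65536ULL

/* Hypothetical helper: largest range the command format itself allows. */
static uint64_t wz_spec_max_bytes(uint32_t lba_size)
{
	assert(lba_size > 0);
	return NVME_WZ_MAX_BLOCKS * lba_size;
}

/* Hypothetical helper: commands needed to zero total_bytes when the
 * device accepts per_cmd_bytes per command (rounded up). */
static uint64_t wz_commands_needed(uint64_t total_bytes, uint64_t per_cmd_bytes)
{
	assert(per_cmd_bytes > 0);
	return (total_bytes + per_cmd_bytes - 1) / per_cmd_bytes;
}
```

With 512-byte blocks the format tops out at 32MiB per command, and with 4096-byte blocks at 256MiB, so a device limited to about 2MB is well under the ceiling. At 2MB per command, zeroing 100GB takes about 50,000 commands, and one minute per 100GB works out to roughly 1.7GB/s.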
> 
> Logan
> 
> --
> 
> Changes since v6:
>    * Collected review and ack tags from Xiao, Chaitanya and Coly
>    * Adjust the error reporting to use strerror() instead of the
>      glibc %m extension. (per Coly)
>    * Fix a typo in the man page ("despit" should have been "despite")
>      (as noticed by Coly)
> 
> Changes since v5:
>    * Ensure 'interrupted' is initialized in wait_for_zero_forks().
>      (as noticed by Xiao)
>    * Print a message indicating that the zeroing was interrupted.
> 
> Changes since v4:
>    * Handle SIGINT better. Previous versions would leave the zeroing
>      processes behind after the main process exited, and they would
>      continue zeroing in the background (possibly for some time).
>      This version splits the zero fallocate commands up so they can be
>      interrupted quicker, and intercepts SIGINT in the main process
>      to print an appropriate message and wait for the child processes
>      to finish up. (as noticed by Xiao)
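
The idea of splitting one large zeroing request into smaller fallocate() calls, so that an interrupt is noticed between calls, might look something like the sketch below. This is a simplified variation and not the patch's implementation: it checks a flag set by a SIGINT handler inside the zeroing loop, and the names zero_range_interruptible(), install_sigint_handler() and the step parameter are hypothetical. It assumes Linux, where fallocate() with FALLOC_FL_ZERO_RANGE is available.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <linux/falloc.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>

/* Set from the signal handler, polled between fallocate() calls. */
static volatile sig_atomic_t interrupted;

static void on_sigint(int sig)
{
	(void)sig;
	interrupted = 1;
}

/* Install the SIGINT handler; returns 0 on success, -1 on error. */
static int install_sigint_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_sigint;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGINT, &sa, NULL);
}

/*
 * Zero [offset, offset + len) on fd in pieces of at most 'step' bytes.
 * Returns 0 when the whole range was zeroed, 1 if SIGINT was seen
 * before finishing, and -1 if fallocate() failed (errno is set).
 */
static int zero_range_interruptible(int fd, off_t offset, off_t len, off_t step)
{
	assert(step > 0);

	while (len > 0) {
		off_t n = len < step ? len : step;

		if (interrupted)
			return 1;
		if (fallocate(fd, FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE,
			      offset, n) < 0)
			return -1;
		offset += n;
		len -= n;
	}

	return 0;
}
```

The step size is a trade-off: smaller steps mean Ctrl-C is honoured sooner, at the cost of more system calls. A caller can tell "interrupted" apart from "failed" by the return value and print a different message for each.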
> 
> Changes since v3:
>    * Store the pid in a local variable instead of the mdinfo struct
>      (per Mariusz and Xiao)
> 
> Changes since v2:
> 
>    * Use write-zeroes instead of discard to zero the disks (per
>      Martin)
>    * Due to the time required to zero the disks, each disk is
>      now done in parallel with separate forks of the process.
>    * In order to add the forking some refactoring was done on the
>      Create() function to make it easier to understand
>    * Added a pr_info() call so that some prints can be done
>      to stdout instead of stderr (per Mariusz)
>    * Added KIB_TO_BYTES and SEC_TO_BYTES helpers (per Mariusz)
>    * Added a test to the mdadm test suite to test the option
>      works.
>    * Fixed up how the size and offset are calculated with some
>      great information from Xiao.
> 
> Changes since v1:
> 
>    * Discard the data in the devices later in the create process
>      while they are already open. This requires treating the
>      s.discard option the same as the s.assume_clean option.
>      Per Mariusz.
>    * A couple other minor cleanup changes from Mariusz.
> 
> --
> 
> Logan Gunthorpe (7):
>   Create: goto abort_locked instead of return 1 in error path
>   Create: remove safe_mode_delay local variable
>   Create: Factor out add_disks() helpers
>   mdadm: Introduce pr_info()
>   mdadm: Add --write-zeros option for Create
>   tests/00raid5-zero: Introduce test to exercise --write-zeros.
>   manpage: Add --write-zeroes option to manpage
> 
>  Create.c           | 565 +++++++++++++++++++++++++++++++--------------
>  ReadMe.c           |   2 +
>  mdadm.8.in         |  18 +-
>  mdadm.c            |   9 +
>  mdadm.h            |   7 +
>  tests/00raid5-zero |  12 +
>  6 files changed, 437 insertions(+), 176 deletions(-)
>  create mode 100644 tests/00raid5-zero
> 
> 
> base-commit: f1f3ef7d2de5e3a726c27b9f9bb20e270a100dab

All applied!

Thanks,
Jes




