On Tue, 19 Aug 2014 16:36:20 +0000 Markus Stockhausen
<stockhausen@xxxxxxxxxxx> wrote:

> v2: reordering and merging of patches as Neil requested. More
> verification & benchmark numbers.
>
> Once again, thanks to an older patch from Kumar Sundararajan and
> Dan Williams that helped me understand the RAID6 logic inside md
> better. Everything is based on ideas & discussions that started
> with http://marc.info/?l=linux-raid&m=136624783417452&w=1
>
> Another try at implementing RMW support for RAID6. This time the
> syndrome calculation is improved too. A few things to note:
>
> 1) The patches are based on the official 3.16 kernel git.
>
> 2) The required optimized syndrome functions were implemented where
> possible. Generic & SSE2 are the ones that I could write & test on
> my machine. If you want to test/benchmark this patch, ensure that
> you force-select one of the two. Programmers with the appropriate
> hardware in their hands are encouraged to send the missing
> algorithms.
>
> 3) The raid6 test program was enhanced to verify algorithm
> correctness. Additionally, this release was checked with a
> self-written, single-threaded test tool I called wprd (write
> predictable random data). Checked features include RAID expansion,
> rebuild of failed drives, different RAID6 geometries, ... The
> contents of /dev/md0 and the underlying block devices were checked
> with sha256sum against the expected result from the unpatched
> module. Knock on wood, so far no failures.
>
> 4) In between I was able to grab 10 older disk drives of different
> sizes and speeds and built a test rig. Simple RAID math should give
> 3 reads + 3 writes for RMW versus 7 reads + 3 writes for RCW (6 vs.
> 10 I/Os), and thus a 66% improvement for write I/Os with a size
> smaller than or equal to a single chunk. As you can see, reality
> does not care about the math, but the effect is visible. Remember
> that larger arrays will show bigger speedups.
>
> 300 seconds random write with 8 threads
> 3.2TB (10*400GB) RAID6, 64K chunk, no spare
> group_thread_cnt=4
>
> bsize  rmw_level=1  rmw_level=0  rmw_level=1  rmw_level=0
>        skip_copy=1  skip_copy=1  skip_copy=0  skip_copy=0
> 4K       115 KB/s     141 KB/s     165 KB/s     140 KB/s
> 8K       225 KB/s     275 KB/s     324 KB/s     274 KB/s
> 16K      434 KB/s     536 KB/s     640 KB/s     534 KB/s
> 32K      751 KB/s   1,051 KB/s   1,234 KB/s   1,045 KB/s
> 64K    1,339 KB/s   1,958 KB/s   2,282 KB/s   1,962 KB/s
> 128K   2,673 KB/s   3,862 KB/s   4,113 KB/s   3,898 KB/s
> 256K   7,685 KB/s   7,539 KB/s   7,557 KB/s   7,638 KB/s
> 512K  19,556 KB/s  19,558 KB/s  19,652 KB/s  19,688 KB/s
>
> Thanks Neil for your support.
>
> Markus

Thanks. This looks a lot nicer.

If you resend them with the formatting and s-o-b changes I mentioned,
I'll put them in my tree and try to do some testing and look at the
bits more closely.

Two things:

1/ It would be good to have performance numbers in the patch
description for the patch that makes it all work. Put yours there
now; we can add others later.

2/ When you do a partial syndrome you specify a start and an end. Is
that really what is always wanted, or just what is easiest? I imagine
that you might want to "add" or "subtract" an arbitrary subset of
blocks. I imagine that the "blocks" array of pointers that is passed
in could have NULLs for the blocks to ignore and would include all
the others in the computation. Is there a good reason for not doing
that?

Apart from that, the code looks quite nice and clean ... though I
don't seem to be concentrating at my best today, so I reserve the
right to revise that assessment at a later date :-)

Thanks,
NeilBrown
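
To make the NULL-skipping idea in 2/ concrete, here is a minimal C
sketch of a partial P/Q syndrome update that leaves out any data block
whose pointer is NULL. It illustrates Neil's suggestion only and is
not the md kernel API: the name raid6_partial_syndrome, the
byte-at-a-time GF(2^8) arithmetic, and the XOR-into-P/Q convention are
all assumptions made for clarity.

#include <stddef.h>
#include <stdint.h>

/* Multiply by the generator g = 2 in GF(2^8) with the RAID6
 * polynomial 0x11d, one byte at a time. */
static uint8_t gf_mul2(uint8_t v)
{
        return (uint8_t)((v << 1) ^ ((v & 0x80) ? 0x1d : 0x00));
}

/*
 * Update P and Q in place from an arbitrary subset of data blocks.
 * ptrs[0..disks-3] are data blocks, ptrs[disks-2] is P, ptrs[disks-1]
 * is Q; a NULL data pointer means "leave this block out". Because
 * addition and subtraction in GF(2) are both XOR, calling this once
 * with the old block contents and once with the new contents gives
 * the RMW-style "subtract then add" of the changed blocks.
 */
static void raid6_partial_syndrome(int disks, size_t bytes,
                                   uint8_t **ptrs)
{
        uint8_t *p = ptrs[disks - 2];
        uint8_t *q = ptrs[disks - 1];

        for (size_t i = 0; i < bytes; i++) {
                uint8_t wp = 0, wq = 0;

                /* Horner evaluation from the highest data disk down,
                 * like the generic gen_syndrome; a skipped (NULL)
                 * block simply contributes zero. */
                for (int d = disks - 3; d >= 0; d--) {
                        wq = gf_mul2(wq);
                        if (ptrs[d]) {
                                wp ^= ptrs[d][i];
                                wq ^= ptrs[d][i];
                        }
                }
                p[i] ^= wp;
                q[i] ^= wq;
        }
}

A SIMD variant would presumably follow the same shape as the existing
SSE2 gen_syndrome, with the per-disk load replaced by a zero vector
whenever the corresponding pointer is NULL.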