On Tue, 19 Aug 2014 16:36:20 +0000 Markus Stockhausen
<stockhausen@xxxxxxxxxxx> wrote:

> v2: reordering and merging of patches as Neil requested. More
> verification & benchmark numbers.
>
> Once again, thanks to an older patch from Kumar Sundararajan and
> Dan Williams that helped me understand the RAID6 logic inside md
> better. Everything is based on ideas & discussions that started
> with http://marc.info/?l=linux-raid&m=136624783417452&w=1
>
> Another try at implementing RMW support for RAID6. This time the
> syndrome calculation is improved too. A few things to note:
>
> 1) The patches are based on the official 3.16 kernel git.
>
> 2) The required optimized syndrome functions were implemented where
> possible. Generic & SSE2 are the ones that I could write & test on
> my machine. If you want to test/benchmark this patch, ensure that
> you force-select one of the two. Programmers with the appropriate
> hardware in their hands are encouraged to send the missing
> algorithms.
>
> 3) The raid6 test program was enhanced to verify algorithm
> correctness. Additionally, this release was checked with a
> self-written, single-threaded test tool I called wprd (write
> predictable random data). Checked features include RAID expansion,
> rebuild of failed drives, different RAID6 geometries, ... The
> contents of /dev/md0 and the underlying block devices were checked
> with sha256sum against the expected result from the unpatched
> module. Knock on wood, so far no failures.
>
> 4) In between I was able to grab 10 older disk drives of different
> sizes and speeds and built a test rig. Simple RAID math should give
> 3 reads + 3 writes for RMW versus 7 reads + 3 writes for RCW (6 vs.
> 10 I/Os), and thus a 66% improvement for write I/Os with a size
> smaller than or equal to a single chunk. As you can see, reality
> does not care about the math, but the effect is visible. Remember
> that larger arrays will show bigger speedups.
>
> 300 seconds random write with 8 threads
> 3.2TB (10*400GB) RAID6, 64K chunk, no spare
> group_thread_cnt=4
>
> bsize  rmw_level=1  rmw_level=0  rmw_level=1  rmw_level=0
>        skip_copy=1  skip_copy=1  skip_copy=0  skip_copy=0
> 4K       115 KB/s     141 KB/s     165 KB/s     140 KB/s
> 8K       225 KB/s     275 KB/s     324 KB/s     274 KB/s
> 16K      434 KB/s     536 KB/s     640 KB/s     534 KB/s
> 32K      751 KB/s   1,051 KB/s   1,234 KB/s   1,045 KB/s
> 64K    1,339 KB/s   1,958 KB/s   2,282 KB/s   1,962 KB/s
> 128K   2,673 KB/s   3,862 KB/s   4,113 KB/s   3,898 KB/s
> 256K   7,685 KB/s   7,539 KB/s   7,557 KB/s   7,638 KB/s
> 512K  19,556 KB/s  19,558 KB/s  19,652 KB/s  19,688 KB/s
>
> Thanks Neil for your support.
>
> Markus

Thanks. This looks a lot nicer.

If you resend them with the formatting and s-o-b changes I mentioned,
I'll put them in my tree and try to do some testing and look at the
bits more closely.

Two things:

1/ It would be good to have performance numbers in the patch
description for the patch that makes it all work. Put yours there
now; we can add others later.

2/ When you do a partial syndrome you specify a start and an end. Is
that really what is always wanted, or just what is easiest? I imagine
that you might want to "add" or "subtract" an arbitrary subset of
blocks. I imagine that the "blocks" array of pointers that is passed
in could have NULLs for the blocks to ignore and would include all
the others in the computation. Is there a good reason for not doing
that?

Apart from that, the code looks quite nice and clean ... though I
don't seem to be concentrating at my best today, so I reserve the
right to revise that assessment at a later date :-)

Thanks,
NeilBrown
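
To make the NULL-skipping idea in 2/ concrete, here is a minimal C
sketch of a partial P/Q syndrome update that leaves out any data block
whose pointer is NULL. It illustrates Neil's suggestion only and is
not the md kernel API: the name raid6_partial_syndrome, the
byte-at-a-time GF(2^8) arithmetic, and the XOR-into-P/Q convention are
all assumptions made for clarity.

#include <stddef.h>
#include <stdint.h>

/* Multiply by the generator g = 2 in GF(2^8) with the RAID6
 * polynomial 0x11d, one byte at a time. */
static uint8_t gf_mul2(uint8_t v)
{
        return (uint8_t)((v << 1) ^ ((v & 0x80) ? 0x1d : 0x00));
}

/*
 * Update P and Q in place from an arbitrary subset of data blocks.
 * ptrs[0..disks-3] are data blocks, ptrs[disks-2] is P, ptrs[disks-1]
 * is Q; a NULL data pointer means "leave this block out". Because
 * addition and subtraction in GF(2) are both XOR, calling this once
 * with the old block contents and once with the new contents gives
 * the RMW-style "subtract then add" of the changed blocks.
 */
static void raid6_partial_syndrome(int disks, size_t bytes,
                                   uint8_t **ptrs)
{
        uint8_t *p = ptrs[disks - 2];
        uint8_t *q = ptrs[disks - 1];

        for (size_t i = 0; i < bytes; i++) {
                uint8_t wp = 0, wq = 0;

                /* Horner evaluation from the highest data disk down,
                 * like the generic gen_syndrome; a skipped (NULL)
                 * block simply contributes zero. */
                for (int d = disks - 3; d >= 0; d--) {
                        wq = gf_mul2(wq);
                        if (ptrs[d]) {
                                wp ^= ptrs[d][i];
                                wq ^= ptrs[d][i];
                        }
                }
                p[i] ^= wp;
                q[i] ^= wq;
        }
}

A SIMD variant would presumably follow the same shape as the existing
SSE2 gen_syndrome, with the per-disk load replaced by a zero vector
whenever the corresponding pointer is NULL.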