Re: RAID6 : Sequential Write Performance

On Fri, Feb 15, 2019 at 8:36 AM Roy Sigurd Karlsbakk <roy@xxxxxxxxxxxxx> wrote:
>
> >> Greetings !
> >>
> >> I created a MD RAID6 with a 512KiB chunk size out of 12 8TB drives, no internal
> >> bitmap and no journal on quad xeon gold 6154 running kernel 4.18 (Ubuntu
> >> 18.04.1) and set FIO to do a 1TiB sequential write to the device with a block
> >> size of 5M, 3 processes and a QD of 64.

Why use 3 processes?
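For reference, I read your description as roughly this fio invocation (the
device name /dev/md0 and the exact option spelling are my guesses, not taken
from your mail):

  fio --name=seqwrite --filename=/dev/md0 --rw=write --bs=5M \
      --ioengine=libaio --direct=1 --iodepth=64 --numjobs=3 \
      --size=1T --group_reporting

Note that with --numjobs=3 against the same device, each job writes its own
sequential stream starting at offset 0 unless --offset_increment is set, so
the device no longer sees a single purely sequential stream.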

> >>
> >> Each drive being able to achieve 215MiB/s at the beginning of the drive, I
> >> expected the output to be somewhere around the 2GiB/s mark at the beginning of
> >> the raid array.
> >> After setting stripe_cache_size to 32768 and group_thread_cnt to 2, I only got
> >> an average 1.4GiB/s out of my array and the throughput wasn't very stable.

A bigger stripe_cache_size does not always give better performance, and the
same goes for group_thread_cnt; some more tuning of both may help.
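Both knobs live in sysfs, so they are easy to sweep; a quick sketch, assuming
the array is md0 (the numbers below are just starting points to try, not
recommendations):

  # stripe cache size, in pages per array device
  echo 8192 > /sys/block/md0/md/stripe_cache_size
  # number of stripe-handling worker threads
  echo 4 > /sys/block/md0/md/group_thread_cnt
  # verify the current values
  grep . /sys/block/md0/md/stripe_cache_size /sys/block/md0/md/group_thread_cnt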

> >>
> >> I did the same test against a hardware raid controller, the Broadcom MegaRAID
> >> 9480-8i8e, and it managed a nice flat 1.9 GiB/s.
> >>
> >> I expected a modern cpu to easily win over a hardware controller but that wasn't
> >> the case.
> >> Am I missing something ?
> >
> > At a wag... the 4GB ram cache on the raid card causing it to appear as
> > if the disk access is faster?
> >
> > I have to be honest, I've long since given up trying to test the
> > performance of raid formats/layouts/chunks/etc... due to the multiple
> > ways the system can "do stuff" that changes the results with even the
> > exact same manual style tests. Then again, my workloads tend to be "good
> > enough, is good enough". I guess, however, someone needing a high-speed
> > file server with bonded 10Gb links to multiple workstations running video
> > editing software would be a whole different ballgame.
>
> Well, something is bound to be wrong here when a RAID card is faster than a far faster CPU with faster memory, etc. Does anyone know how this can be debugged or fixed? Is there a way to choose between SSE and AVX?

I think the kernel will choose the best SSE/AVX implementation automatically.
On boot, dmesg will show something like

[    0.233184] raid6: sse2x1   gen()  8003 MB/s
[    0.250192] raid6: sse2x1   xor()  5982 MB/s
[    0.267208] raid6: sse2x2   gen() 10003 MB/s
[    0.284227] raid6: sse2x2   xor()  6937 MB/s
[    0.301242] raid6: sse2x4   gen() 12187 MB/s
[    0.318260] raid6: sse2x4   xor()  8029 MB/s
[    0.318427] raid6: using algorithm sse2x4 gen() 12187 MB/s
[    0.318639] raid6: .... xor() 8029 MB/s, rmw enabled
[    0.318833] raid6: using ssse3x2 recovery algorithm
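You can confirm which implementation your kernel picked with something like:

  dmesg | grep -i raid6
  # or, if the kernel ring buffer has already rotated:
  journalctl -k | grep -i raid6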

If I were debugging this, I would first make sure the array is doing 100%
full-stripe writes (check reads vs. writes on the member disks with iostat or
a similar tool).
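For example, something along these lines (the member device names sd[a-l] are
assumed):

  # Watch the member disks while the sequential write runs. Sustained
  # read activity (r/s, rkB/s) on the members during a pure write test
  # usually means read-modify-write instead of full-stripe writes.
  iostat -x 1 /dev/sd[a-l]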



>
> Best regards
>
> roy
> --
> Roy Sigurd Karlsbakk
> (+47) 98013356
> http://blogg.karlsbakk.net/
> GPG Public key: http://karlsbakk.net/roysigurdkarlsbakk.pubkey.txt
> --
> The good you shall carve in stone, the bad you shall write in snow.
>



