Re: RAID6 : Sequential Write Performance

Is this process going to be running all of the time, or just for a
few minutes/hours at a time?  If only for a few minutes at a time,
you may want to use a RAID0 SSD array for the data collection and
then have another process move that data onto the RAID6.  If the data
is split into multiple files (a few GB each), the backend process can
move the finished files just behind the main process; if the files
are small enough they will still be in the file cache, so you won't
have to read them back off the SSD array, and you will be isolated
from random blips on the spinning disks.  Given that you are talking
about 12 disks, if one of those spinning disks blips, the stall will
last whatever timeout you set on the disk, assuming you have a disk
that allows the timeout to be set (the lowest I have found for that
timeout is 0.1 seconds).
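
As a sketch (assuming SATA drives that support SCT ERC; device names
are illustrative), the drive-side error-recovery timeout and the
kernel-side command timeout can be checked and set like this:

    # Read the current SCT ERC setting (units of 100 ms)
    smartctl -l scterc /dev/sda

    # Set read/write error recovery to 0.1 s each; many desktop
    # drives reject this or forget it on a power cycle
    smartctl -l scterc,1,1 /dev/sda

    # Kernel-side SCSI command timeout for the disk, in seconds
    echo 30 > /sys/block/sda/device/timeout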

On Tue, Feb 19, 2019 at 10:50 AM Jean De Gyns <Jean.DeGyns@xxxxxxxxxx> wrote:
>
> The drives are connected with a true SAS HBA, a Broadcom HBA 9400-16e, and writing zeros directly to each drive with dd gives me ~ 12 * 200MiB/s for the first 100GiB.
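>
> A per-drive baseline along those lines would be (a sketch; the
> device name is illustrative):
>
>     # 100GiB of zeros, bypassing the page cache so the number
>     # reflects the drive rather than RAM
>     dd if=/dev/zero of=/dev/sdX bs=1M count=102400 oflag=direct status=progress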
>
> The only process that reaches 100% CPU usage is md1_raid6, when group_thread_cnt is set to 0.
> With group_thread_cnt set to 2, md1_raid6 drops to 10-15% and two workers with a 50-60% load each appear.
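>
> For reference, those knobs live in sysfs (assuming the array is md1):
>
>     echo 2     > /sys/block/md1/md/group_thread_cnt
>     echo 32768 > /sys/block/md1/md/stripe_cache_size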
>
> I'll use the raid as a realtime subvolume for XFS and will have only one sequential write (or read, but reads already work) of up to 1.2 GiB/s, so I think my fio profile fits the bill.
> The setup works fine with a RAID0, but I'd really love it if it could work with a RAID6 as well.
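>
> For context, a minimal sketch of that layout (device names are
> illustrative; the 't' flag makes new files inherit the realtime bit):
>
>     mkfs.xfs -r rtdev=/dev/md1 /dev/sda1      # sda1: data+log, md1: realtime subvolume
>     mount -o rtdev=/dev/md1 /dev/sda1 /mnt/capture
>     xfs_io -c 'chattr +t' /mnt/capture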
>
> JDG
>
> -----Original Message-----
> From: Roger Heflin [mailto:rogerheflin@xxxxxxxxx]
> Sent: Saturday, 16 February 2019 16:59
> To: Song Liu <liu.song.a23@xxxxxxxxx>
> Cc: Roy Sigurd Karlsbakk <roy@xxxxxxxxxxxxx>; Wilson Jonathan <i400sjon@xxxxxxxxx>; Jean De Gyns <Jean.DeGyns@xxxxxxxxxx>; Linux Raid <linux-raid@xxxxxxxxxxxxxxx>
> Subject: Re: RAID6 : Sequential Write Performance
>
> What kind of controller(s) are you using for the non-hardware-raid case?
>
> You might also want to watch top and see if a CPU is maxing out on
> system/interrupt time.  You can also use mpstat -P ALL <interval>
> <count> to see that.  I recently found an issue where, even though
> the card I was testing had 8 interrupts, all 8 interrupts landed on
> the same CPU, and that was limiting my speed.  I got about 30% more
> IOPS once I figured out what was keeping the interrupts from being
> distributed and corrected it.
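>
> A quick way to check both (a sketch; the IRQ number is illustrative):
>
>     mpstat -P ALL 1 5                          # per-CPU view; watch %irq and %soft
>     cat /proc/interrupts                       # which CPUs service the HBA's vectors
>     echo 4 > /proc/irq/120/smp_affinity_list   # pin IRQ 120 to CPU 4 if needed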
>
> You might want to match fio closely to what you think your real-world
> workload is.  The last time I tested, the hw raid could write about
> 150% faster than software raid, but the software raid was 3x faster
> on multi-threaded reads.  And in my use case I wrote the data once
> and read it 50 or more times with multiple threads, so the 3x read
> speed was much more important than the write speed.
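>
> For the workload described here, a fio job along these lines should
> be close (a sketch; the target device is illustrative):
>
>     [seq-write]
>     filename=/dev/md1
>     rw=write
>     bs=5M
>     size=1T
>     numjobs=3
>     ioengine=libaio
>     iodepth=64
>     direct=1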
>
> On Sat, Feb 16, 2019 at 3:37 AM Song Liu <liu.song.a23@xxxxxxxxx> wrote:
> >
> > On Fri, Feb 15, 2019 at 8:36 AM Roy Sigurd Karlsbakk <roy@xxxxxxxxxxxxx> wrote:
> > >
> > > >> Greetings!
> > > >>
> > > >> I created an MD RAID6 with a 512KiB chunk size out of 12 8TB
> > > >> drives, no internal bitmap and no journal, on a quad Xeon Gold
> > > >> 6154 machine running kernel 4.18 (Ubuntu 18.04.1), and set fio
> > > >> to do a 1TiB sequential write to the device with a block size
> > > >> of 5M, 3 processes and a QD of 64.
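> >
> > For reference, an array like that corresponds to roughly this mdadm
> > invocation (a sketch; device names are illustrative):
> >
> >     mdadm --create /dev/md1 --level=6 --chunk=512 \
> >           --raid-devices=12 --bitmap=none /dev/sd[b-m]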
> >
> > Why use 3 processes?
> >
> > > >>
> > > >> Since each drive can achieve 215MiB/s at the beginning of the
> > > >> disk, I expected the output to be somewhere around the 2GiB/s
> > > >> mark at the beginning of the raid array.
> > > >> After setting stripe_cache_size to 32768 and group_thread_cnt to
> > > >> 2, I only got an average of 1.4GiB/s out of my array, and the throughput wasn't very stable.
> >
> > A bigger stripe_cache_size does not always improve performance, and
> > the same goes for group_thread_cnt. Some more tuning may get you closer.
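> >
> > Note that the stripe cache is not free: md allocates roughly
> > page_size * nr_disks * stripe_cache_size of memory, so
> > stripe_cache_size=32768 on a 12-disk array pins about
> > 4KiB * 12 * 32768 = 1.5GiB of RAM.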
> >
> > > >>
> > > >> I did the same test against a hardware raid controller, the
> > > >> Broadcom MegaRAID 9480-8i8e, and it managed a nice flat 1.9 GiB/s.
> > > >>
> > > >> I expected a modern CPU to easily win over a hardware
> > > >> controller, but that wasn't the case.
> > > >> Am I missing something?
> > > >
> > > > At a wag... is the 4GB RAM cache on the raid card making the
> > > > disk access appear faster than it is?
> > > >
> > > > I have to be honest, I've long since given up trying to test the
> > > > performance of raid formats/layouts/chunks/etc. due to the
> > > > multiple ways the system can "do stuff" that change the results
> > > > even with the exact same manual tests. Then again, my workloads
> > > > tend to be "good enough is good enough". I guess, however,
> > > > someone needing a high-speed file server with bonded 10Gb links
> > > > to multiple workstations running video editing software would be a whole different ballgame.
> > >
> > > Well, something is bound to be wrong here when a RAID card beats a far faster CPU with faster memory, etc. Does anyone know how this can be debugged or fixed? Is there a way to choose which of SSE/AVX to use?
> >
> > I think the kernel will choose the best SSE/AVX variant. dmesg will
> > show something like
> >
> > [    0.233184] raid6: sse2x1   gen()  8003 MB/s
> > [    0.250192] raid6: sse2x1   xor()  5982 MB/s
> > [    0.267208] raid6: sse2x2   gen() 10003 MB/s
> > [    0.284227] raid6: sse2x2   xor()  6937 MB/s
> > [    0.301242] raid6: sse2x4   gen() 12187 MB/s
> > [    0.318260] raid6: sse2x4   xor()  8029 MB/s
> > [    0.318427] raid6: using algorithm sse2x4 gen() 12187 MB/s
> > [    0.318639] raid6: .... xor() 8029 MB/s, rmw enabled
> > [    0.318833] raid6: using ssse3x2 recovery algorithm
> >
> > If I were debugging this, I would first make sure the array is doing
> > 100% full-stripe writes (check reads vs. writes with iostat or a
> > similar tool).
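> >
> > For example (a sketch; member-disk names are illustrative):
> >
> >     iostat -x 1 /dev/sd[b-m]
> >
> > During a pure sequential write, r/s on the member disks should stay
> > near zero; sustained reads mean the array is falling back to
> > read-modify-write rather than doing full-stripe writes.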
> >
> >
> >
> > >
> > > Kind regards
> > >
> > > roy
> > > --
> > > Roy Sigurd Karlsbakk
> > > (+47) 98013356
> > > http://blogg.karlsbakk.net/
> > > GPG Public key: http://karlsbakk.net/roysigurdkarlsbakk.pubkey.txt
> > > --
> > > The good you shall carve in stone, the bad write in snow.
> > >



