Re: raid1 resync speed slow down to 50% by the time it finishes

On Fri, Jul 31, 2009 at 01:10:37PM -0700, David Rees wrote:
> 2009/7/31 Keld Jørn Simonsen <keld@xxxxxxxx>:
> > On Thu, Jul 30, 2009 at 01:11:20PM -0700, David Rees wrote:
> >> 2009/7/30 Keld Jørn Simonsen <keld@xxxxxxxx>:
> >> > I think raid10,f2 only degrades 10-20 % while raid1 can degrade as much
> >> > as 50 %. For writing it is about the same, given that you use a file
> >> > system on top of the raid.
> >>
> >> Has anyone done any benchmarks of near vs far setups?
> >
> > Yes, there are a number of benchmarks on raid10 near/far scenarios
> > at http://linux-raid.osdl.org/index.php/Performance
> 
> Hmm, don't know how I missed those!  Thanks!
> 
> >> From what I understand, here's how performance should go for a 2-disk
> >> raid10 setup:
> >>
> >> Streaming/large reads far: Up to 100% faster since reads are striped
> >> across both disks
> >
> > and possibly faster, due to far only using the faster half of the disk
> > for reading.
> 
> How is it possible to go faster than 2x faster?  If the system only
> reads from the disk that has the data on the faster half of the disk,
> you can't stripe the reads, so you won't see a significant increase in
> speed.
> 
> Let's use some data from a real disk, the Velociraptor and a 2-disk
> array and streaming reads/writes.  At the beginning of the disk you
> can read about 120MB/s.  At the end of the disk, you can read about
> 80MB/s.

These are not actual figures from benchmarking you did, are they?

> Data on the "beginning" of array, RAID0 = 240MB/s
> Data on the "end" of array, RAID0 = 160MB/s.
> Data on the "beginning" of array, RAID10,n2 = 120MB/s
> Data on the "end" of array, RAID10,n2 = 80MB/s.
> Data on the "beginning" of array, RAID10,f2 = 200MB/s

Should be:

Data on the "beginning" of array, RAID10,f2 = 230MB/s

You can get about 95 % of the theoretical maximum out of raid10,f2,
according to a number of tests, and 0.95 * 240 MB/s is roughly 230 MB/s.

> Data on the "end" of array, RAID10,f2 = 200MB/s.

Yes, with f2 the "end" of the array is only halfway into the disks, so
200 MB/s is likely for raid10,f2.

> With a f2 setup you'll read at something less than 120+80 = 200MB/s.

When? At the beginning or the end?

> So I guess it's a bit more than 100% faster than 80MB/s, in that
> situation, but you get less than the peak performance of 240MB/s, so
> I'd still call it on average, 100% faster.  It may be 120% faster in
> some situations, but only 80% faster in others.

I was talking about random read speeds, not sequential read speeds.

Random read performance on a single disk, in one test, was 34 MB/s, while
sequential read on the same disk was 82 MB/s. In raid10,f2 with 2 disks,
random read was 79 MB/s. That is roughly 235 % of the random read speed of
one disk (34 MB/s). This is a plausible result: you should expect a
doubling in speed from the 2 disks, plus some additional speed from the
faster outer sectors of the disks, plus the shorter access times on those
outer sectors. Geometry says that on average the transfer rates are about
17 % higher on the outer half of the disk than over the whole disk, which
gives some 235 % speed improvement (2 * 1.17). The shorter head movements
should also give a little, but maybe the elevator algorithm of the file
system eliminates most of that factor.
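
If anybody wants to check that geometry figure, here is a rough
back-of-the-envelope calculation. The platter radii (20 mm inner, 46 mm
outer) are just my assumption for a typical 3.5" disk, and the transfer
rate is taken as proportional to the radius:

awk 'BEGIN { ri = 20; ro = 46;               # assumed inner/outer radii, mm
             rm = sqrt((ri^2 + ro^2) / 2);   # radius that splits the capacity in half
             # both values below are proportional to the capacity-weighted
             # mean transfer rate (the common 2/3 factor cancels in the ratio)
             whole = (ro^3 - ri^3) / (ro^2 - ri^2);
             outer = (ro^3 - rm^3) / (ro^2 - rm^2);
             printf "outer half vs whole disk: %.0f %% faster\n",
                    (outer / whole - 1) * 100 }'

That prints about 18 % for these radii, so in the same ballpark as the
17 % I used above.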


> >> Streaming/large writes far: Slower than single disk, since disks have
> >> to seek to write.  How much of a hit in performance will depend on
> >> chunk size.
> >> Streaming/large writes near: Same as single disk.
> >
> > Due to the elevator of the file system, writes are about the same for
> > both near and far.
> 
> Your benchmarks showed about a 13% performance hit for f2 compared to
> n2 and RAID1, so I wouldn't quite call it the same.  Close, but still
> noticeably slower.

Nah, for random write, MB/s:

raid1         55
raid10,n2     48
raid10,f2     55

So raid10,f2 and raid1 are the same, and raid10,n2 is about 13 % slower.
In theory the elevator should even this out for all mirrored raid types.
Single disk, raid1, and raid10,f2 speeds were identical for random writes,
as theory would also have it.

> 
> >> Random/small reads far: Up to 100% faster
> >
> > Actually a bit more, because far only uses the fastest half of the
> > disks. One test shows 132 % faster, which is consistent with theory.
> 
> I doubt that is the case on average.  Best case, yes.  Worst case, no.
>  I guess I should have said "appx 100% faster" instead of Up to 100%
> faster.  So we're both right. :-)

I would claim that 132 % is consistent with theory, as explained above.
And as this is based on pure geometry, for a 3.5" disk with standard inner
and outer radii, the figure is a general, fixed result.

> >> 1. The array mostly sees write activity, streaming reads aren't that common.
> >> 2. I can only get about 120 MB/s out of the external enclosure because
> >> of the PCIe card [1] , so being able to stripe reads wouldn't help get
> >> any extra performance out of those disks.
> >> [1] http://ata.wiki.kernel.org/index.php/Hardware,_driver_status#Silicon_Image_3124
> >
> > Hmm, a pci-e x1 should be able to get 2.5 Gbit/s = about 300 MB/s.
> > Wikipedia says 250 MB/s. It is strange that you can only get 120 MB/s.
> > That is the speed of a PCI 32 bit bus. I looked at your reference [1]
> > for the 3132 model. Have you tried it out in practice?
> 
> Yes, in practice, IO reached exactly 120MB/s out of the controller.  I
> ran dd read/write tests on individual disks and found that overall
> throughput peaked exactly at 120MB/s.

Hmm, get another controller, then. A cheap PCIe controller should be able
to do about 300 MB/s on an x1 PCIe.
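
A quick way to see what a controller tops out at is to read several of
the disks behind it in parallel and add up the numbers, something along
these lines (device names are only examples, adjust to your setup):

for d in sdb sdc sdd sde sdf; do
    # read 4 GB sequentially from each disk at the same time, discard the data
    dd if=/dev/$d of=/dev/null bs=1M count=4096 &
done
wait   # each dd reports its MB/s on exit; if the controller is the
       # bottleneck, the sum flattens out at its limit

Running iostat -m 1 in another terminal meanwhile also shows the
aggregate throughput, if you have sysstat installed.
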
> 
> > The max you should be able to get out of your raid10 with 8 disks would
> > then be around 400 - 480 MB/s, for sequential reads. 250 MB/s out of your PCIE
> > enclosure, or 50 MB/s per disk, and then additional 50 MB/s each of the last
> > 3 disks. You can only multiply the speed of the slowest of the disks
> > involved by the number of disks. But even then it is not so bad.
> > For random read it is better yet, given that this is not limited by the
> > transfer speed of your PCIe controller.
> 
> For streaming reads/writes, I have found I am limited by the average
> speed of each disk in the array.  Because I am limited to 120 MB/s on the
> 5-disk enclosure, for writes I'm limited to about 80 MB/s.  For reads
> which only have to come from half the disks, I am able to get up to
> 180 MB/s out of the array.

> (I did have to use blockdev --setra
> /dev/md0 to increase the readahead size to at least 16MB to get those
> numbers).

Yes, this is a common trick.
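
For the archives: --setra takes the readahead in 512-byte sectors, so
16 MB is something like

blockdev --setra 32768 /dev/md0    # 32768 * 512 bytes = 16 MB readahead
blockdev --getra /dev/md0          # shows the value currently in effect

and the setting does not survive a reboot, so it has to go into a boot
script.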

> But the primary reason I built it was to handle lots of small random
> writes/reads, so being limited to 120MB/s out of the enclosure isn't
> noticeable most of the time in practice as you say.

Yes, for random read/write you only get something like 45 % out of the
max transfer bandwidth. So 120 MB/s would be close to the max that your
5 disks on the PCIe controller can deliver. With a faster PCIe
controller you should be able to get better performance on random reads
with raid10,f2. Anyway 180 MB/s may be fast enough for your application.

best regards
keld
