Re: RAID 5 doesn't scale

On 4/3/2013 1:23 PM, Martin Wilck wrote:
> On 04/03/2013 03:18 PM, Stan Hoeppner wrote:
> 
>> You didn't mention your stripe_cache_size value.  It'll make a lot of
>> difference.  Make sure it's at least 4096.  The default is 256.

Actually, the default is 128, not 256, at least with 3.2.6.  Not sure
about previous/later versions.
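
Rather than trusting either number, you can read the value your own
kernel is using from sysfs.  A minimal sketch, assuming the array is
md0 (substitute your own device); the helper name and the sysfs_root
parameter are mine, for illustration only:

```python
from pathlib import Path


def read_stripe_cache_size(md_name="md0", sysfs_root="/sys/block"):
    """Return the current stripe_cache_size of an md RAID5/6 array.

    md_name ("md0") is only an example device name.  sysfs_root is a
    parameter so the function can be pointed at a scratch directory.
    """
    path = Path(sysfs_root) / md_name / "md" / "stripe_cache_size"
    return int(path.read_text().strip())


# Usage on a real system (assumes an array named md0 exists):
#   print(read_stripe_cache_size("md0"))
```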

> I'm not getting it - why would stripe cache size matter in a random
> read/write test? 

The effect is very similar to that of a larger write-back cache on a
hardware RAID controller, which is why it dramatically affects write
throughput but not read throughput.  I believe the proper way to view
the stripe cache is as a temporary workspace, where md can assemble the
stripes to be written out to the block layer and store the chunks that
are read in for RMW (read-modify-write) cycles.  As with many things in
computing, increasing the size of this working space allows the md
driver to work more efficiently.  See below for exactly how it works.

> If the disks are large enough and the pattern is really
> random, the cache should hardly ever be hit (s_c_z = 4096 =^ 16MB cache
> per disk, that's 0.01% of disk size for a 160GB SSD).

You seem to be assuming the md "stripe cache" functions like some kind
of generic dumb filesystem cache.  It does not.
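
The 16MB-per-disk arithmetic itself does check out, for what it's
worth.  As I understand the md documentation, the memory consumed is
page size x number of member disks x stripe_cache_size.  A quick
sketch, assuming 4KiB pages and a decimal 160GB for the SSD (the
function name is mine):

```python
def stripe_cache_bytes(entries, nr_disks, page_size=4096):
    """Memory used by the stripe cache: one page per disk per entry.

    page_size=4096 is an assumption; check your own system's page size.
    """
    return page_size * nr_disks * entries


per_disk = stripe_cache_bytes(4096, nr_disks=1)
print(per_disk // 2**20, "MiB per disk")       # 16 MiB per disk
print(round(100 * per_disk / 160e9, 4), "%")   # 0.0105 % of a 160GB SSD

# For a 5-disk array the same setting costs 5x as much RAM in total.
print(stripe_cache_bytes(4096, nr_disks=5) // 2**20, "MiB total")  # 80 MiB total
```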

> I read that Peter confirmed the influence of stripe_cache_size, but I'd
> like to understand why it matters in this case.

If you think the throughput increase in this thread is impressive, see:
 http://marc.info/?l=linux-raid&m=136241443706663&w=2

About halfway down there is a table showing the effects of
stripe_cache_size from 2048 to 32768.  Write throughput increased by
over 600MB/s, from 1018MB/s to 1628MB/s, simply by increasing
stripe_cache_size from 2048 to 4096, and then decreased as the stripe
cache was made larger still.  Thus every system has a sweet spot.  This
was with md/RAID5 on 5 Intel 500GB SSDs w/the SandForce 2281 controller,
attached to an LSI 9207-8i.
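
Finding the sweet spot on your own system is just a matter of stepping
through a few values and rerunning the same write test each time.  A
rough sketch of that loop, assuming an array named md0; run_benchmark
is a hypothetical function you supply that runs your test and returns
MB/s, and all the helper names here are mine:

```python
from pathlib import Path


def set_stripe_cache_size(entries, md_name="md0", sysfs_root="/sys/block"):
    """Write a new stripe_cache_size (needs root on a real system)."""
    path = Path(sysfs_root) / md_name / "md" / "stripe_cache_size"
    path.write_text(f"{entries}\n")


def sweep(sizes, run_benchmark, md_name="md0", sysfs_root="/sys/block"):
    """Apply each size in turn and return {size: throughput}.

    run_benchmark is a hypothetical caller-supplied callable that runs
    the write test and returns throughput in MB/s.
    """
    results = {}
    for entries in sizes:
        set_stripe_cache_size(entries, md_name, sysfs_root)
        results[entries] = run_benchmark()
    return results


def sweet_spot(results):
    """Return the stripe_cache_size that gave the highest throughput."""
    return max(results, key=results.get)


# The two figures quoted above, in MB/s:
print(sweet_spot({2048: 1018, 4096: 1628}))  # 4096
```

Note that a value written this way does not survive a reboot, so once
you've found your sweet spot you'll need to reapply it at boot.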

I'd love to explain exactly how the stripe cache works, but to do that I
must first understand it, and I've been unable to find documentation
describing its inner workings.  Since I'm neither a C nor a kernel
programmer, I can't look at the code and understand it, nor then write a
document for others.  So if you really want that explanation you'll need
to start another thread and bribe Neil into explaining it.

-- 
Stan
