At 11:02 PM 2/20/2009, Jordan Mendler wrote:

>I am prototyping GlusterFS with ~50-60TB of raw disk space across
>non-raided disks in ~30 compute nodes. I initially separated the
>nodes into groups of two, and did a replicate across each set of
>single drives in a pair of servers. Next I did a stripe across the
>33 resulting AFR groups, with a block size of 1MB and later with the
>default block size. With these configurations I am only seeing
>throughput of about 15-25 MB/s, despite a full Gig-E network.
>
>What is generally the recommended configuration in a large striped
>environment? I am wondering if the number of nodes in the stripe is
>causing too much overhead, or if the bottleneck is likely somewhere
>else. In addition, I saw a thread on the list that indicates it is
>better to replicate across stripes rather than stripe across
>replicates. Does anyone have any comments or opinions on this?

I think that's all guesswork; I'm not sure anyone has done a thorough test of those choices with GlusterFS 2.0.

Personally, from a data-management perspective, I'd rather replicate and then stripe, so that I know each node in a replica pair holds exactly the same data. If you stripe first and then replicate, I imagine a given piece of data could end up on one node in one stripe set but on two nodes in another stripe set, which becomes a problem if you later have to take the cluster apart or otherwise deal with it.

However, if you have the time, it would be great to see your test results with a 15-node stripe and a 10-node stripe, to see how those numbers compare to the 30-node stripe you have now. Then flip the replication order and run the same tests again.

Keith
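For the archive, here is a minimal sketch of the replicate-then-stripe layout discussed above, in GlusterFS 2.0-style client volfile syntax. The volume names (remote1..remote4, repl0, repl1, stripe0), hostnames, and the remote-subvolume name "brick" are all placeholders, not anyone's actual configuration, and you would repeat the replicate blocks for each pair of nodes:

```
# Client volumes pointing at the exported bricks (hypothetical hosts)
volume remote1
  type protocol/client
  option transport-type tcp
  option remote-host node01
  option remote-subvolume brick
end-volume

volume remote2
  type protocol/client
  option transport-type tcp
  option remote-host node02
  option remote-subvolume brick
end-volume

volume remote3
  type protocol/client
  option transport-type tcp
  option remote-host node03
  option remote-subvolume brick
end-volume

volume remote4
  type protocol/client
  option transport-type tcp
  option remote-host node04
  option remote-subvolume brick
end-volume

# Replicate (AFR) pairs first, so each pair holds identical data
volume repl0
  type cluster/replicate
  subvolumes remote1 remote2
end-volume

volume repl1
  type cluster/replicate
  subvolumes remote3 remote4
end-volume

# ...then stripe across the replica sets
volume stripe0
  type cluster/stripe
  option block-size 1MB
  subvolumes repl0 repl1
end-volume
```

Swapping the order (stripe volumes as subvolumes of a replicate volume) gives the stripe-then-replicate variant; the benchmark comparison suggested above would just exchange those two layers.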