At 11:02 PM 2/20/2009, Jordan Mendler wrote:

>I am prototyping GlusterFS with ~50-60TB of raw disk space across
>non-raided disks in ~30 compute nodes. I initially separated the
>nodes into groups of two, and did a replicate across each set of
>single drives in a pair of servers. Next I did a stripe across the
>33 resulting AFR groups, with a block size of 1MB and later with the
>default block size. With these configurations I am only seeing
>throughput of about 15-25 MB/s, despite a full Gig-E network.
>
>What is generally the recommended configuration in a large striped
>environment? I am wondering if the number of nodes in the stripe is
>causing too much overhead, or if the bottleneck is likely somewhere
>else. In addition, I saw a thread on the list that indicates it is
>better to replicate across stripes rather than stripe across
>replicates. Does anyone have any comments or opinions on this?

I think that's all guesswork; I'm not sure anyone has done a thorough test of those choices with GlusterFS 2.0.

Personally, from a data-management perspective, I'd rather replicate and then stripe, so that I know each node in a replica pair holds exactly the same data. If you stripe first and then replicate, I imagine a given piece of data could end up on one node in one stripe set but on two nodes in another stripe set, which becomes a problem if you later have to take the cluster apart or otherwise deal with it.

However, if you have the time, it would be great to see your test results with a 15-node stripe and a 10-node stripe, to see how those numbers compare to the 30-node stripe you have now. Then flip the replication order and run the same tests again.

Keith
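For the archive, here is a minimal sketch of the replicate-then-stripe layout discussed above, in GlusterFS 2.0-style client volfile syntax. The volume names (remote1..remote4, repl0, repl1, stripe0), hostnames, and the remote-subvolume name "brick" are all placeholders, not anyone's actual configuration, and you would repeat the replicate blocks for each pair of nodes:

```
# Client volumes pointing at the exported bricks (hypothetical hosts)
volume remote1
  type protocol/client
  option transport-type tcp
  option remote-host node01
  option remote-subvolume brick
end-volume

volume remote2
  type protocol/client
  option transport-type tcp
  option remote-host node02
  option remote-subvolume brick
end-volume

volume remote3
  type protocol/client
  option transport-type tcp
  option remote-host node03
  option remote-subvolume brick
end-volume

volume remote4
  type protocol/client
  option transport-type tcp
  option remote-host node04
  option remote-subvolume brick
end-volume

# Replicate (AFR) pairs first, so each pair holds identical data
volume repl0
  type cluster/replicate
  subvolumes remote1 remote2
end-volume

volume repl1
  type cluster/replicate
  subvolumes remote3 remote4
end-volume

# ...then stripe across the replica sets
volume stripe0
  type cluster/stripe
  option block-size 1MB
  subvolumes repl0 repl1
end-volume
```

Swapping the order (stripe volumes as subvolumes of a replicate volume) gives the stripe-then-replicate variant; the benchmark comparison suggested above would just exchange those two layers.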