Thanks a lot for your insightful reply; it really clarifies things. I
think the DHT-AFR-DHT configuration makes a lot of sense.

- Wei Dong

Harald Stürzebecher wrote:
> Hi,
>
> 2009/7/28 Wei Dong <wdong.pku at gmail.com>:
>> Hi All,
>>
>> We've been using GlusterFS 2.0.1 on our lab cluster to host a large
>> number of small images for distributed processing with Hadoop, and
>> it has been working fine without human intervention for a couple of
>> months. Thanks for the wonderful project -- it's the only freely
>> available cluster filesystem that fits our needs.
>>
>> What keeps bothering me is the extremely high flexibility of
>> GlusterFS. There are simply so many ways to achieve the same goal
>> that I don't know which one is best. So I'm writing to ask whether
>> there are general configuration guidelines for improving both data
>> safety and performance.
>
> AFAIK, there are some general guidelines in the GlusterFS
> documentation. IMHO, sometimes it takes careful reading or some
> experimentation to find them. Some examples have been discussed on
> the mailing list.
>
>> Specifically, we have 66 machines (in two racks) with 4 x 1.5TB
>> disks per machine. We want to aggregate all the available disk
>> space into a single shared directory with 3 replicas. Following are
>> some of the potential configurations.
>>
>> * Each node exports 4 directories, so the clients see 66 x 4 = 264
>> directories. We first group those directories into threes with AFR,
>> making 88 replicated directories, and then aggregate them with DHT.
>> When configuring AFR, we can either put the three replicas on
>> different machines, or two on the same machine and the third on
>> another machine.
>
> I'd put the three replicas on three different machines - three
> machines are less likely to fail than just two.
>
> One setup on my list of setups to evaluate would be a DHT - AFR - DHT
> configuration:
> - aggregate the four disks on each server into a single volume and
>   export only that volume;
> - on the clients, group those 66 volumes into threes with AFR and
>   aggregate the 22 groups with DHT.
> That would reduce the client config file from 264 imported volumes
> to 66, reducing both the complexity of the configuration and the
> number of open connections.
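To make sure I understand the DHT-AFR-DHT idea, here is roughly how I
would write the 2.0 volfiles. This is only a sketch, not a tested
config: the host names, volume names and paths are made up, only the
first few of the 66 nodes and 22 replicate groups are shown, and I
have left out the locks and performance translators for brevity.

  # --- server volfile on each node (e.g. node01): DHT over 4 disks ---
  volume disk1
    type storage/posix
    option directory /export/disk1   # example path
  end-volume
  # ... disk2, disk3 and disk4 defined the same way ...

  volume local-dht
    type cluster/distribute          # DHT across this node's 4 disks
    subvolumes disk1 disk2 disk3 disk4
  end-volume

  volume server
    type protocol/server
    option transport-type tcp
    option auth.addr.local-dht.allow *   # tighten this in production
    subvolumes local-dht
  end-volume

  # --- client volfile: AFR in threes, then DHT over the 22 groups ---
  volume node01
    type protocol/client
    option transport-type tcp
    option remote-host node01        # example host name
    option remote-subvolume local-dht
  end-volume
  # ... node02 through node66 defined the same way ...

  volume afr01
    type cluster/replicate           # AFR over three different machines
    subvolumes node01 node02 node03
  end-volume
  # ... afr02 through afr22 over the remaining nodes ...

  volume global
    type cluster/distribute          # DHT over the replicate groups
    subvolumes afr01 afr02 afr03     # list all 22 groups here
  end-volume

Does that match what you had in mind?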
>> * Each node first aggregates three disks (forget about the 4th for
>> simplicity) and exports a replicated directory. The client side then
>> aggregates the 66 replicated directories into one.
>
> That might mean that access to some of the data is lost if one node
> fails - not what I'd accept from a replicated setup.
>
>> * When grouping the aggregated directories on the client side, we
>> can use some kind of hierarchy. For example, the 66 directories are
>> first aggregated into groups of N each with DHT, and then the 66/N
>> groups are again aggregated with DHT.
>
> Doesn't that just make the setup more complicated?
>
>> * We don't do the grouping on the client side. Rather, we use some
>> intermediate server to first aggregate small groups of directories
>> with DHT and export them as a single directory.
>
> The network connection of the intermediate server might become a
> bottleneck, limiting performance.
> The intermediate server might become a single point of failure.
>
>> * We can also put AFR after DHT.
>> ......
>>
>> To make things more complicated, the 66 machines are split between
>> two racks with only a 4-gigabit inter-rack connection, so not all
>> the directories exported by the servers are equally close to a
>> particular client.
>
> A workaround might be to create two intermediate volumes that each
> perform better when accessed from one of the racks, and use NUFA to
> create the single volume.
>
> Keeping replicated data local to one rack would improve performance,
> but the failure of one complete rack (e.g. power line failure,
> inter-rack networking) would block access to half of your data.
>
> Getting a third rack and a much faster inter-rack connection would
> improve performance and protect better against failures - just place
> the three copies of a file on different racks. ;-)
>
>> I'm wondering if someone on the mailing list could provide me with
>> some advice.
>
> Plan, build, test ... repeat until satisfied :-)
>
> Optional: share your solution, with benchmarks.
>
> IMHO, there won't be a single "best" solution.
>
> Harald Stürzebecher
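P.S. If I understand the NUFA workaround correctly, the client volfile
on a rack-1 machine would look roughly like the fragment below. Again
a guess, not a tested config: the volume names are made up, the
cluster/nufa type and its local-volume-name option are my reading of
the translator docs (please correct me if the syntax is off), and the
afr-r1-* / afr-r2-* volumes are assumed to be replicate groups, as in
the sketch above, kept entirely within one rack.

  # two intermediate volumes, one per rack (hypothetical names)
  volume rack1
    type cluster/distribute
    subvolumes afr-r1-01 afr-r1-02   # ... all rack-1 replicate groups
  end-volume

  volume rack2
    type cluster/distribute
    subvolumes afr-r2-01 afr-r2-02   # ... all rack-2 replicate groups
  end-volume

  volume global
    type cluster/nufa
    option local-volume-name rack1   # on rack-2 clients, use rack2 here
    subvolumes rack1 rack2
  end-volume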