On 09/28/2009 10:51 AM, Wei Dong wrote:
> Your reply makes perfect sense to me. I remember that auto-heal happens
> at file reading; does that mean opening a file for read is also a
> global operation? Do you mean that there's no other way of copying 30
> million files to our 66-node glusterfs cluster for parallel processing
> other than waiting for half a month? Can I somehow disable self-heal
> and get a speedup?
>
> Things have turned out to be really bad for me.

On this page:

http://www.gluster.com/community/documentation/index.php/Translators/cluster/distribute

it seems to suggest that the default for 'lookup-unhashed' is 'on'.
Perhaps try turning it 'off'?

Cheers,
mark

>
>
> Mark Mielke wrote:
>> On 09/28/2009 10:35 AM, Wei Dong wrote:
>>> Hi All,
>>>
>>> I noticed a very weird phenomenon when copying data (200KB image
>>> files) to our glusterfs storage. When I run only one client, it
>>> copies roughly 20 files per second, and as soon as I start a second
>>> client on another machine, the copy rate of the first client
>>> immediately degrades to 5 files per second. When I stop the second
>>> client, the first client immediately speeds back up to the original
>>> 20 files per second. When I run 15 clients, the aggregate
>>> throughput is about 8 files per second, much worse than running only
>>> one client. Neither CPU nor network is saturated. My volume file
>>> is attached. The servers run on a 66-node cluster and the
>>> clients on a 15-node cluster.
>>>
>>> We have 33x2 servers and at most 15 separate client machines, so each
>>> server serves < 0.5 clients on average. I cannot think of a reason
>>> for a distributed system to behave like this. There must be some
>>> kind of central access point.
>>
>> Although there is probably room for the GlusterFS folk to optimize...
>>
>> You should consider directory write operations to involve the whole
>> cluster. Creating a file is a directory write operation. Think of how
>> it might have to do self-heal across the cluster, make sure the name
>> is right and not already in use across the cluster, and such things.
>>
>> Once you get to reads and writes for a particular file, it should be
>> distributed.
>>
>> Cheers,
>> mark
>>

-- 
Mark Mielke <mark at mielke.cc>
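
P.S. For reference, disabling the option in the distribute section of the
client volume file would look roughly like this. This is only a sketch;
the volume and subvolume names are placeholders, not taken from Wei's
attached volfile:

  volume dht
    type cluster/distribute
    option lookup-unhashed off   # skip the broadcast lookup on each name
    subvolumes brick1 brick2 brick3
  end-volume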
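
And to make the "every create touches the whole cluster" point concrete,
here is a toy model of the lookup fan-out (plain Python, not GlusterFS
source; md5 mod N stands in for distribute's real per-directory layout
hash, and the brick names are hypothetical):

  import hashlib

  SUBVOLUMES = ["brick1", "brick2", "brick3"]   # hypothetical bricks

  def hashed_subvolume(name):
      # distribute hashes the file name to exactly one subvolume
      h = int(hashlib.md5(name.encode()).hexdigest(), 16)
      return SUBVOLUMES[h % len(SUBVOLUMES)]

  def lookup(name, lookup_unhashed):
      # return every subvolume the lookup has to contact
      first = hashed_subvolume(name)
      if lookup_unhashed:
          # broadcast to all other bricks too, in case the file sits
          # off its hashed location (e.g. after bricks were added)
          return [first] + [s for s in SUBVOLUMES if s != first]
      return [first]

  print(lookup("img-000001.jpg", lookup_unhashed=True))   # all 3 bricks
  print(lookup("img-000001.jpg", lookup_unhashed=False))  # 1 brick only

Each create is preceded by a name lookup, so in the broadcast case every
one of Wei's 30 million copies would touch all 66 servers rather than one.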