On Thu, Nov 21, 2013 at 2:17 PM, Lalatendu Mohanty <lmohanty at redhat.com> wrote:
> On 11/21/2013 01:08 PM, James wrote:
>>
>> On Wed, 2013-11-20 at 18:30 +0530, Lalatendu Mohanty wrote:
>>>
>>> On 11/12/2013 05:54 AM, James wrote:
>>>>
>>>> Hi there,
>>>>
>>>> This is a hypothetical problem, not one that describes specific
>>>> hardware at the moment.
>>>>
>>>> As we all know, gluster currently works best when each brick is the
>>>> same size and each host has the same number of bricks. Let's call
>>>> this a "homogeneous" configuration.
>>>>
>>>> Suppose you buy the hardware to build such a pool. Two years go by,
>>>> and you want to grow the pool. Changes in drive size, hardware, CPU,
>>>> etc. will mean it won't be possible (or sensible) to buy exactly the
>>>> same hardware, drive sizes, and so on. A heterogeneous pool is
>>>> unavoidable.
>>>>
>>>> Is there a general-case solution for this problem? Is something
>>>> planned to deal with it? I can only think of a few specific
>>>> corner-case solutions.
>>>
>>> I am not sure what issues you are expecting when a heterogeneous
>>> configuration is used, as gluster is intelligent enough to handle
>>> sub-volumes/bricks of different sizes. So I think a heterogeneous
>>> configuration should not be an issue for gluster. Let us know what
>>> corner cases you have in mind (maybe this will give me some pointers
>>> to think about :)).
>>
>> I am thinking about performance differences due to an imbalance of
>> data stored on type A hosts versus type B hosts. I am also thinking
>> about performance differences simply due to older versus newer
>> hardware. Even at the interconnect level there could be significant
>> differences (e.g. Gigabit vs. 10GbE).
>>
>> I'm not entirely sure how well Gluster can keep the data
>> proportionally balanced (e.g. each brick keeping 60% or 70% free
>> space, independent of the actual GB stored) if there is a significant
>> enough difference in the size of the bricks. Any idea?
>>
>
> The dynamic hashing algorithm automatically works well to keep data
> fairly distributed, though not in 100% of cases, since the hash value
> depends on the file name. However, Gluster can create data on another
> brick if one brick is full. The user can decide (through a volume set
> command) at what % of usage Gluster should consider a brick full.
> There is a nice blog post from Jeff about it:
>
> http://hekafs.org/index.php/2012/03/glusterfs-algorithms-distribution/

Indeed, I have read this. The problem is that when a brick is full, the
file gets "linked" to a different location. This isn't a problem for
functionality, but it is a problem for performance, because I believe
it adds a slight overhead to finding the file the next time (see the
P.S. below for what I mean). This is why clusters with similarly sized
bricks are preferred. If I'm wrong about this, let me know :)

>
>>>> Another problem that comes to mind is ensuring that the older,
>>>> slower servers don't act as bottlenecks for the whole pool.
>>>
>>> I think this is unavoidable, but the timeline for that kind of
>>> change is around 10 to 15 years. However, we can replace bricks if
>>> the old servers really slow the whole thing down.
>>
>> Well, I think it's particularly elegant that Gluster works on
>> commodity hardware, but it would be ideal if it worked with
>> heterogeneous hardware in a much more robust way. The ideas jdarcy
>> had mentioned seem like they might solve these problems in a nice
>> way, but afaik they're just ideas and not code yet.
>
> Agreed! Storage tiering is an awesome idea. Like you mentioned, it
> also solves the performance issue in a heterogeneous setup.

Can't wait to see it implemented :)

Cheers
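P.S. For anyone following along, I believe the knob Lalatendu is
referring to is cluster.min-free-disk. As far as I understand it, it
sets the minimum free space (as a percentage or an absolute size) below
which DHT treats a brick as full and starts placing new files
elsewhere. The volume name "myvol" below is just a placeholder:

    # Consider a brick full once less than 10% of its space is free:
    gluster volume set myvol cluster.min-free-disk 10%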
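And the "linking" I mentioned is visible on the bricks themselves, if I
understand the DHT translator correctly: the brick that the file name
hashes to keeps a zero-byte placeholder whose xattr points at the
subvolume actually holding the data. The paths here are made up:

    # On the brick the name hashes to, the link file shows up as a
    # zero-byte entry with only the sticky bit set (---------T):
    ls -l /bricks/brick1/somefile

    # Its xattr names the subvolume where the data really lives:
    getfattr -n trusted.glusterfs.dht.linkto /bricks/brick1/somefile

Every lookup that lands on the link file has to be redirected to the
real location, which is the extra hop I was worried about above.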