On 11/21/2013 01:08 PM, James wrote:
> On Wed, 2013-11-20 at 18:30 +0530, Lalatendu Mohanty wrote:
>> On 11/12/2013 05:54 AM, James wrote:
>>> Hi there,
>>>
>>> This is a hypothetical problem, not one that describes specific hardware
>>> at the moment.
>>>
>>> As we all know, gluster currently usually works best when each brick is
>>> the same size, and each host has the same number of bricks. Let's call
>>> this a "homogeneous" configuration.
>>>
>>> Suppose you buy the hardware to build such a pool. Two years go by, and
>>> you want to grow the pool. Changes in drive size, hardware, cpu, etc.
>>> will be such that it won't be possible (or sensible) to buy the same
>>> exact hardware, sized drives, etc... A heterogeneous pool is
>>> unavoidable.
>>>
>>> Is there a general-case solution for this problem? Is something planned
>>> to deal with it? I can only think of a few specific corner-case
>>> solutions.
>>
>> I am not sure what issues you are expecting when a heterogeneous
>> configuration is used, as Gluster is intelligent enough to handle
>> sub-volumes/bricks of different sizes. So I think a heterogeneous
>> configuration should not be an issue for Gluster. Let us know what
>> corner cases you have in mind (maybe this will give me some pointers
>> to think about :)).
>
> I am thinking about performance differences, due to an imbalance of data
> stored on type A hosts versus type B hosts. I am also thinking about
> performance simply due to older versus newer hardware. Even at the
> interconnect level there could be significant differences (e.g. Gigabit
> vs. 10GbE, etc.).
>
> I'm not entirely sure how well Gluster can keep the data proportionally
> balanced (e.g. each brick has 60% or 70% free space, independent of
> actual GB stored) if there is a significant enough difference in the
> size of the bricks. Any idea?

The dynamic hashing algorithm automatically works well to keep data
fairly distributed.
But it will not be perfectly balanced in 100% of cases, as the hash
value depends on the file name. However, Gluster can create data on
another brick if one brick is full. The user can decide (through the
cluster.min-free-disk volume set option) at what percentage of usage
Gluster should consider a brick full. There is a nice blog post from
Jeff about it:
http://hekafs.org/index.php/2012/03/glusterfs-algorithms-distribution/

>>> Another problem that comes to mind is ensuring that the older, slower
>>> servers don't act as bottlenecks to the whole pool.
>>
>> I think this is unavoidable, but the timeline for this kind of change
>> will be around 10 to 15 years. However, we can replace bricks if the
>> old servers really slow the whole thing down.
>
> Well, I think it's particularly elegant that Gluster works on commodity
> hardware, but it would be ideal if it worked with heterogeneous hardware
> in a much more robust way. The ideas jdarcy had mentioned seem like they
> might solve these problems in a nice way, but AFAIK they're just ideas
> and not code yet.

Agreed! Storage tiering is an awesome idea. Like you mentioned, it also
solves the performance issue in a heterogeneous setup.

>>> jdarcy had mentioned
>>> that gluster might gain some notion of tiering, to support things like
>>> SSDs in one part of the volume, and slow drives at the other end. Maybe
>>> this sort of architecture can be used to solve the same problems.
>>>
>>> Thoughts and discussion welcome.
>>>
>>> Cheers,
>>> James
>>>
>>> _______________________________________________
>>> Gluster-users mailing list
>>> Gluster-users at gluster.org
>>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
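
P.S. To illustrate the file-name hashing behaviour I mentioned above,
here is a rough sketch in plain Python. This is not Gluster's actual
DHT code (which, per Jeff's blog, uses a Davies-Meyer hash over
per-directory layout ranges); sha1 modulo the brick count is just a
stand-in. The point is that hashing many file names across four
equal bricks lands close to, but not exactly at, a 25% share each:

```python
import hashlib
from collections import Counter

BRICKS = ["brick0", "brick1", "brick2", "brick3"]

def pick_brick(filename):
    # Hash the file name and map the hash onto one of the bricks.
    # (Stand-in for DHT's Davies-Meyer hash and layout ranges.)
    h = int(hashlib.sha1(filename.encode()).hexdigest(), 16)
    return BRICKS[h % len(BRICKS)]

# Place 10,000 files; each brick ends up with roughly 2,500 of
# them, but not exactly -- the spread depends on the names.
counts = Counter(pick_brick(f"file-{i}.dat") for i in range(10000))
for brick, n in sorted(counts.items()):
    print(brick, n)
```

As I understand it, the default layout also splits the hash range
evenly regardless of brick capacity, which is part of why bricks of
very different sizes can drift out of proportion until a rebalance.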