Re: Add single server

On 05/01/2017 01:52 PM, Gandalf Corvotempesta wrote:
> 2017-05-01 19:50 GMT+02:00 Shyam <srangana@xxxxxxxxxx>:
>> Splitting the bricks need not be a post factum decision, we can start with
>> larger brick counts, on a given node/disk count, and hence spread these
>> bricks to newer nodes/bricks as they are added.
>>
>> If I understand the ceph PG count, it works on a similar notion, till the
>> cluster grows beyond the initial PG count (set for the pool) at which point
>> there is a lot more data movement (as the PG count has to be increased, and
>> hence existing PGs need to be further partitioned)

> Exactly.
> Last time I used Ceph, the PGs worked in a similar way.
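
To make that analogy concrete, here is a toy model (plain Python; it has nothing to do with Ceph's CRUSH or our DHT internals, and the numbers are made up) of why a larger, pre-created bucket/brick count keeps data movement bounded when capacity is added, while growing the bucket count after the fact touches a much larger fraction of the objects:

    import hashlib

    def bucket_of(name, nbuckets):
        # stable hash of a file name into one of nbuckets (PGs/bricks)
        return int(hashlib.md5(name.encode()).hexdigest(), 16) % nbuckets

    NB = 64                                  # bucket count fixed up front
    objects = ["file-%d" % i for i in range(100000)]

    # Initial layout: 64 buckets spread over 4 servers.
    owner = {b: b % 4 for b in range(NB)}

    # "+1 server" at a fixed bucket count: hand roughly a fifth of the
    # *buckets* to the new server, one whole bucket at a time; nothing
    # else is touched.
    new_owner = dict(owner)
    for b in range(NB // 5):
        new_owner[b] = 4

    moved = sum(owner[bucket_of(o, NB)] != new_owner[bucket_of(o, NB)]
                for o in objects)
    print("+1 server, fixed bucket count: ~%d%% of data moves"
          % (100 * moved // len(objects)))

    # Outgrowing the bucket count: splitting 64 -> 128 buckets re-homes
    # roughly half of the objects within every bucket, i.e. far more movement.
    moved = sum(bucket_of(o, NB) != bucket_of(o, 2 * NB) for o in objects)
    print("bucket count doubled: ~%d%% of objects change bucket"
          % (100 * moved // len(objects)))

The exact percentages do not matter; the point is that with a pre-split count we move whole buckets/bricks, and the moved fraction stays roughly proportional to the capacity added.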


Expanding on this notion, the brick splitting under consideration needs some further enhancements, so that the replication/availability count is retained when moving existing bricks from one place to another. Thoughts on this are posted here [1].
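
To be concrete about what "retain the count while moving" could look like, here is an illustrative sketch (Python only, not the actual mechanics proposed in [1] and not gluster code; heal() is a stand-in for self-heal, and the brick names are hypothetical): attach the destination brick first and retire the source only after it is healed, so the number of good copies never drops below the configured replica count.

    def move_brick(replica_set, src, dst, replica_count, heal):
        # replica_set: bricks currently holding one copy each of the data
        assert src in replica_set and len(replica_set) == replica_count

        replica_set.append(dst)      # temporarily replica_count + 1 members
        heal(dst)                    # populate dst from the surviving copies
        replica_set.remove(src)      # only now retire the source brick

        # the number of good copies never dropped below replica_count
        assert len(replica_set) == replica_count
        return replica_set

    # hypothetical brick names; heal is stubbed out for the sketch
    bricks = ["node1:/b1", "node2:/b1", "node3:/b1"]
    move_brick(bricks, src="node3:/b1", dst="node4:/b1",
               replica_count=3, heal=lambda b: None)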

In essence we are looking at "+1 scaling" (what that +1 is, a disk, a node, ... is not set in stone yet, but converging on a disk is fine as an example). +1 scaling involves,
 a) the ability to retain replication/availability levels
 b) optimal data movement
 c) optimal/acceptable time before the added capacity is available for use (by the consumer of the volume)
 d) is there a (d)? Knowing that would help in getting the requirements clear...

Brick splitting can help with (b) and (c), with strategies like [1] for (a), IMO.

Brick splitting also brings in complexities in DHT (like having to look up everywhere, or the increased distribute count). Such complexities have some solutions (like lookup-optimize), and will possibly need some testing and benchmarking to ensure we do not trip at this layer.
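
As a rough illustration of the scale concern (made-up numbers, only the shape matters): if a missed lookup on the hashed subvolume falls back to asking every subvolume, the per-name fan-out grows linearly with the distribute count, which is exactly what lookup-optimize is meant to keep in check.

    def lookups_per_name(nbricks, miss_rate):
        # 1 RPC to the hashed subvolume, plus a fan-out to every subvolume
        # for the fraction of lookups that miss there
        return 1 + miss_rate * nbricks

    for nbricks in (16, 256, 4096):
        print("%5d bricks: ~%.0f lookup RPCs per name at a 5%% miss rate"
              % (nbricks, lookups_per_name(nbricks, miss_rate=0.05)))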

Also, brick multiplexing is already in the code base to deal with large(r) numbers of bricks per node. That would be the default with brick splitting and hence would help.

Further, the direction with JBR needed a leader per node for a brick (so that clients utilize all server connections rather than just the one to the leader), and was possibly the birthplace of the brick-splitting thought.

Also, the idea behind DHT2 having larger bucket counts than real bricks was to deal with (b).

Why I put this story together is to state 2 things:
- We realize that we need this, and have been working on strategies towards achieving the same
- We need the bits chained right, so that we can make this work, and there is substantial work to be done here

Shyam

[1] Thoughts on moving a brick in a pure dist/replica/EC setup to another location, within or across nodes (my first comment on this issue; GitHub does not have a comment index for me to point to the exact comment): https://github.com/gluster/glusterfs/issues/170
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users


