On 05/01/2017 01:52 PM, Gandalf Corvotempesta wrote:
2017-05-01 19:50 GMT+02:00 Shyam <srangana@xxxxxxxxxx>:
Splitting the bricks need not be a post factum decision; we can start with
a larger brick count for a given node/disk count, and then spread these
bricks to newer nodes/bricks as they are added.
If I understand the Ceph PG count correctly, it works on a similar notion
until the cluster grows beyond the initial PG count (set for the pool), at
which point there is a lot more data movement (as the PG count has to be
increased, and hence existing PGs need to be further partitioned).
Exactly.
Last time I used Ceph, the PGs worked in a similar way.
Expanding on this notion, the brick splitting under consideration needs
some other enhancements that can retain the replication/availability count
when moving existing bricks from one place to another. Thoughts on this
are posted here [1].
In essence we are looking at "+1 scaling" (what that +1 is, a disk, a
node,... is not set in stone yet, but converging on a disk is fine as an
example). +1 scaling involves:
a) ability to retain replication/availability levels
b) optimal data movement
c) optimal/acceptable time before which added capacity is available
for use (by the consumer of the volume)
d) is there a (d)? Would help in getting the requirement clear...
Brick splitting can help with (b) and (c), with strategies like [1] for
(a), IMO.
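
To make (b) concrete, here is a minimal sketch of what pre-split bricks
could buy us when one node is added. The model is my own assumption for
illustration: "placement" is a hypothetical dict of sub-brick id -> node
name, and add_node() is not an existing Gluster function. It also ignores
replication, so it says nothing about (a).

```python
from collections import Counter


def add_node(placement, new_node):
    """Move whole sub-bricks from the fullest nodes to new_node.

    placement: hypothetical dict mapping sub-brick id -> node name.
    Returns the ids of the sub-bricks that were moved.
    """
    nodes = set(placement.values()) | {new_node}
    # Number of sub-bricks the new node should end up holding.
    target = len(placement) // len(nodes)
    moved = []
    for _ in range(target):
        # Count sub-bricks on the existing nodes only.
        counts = Counter(n for n in placement.values() if n != new_node)
        # Pick the fullest node; sorting makes tie-breaking deterministic.
        donor = max(sorted(counts), key=counts.get)
        brick = min(b for b, n in placement.items() if n == donor)
        placement[brick] = new_node
        moved.append(brick)
    return moved


# 12 sub-bricks pre-split over 3 nodes, then a 4th node is added.
placement = {b: f"node{b % 3}" for b in range(12)}
moved = add_node(placement, "node3")
fraction_moved = len(moved) / len(placement)
print(moved, fraction_moved)
```

In this toy model only 3 of the 12 sub-bricks move, and each one moves
as a unit, so nothing has to be re-split until the node count outgrows
the initial sub-brick count.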
Brick splitting also brings in complexities in DHT (like looking up
everywhere, or the scale count of distribution that would increase).
Such complexities have some solutions (like lookup optimize), and
possibly need some testing and benchmarking to ensure things do not trip
at this layer.
Also, brick multiplexing is already in the code base; it is meant to deal
with a large(r) number of bricks per node, which would be the default with
brick splitting, and hence would help.
Further, the direction with JBR needed a leader per node for a brick
(so that clients utilize all server connections rather than just the
leader), and was possibly the birthplace of the brick-splitting idea.
Also, the idea behind having more buckets than real bricks in DHT2
was to deal with (b).
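
As a rough illustration of why more buckets than bricks helps with (b),
the sketch below compares hashing file names straight onto the brick
count against hashing them into a fixed set of buckets that are then
mapped to bricks. This is not the DHT2 algorithm; NUM_BUCKETS, the MD5
based hash and the bucket maps are all assumptions made up for the
example.

```python
import hashlib

NUM_BUCKETS = 64  # assumed fixed bucket count, larger than the brick count


def name_hash(name):
    """Stable 32-bit hash of a file name (illustrative choice of hash)."""
    return int.from_bytes(hashlib.md5(name.encode()).digest()[:4], "big")


# Bucket -> brick map for 3 bricks, and the same map after adding brick 3:
# every 4th bucket is handed to the new brick, the rest stay where they are.
old_map = [b % 3 for b in range(NUM_BUCKETS)]
new_map = [3 if b % 4 == 3 else old_map[b] for b in range(NUM_BUCKETS)]

names = [f"file-{i}" for i in range(10000)]

# Naive placement: brick = hash % brick_count, so growing 3 -> 4 reshuffles.
naive_moved = sum(name_hash(n) % 3 != name_hash(n) % 4 for n in names)

# Bucketed placement: a file moves only if its bucket changed owner.
bucket_moved = sum(
    old_map[name_hash(n) % NUM_BUCKETS] != new_map[name_hash(n) % NUM_BUCKETS]
    for n in names
)

buckets_reassigned = sum(o != n for o, n in zip(old_map, new_map))
print(naive_moved / len(names), bucket_moved / len(names), buckets_reassigned)
```

With the naive modulo roughly three quarters of the files change brick,
while with the bucket map only the files in the 16 reassigned buckets
(about a quarter) move, and each of the four bricks ends up with 16
buckets.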
I put this story together to state two things:
- We realize that we need this, and have been working on strategies
towards achieving the same
- We need the bits chained right so that we can make this work, and
there is substantial work to be done here
Shyam
[1] Thoughts on moving a brick in a pure dist/replica/ec setup to another,
within or across nodes (my first comment on this issue; GitHub does not
have a comment index for me to point to the exact comment):
https://github.com/gluster/glusterfs/issues/170
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users