Re: Awareness of the disk space available on nodes

> Awesome, so now we can finally have proper "add nodes as you go"
> setups without having to rebalance etc. Sweet!

I hate to be Mr. Negativity, but as the author of that patch I think it
behooves me to point out that things aren't quite that good yet.  The
patch doesn't remove the need for rebalancing; it just makes rebalancing
do something more reasonable.  Let's say that we had two bricks A and B,
with 1TB each.  They would each receive 50% of new files.  Now we add
brick C, with 2TB.

* Until we rebalance, A and B each have 50% of the files.

* If we rebalance *without the patch*, A/B/C will each have 33% of the
files.  That's too much for A and B, too little for C.

* If we rebalance *with the patch*, A and B will each have 25% of the
files, while C has 50%.  In most cases (the exception being where this
might cause excessive network/memory load on C) this is a better result.
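
To make the arithmetic above concrete, here is a small Python sketch.
It is not the actual DHT code; the function names are made up for
illustration, and it only computes each brick's share of files under
the two policies (equal shares vs. shares proportional to total size).

```python
from fractions import Fraction

def equal_shares(bricks):
    """Rebalance without the patch: every brick gets the same share."""
    n = len(bricks)
    return {name: Fraction(1, n) for name in bricks}

def weighted_shares(bricks):
    """Rebalance with the patch: share is proportional to total size."""
    total = sum(bricks.values())
    return {name: Fraction(size, total) for name, size in bricks.items()}

# Brick sizes in TB, as in the example: A and B at 1TB, C at 2TB.
bricks_tb = {"A": 1, "B": 1, "C": 2}

for name in bricks_tb:
    print(name, equal_shares(bricks_tb)[name], weighted_shares(bricks_tb)[name])
```

With these sizes the equal policy gives every brick 1/3, while the
weighted policy gives A and B 1/4 each and C 1/2.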

The "no need to rebalance" feature, in which existing files are left
alone but new files are preferentially directed toward new bricks, is
not yet implemented.  The basic mechanism is to assign new layouts based
on each brick's *free* space instead of total space.  That's pretty easy
to do, but risks creating a new problem.  If we continue to allocate
more files to the newer bricks even after balance has been restored,
then we'll start to overload the new bricks.  To avoid this, we need to
do two things.

(1) Periodically re-evaluate the divergence between our current layout
and our ideal, setting a flag when the divergence is too great.

(2) In the create path, if the flag is set, fix the parent directory
layout before creating the new file.
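
A rough Python sketch of how (1) and (2) could fit together follows.
Everything in it is an assumption made for illustration: the Directory
class, the function names, the divergence metric (half the sum of
absolute differences between shares) and the 0.1 threshold are not
taken from the patch or from any existing Gluster code.

```python
def ideal_shares(free_space):
    """Ideal layout: each brick's share is proportional to its free space."""
    total = sum(free_space.values())
    return {brick: free / total for brick, free in free_space.items()}

def divergence(current, ideal):
    """Half the sum of absolute share differences (0 = identical layouts)."""
    return 0.5 * sum(abs(current.get(b, 0.0) - ideal[b]) for b in ideal)

class Directory:
    """Hypothetical stand-in for a directory and its stored layout."""
    def __init__(self, layout):
        self.layout = dict(layout)
        self.needs_fix = False

def reevaluate(directory, free_space, threshold=0.1):
    """Step (1): run periodically; flag the directory if it has drifted."""
    drift = divergence(directory.layout, ideal_shares(free_space))
    directory.needs_fix = drift > threshold

def create_file(directory, free_space):
    """Step (2): fix the parent layout first if the flag is set."""
    if directory.needs_fix:
        directory.layout = ideal_shares(free_space)
        directory.needs_fix = False
    # Actual placement of the new file within the layout is omitted here.
    return directory.layout

# A and B are nearly full, C is mostly empty (free space in TB).
free_tb = {"A": 0.2, "B": 0.2, "C": 1.6}
parent = Directory({"A": 0.25, "B": 0.25, "C": 0.5})
reevaluate(parent, free_tb)
create_file(parent, free_tb)
```

Because the layout is only rewritten when the flag is set, creates stay
cheap in the common case, and the periodic re-evaluation is what keeps
the newer bricks from being overloaded once balance has been restored.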

(Note for future implementers: instead of a flag, we could use the same
"commit hash" technique as in http://review.gluster.org/#/c/7702/)

As should be clear by now, this will not be a trivial addition.  I think
it still can - and should - be done for 3.7, but it's not there yet.