Anand Babu Periasamy wrote:
Gerry Reno writes:
How does GlusterFS behave in the following scenarios:
=================================
In a multi-brick cluster using AFR, a node goes down and is later
brought back online.
ACTUAL BEHAVIOR:
DESIRED BEHAVIOR:
GlusterFS sees the node restart and then begins syncing its
bricks from the transaction log; once it is synced, it is put back into
the cluster.
=================================
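For reference, a minimal client-side volume spec for an AFR setup like
the one described above might look roughly like this (host names and
volume names are illustrative, and the exact option spellings may
differ between releases):

  # one protocol/client volume per remote brick; "posix" is assumed to be
  # the name of the volume exported on each server
  volume client1
    type protocol/client
    option transport-type tcp/client
    option remote-host 192.168.1.1
    option remote-subvolume posix
  end-volume

  volume client2
    type protocol/client
    option transport-type tcp/client
    option remote-host 192.168.1.2
    option remote-subvolume posix
  end-volume

  # mirror the two bricks with AFR
  volume afr
    type cluster/afr
    subvolumes client1 client2
  end-volume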
That kind of resync is what the self-heal functionality in 1.4 is supposed
to do. Each translator will contribute its piece of context-aware healing
functionality to the overall recovery process.
Self-heal will involve multiple techniques. The key ones are:
* journaled-recovery: It will maintain a journal of operations that
need to be performed on a failed brick, for example directory-related
operations and all I/O operations for AFR (this is exactly what you
described above).
* lazy-recovery: Certain errors will be extremely time-consuming to
detect. Instead of looking out for them (while the brick is offline),
GlusterFS will resume normal operation immediately. If it finds any
fault at run time, self-heal will heal on demand (say, duplicate
files or a missing directory on a brick). It is OK if a dir is missing
on one of the bricks, since it can be fixed at the time of access.
You can also initiate a forceful recovery by just triggering
faults (say, "find /mnt/glusterfs -type f -exec file {} \;" will
navigate the entire dir tree and access each file, which should be
sufficient to convert many lazy checks into instant ones). A
glusterfs-fsck tool would then be little more than a shell script.
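As a rough sketch of that shell-script approach (the script name and
mount point are assumptions, not an existing tool):

  #!/bin/sh
  # Hypothetical glusterfs-fsck: force the lazy self-heal checks to run
  # now by walking the whole tree and accessing every regular file,
  # exactly as the find command above does.
  MOUNT="${1:-/mnt/glusterfs}"
  find "$MOUNT" -type f -exec file {} \; > /dev/null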
=================================
Expand/Contract a GlusterFS cluster.
ACTUAL BEHAVIOR:
DESIRED BEHAVIOR:
GlusterFS allows cluster members to be dynamically
hot-added/hot-removed from a running cluster.
=================================
As of now, adding bricks requires a restart of GlusterFS:
http://www.gluster.org/docs/index.php/GlusterFS_FAQ#How_do_I_add_a_new_node_to_an_already_running_cluster_of_GlusterFS
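To illustrate why this matters: adding a third mirror to the AFR
example above would mean editing every client's volume spec along the
following lines and then remounting (again, names and addresses are
illustrative):

  # new brick on a third server
  volume client3
    type protocol/client
    option transport-type tcp/client
    option remote-host 192.168.1.3
    option remote-subvolume posix
  end-volume

  volume afr
    type cluster/afr
    # client3 appended to the existing mirror set
    subvolumes client1 client2 client3
  end-volume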
Hot-add/remove functionality is part of our road map. We are
introducing a server-notification framework in 1.4. With this feature,
implementing hot-add/remove is a cakewalk.
Do you think this feature is important for 1.4? I want to have 1.4
released as soon as possible.
For us, hot-add/remove is very desirable. Just as with a RAID array, we
would like to be able to add/remove gluster servers at will from a
running cluster for things like maintenance, hardware replacements, etc.
This is essential in a production environment so that our field
workforce is not idle whenever such tasks need to occur. If it would
cause a big delay, then postpone it to a later release, but if the delay
is small, it would be good to have it in 1.4.
Gerry