Re: question on AFR behavior when master is down
Gerry Reno wrote:
Every so often it is necessary to bring machines down for some type of
maintenance. If the machine is part of a glusterfs AFR replication
setup, what happens in the following scenarios?

1. The master (brick1) is brought down, files are added, changed and
deleted on glusterfs, and the master is brought back up. Does the
master (brick1) resume its master role? If so, does it sync correctly,
applying to its brick the adds, changes and deletes that were made
while it was down?

2. A slave (brick3) is brought down, files are added, changed and
deleted on glusterfs, and the slave is brought back up. Since a slave
was removed from the middle of the brick order, I assume the
replication specified in the config may land on other bricks during
the down time. Does this all get straightened out when the slave
returns to the cluster? For example, I may replicate over the first
three bricks for some files, so while brick3 is down, brick4 would
effectively become the third brick in the cluster. Would it be
receiving the replication intended for brick3? Again, when brick3
restarts, does this all get straightened out? In other words, does
brick4 get cleaned up, with the unintended files that were replicated
to it while brick3 was down removed?
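
For concreteness, the layout I am describing is roughly the client
volfile below. This is only a sketch from memory: the server and brick
names are placeholders, the protocol/client definitions are
abbreviated, and I am not certain the pattern-based "option replicate"
syntax is exact for the release you are running, so check the docs
before copying it.

    # each brick is a protocol/client volume pointing at a storage server
    volume brick1
      type protocol/client
      option transport-type tcp            # "tcp/client" on older releases
      option remote-host server1           # placeholder host name
      option remote-subvolume posix-brick  # placeholder export name
    end-volume
    # ... brick2, brick3 and brick4 are defined the same way ...

    volume afr
      type cluster/afr
      subvolumes brick1 brick2 brick3 brick4
      # selective replication: some files get three copies (on the first
      # three bricks), everything else one copy -- syntax from memory
      option replicate *.db:3,*:1
    end-volume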
Following up on this topic, what I would like to see with regard to
AFR is the ability to have two 'master' bricks that always contain the
same files. As long as glusterfs has one of these masters available,
it is happy and the cluster operates normally. If one master is taken
down, then when it comes back up glusterfs just syncs it with the
other master. This would provide high availability without changing
the behavior of the cluster. From the standpoint of slaves, once a
slave is declared it should be considered in any operations whether it
is up or down. In other words, if slave brick3 is taken down, do not
push files to brick4 just because it is now the third 'reachable'
brick. Just skip brick3 for now until it comes back up, and then
resync it. If you really intend to take a brick down permanently, then
you must modify the config files to reflect that fact.
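
Something along the lines of the fragment below is what I am picturing
for the master pair (again only a sketch with made-up names; as far as
I understand it, plain cluster/afr over two subvolumes already keeps
the two bricks identical and heals the one that was down when it comes
back):

    # two "master" bricks mirrored by AFR: as long as one of them is
    # reachable the volume keeps working, and the other is resynced
    # from it after it comes back up
    volume masters
      type cluster/afr
      subvolumes brick1 brick2
    end-volume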
my 2c,
Gerry