On 11-05-2015 1:43, Ravishankar N wrote:
Besides, geo-replication allows replicating a replica-1 volume in
order to achieve results similar to replica-2.
But since geo-rep uses rsync, I guess it is less optimal than
using "replica-n", where I assume blocks are marked as dirty and then
replicated. Does geo-rep do the same?
How do replica-n and geo-rep compare in a continuous replication
scenario?
How safe is it to use replica-n or geo-rep for VM images? Will the
replicated VM images be mostly consistent, comparable to a VM on bare
metal after a sudden power-off?
My guess is that replica-n is safer than geo-rep since it replicates
writes synchronously in real time, while geo-rep seems to do an
initial scan using rsync, but I'm not sure how it continues
replicating after that initial sync.
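For context, the kind of geo-rep session I have in mind would be set up
roughly like this, if I read the documentation correctly (volume and
host names below are just placeholders):

  # create, start and monitor a geo-replication session (placeholder names)
  gluster volume geo-replication myvol slave-host::myvol-slave create push-pem
  gluster volume geo-replication myvol slave-host::myvol-slave start
  gluster volume geo-replication myvol slave-host::myvol-slave status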
Anyway, I would like to ask, discuss or propose the following idea:
- Have an option to tell gluster to write to only one brick
(split-brains would be impossible), which would then replicate to the
other bricks.
- A local brick (if one exists) should be selected as the "write
authority brick".
If that one brick goes down, data previously written cannot be served
until the 'sync' has been completed to the other brick. Also, new
writes would not be possible until the brick comes back up, no?
Hi Ravi,
If that one local brick goes down, it's because the host and its VM went
down too. In this case, there will be no more writes anyway until the VM
is rebooted on another host.
Note that the same applies to replica-3.
Once the VM is started on another host (using the replicated storage),
the new local brick would be used as the new "write authority brick".
I prefer to see this feature as a different use case, one that is
hopefully compatible with gluster's replica-n or geo-rep implementations.
This would increase the global write performance, which is currently
constrained by the slowest node because writes are replicated
synchronously to all other replicas (=> writes do not scale for
replica volumes).
There is an optimization that is on the cards where we write to all
bricks of the replica synchronously but we return success to the
application as soon as we get success from any one of the bricks.
(Currently we wait for replies from all bricks before returning
success/failure to the upper translator).
That is good news for the long term.
I guess in this case it would also be convenient to consider the local
brick as more "authoritative" than the other remote bricks in case of
differences.
Basically, the idea here is to have an option to avoid split-brains by
selecting an authority brick, and to avoid synchronous writes.
The same goal could be achieved by forcing gluster to resolve *all*
split-brains by choosing the authority brick as the winner (?).
Do we currently have an option for doing something like this?
There is an arbiter feature for replica 3 volumes
(https://github.com/gluster/glusterfs/blob/master/doc/features/afr-arbiter-volumes.md)
being released in glusterfs 3.7 which would prevent files from going
into split-brain; you could try that out. If a write could cause a
split-brain, it is failed with ENOTCONN to the application.
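Going by that document, a minimal example of creating such a volume
would look like this (host names and brick paths are placeholders):

  # replica 3 volume where the third brick acts as the arbiter
  gluster volume create testvol replica 3 arbiter 1 \
      server1:/bricks/brick1 server2:/bricks/brick1 server3:/bricks/arbiter
  gluster volume start testvol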
Sounds very good. We are testing it and will provide feedback.
Expected results: never see split-brains.
Do we have some option to force the split-brain resolution algorithm to
*always* use the data from a given (local) brick?
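The closest thing I have found so far seems to be the CLI-based
split-brain resolution coming in 3.7, where a specific brick can be
chosen as the source for an already split-brained file (names below are
placeholders):

  gluster volume heal myvol split-brain source-brick server1:/bricks/brick1 /path/to/file

But that looks like a manual, per-file operation rather than an
automatic policy.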
Thank you very much for your answer.
Best regards,
Christopher
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel