On Fri, 18 Apr 2008, Christopher Hawkins wrote:
See: http://www.gluster.org/docs/index.php/Understanding_AFR_Translator At the bottom of the page are examples that initiate the sync. To clarify this point and some of your other questions in the split-brain thread: automatic re-sync after one server has been down is not available yet, but is coming in the next release with an HA translator. For now you can do a total re-sync manually by the method listed above, or let the cluster re-sync itself over time, because accessing a file for a read or write causes that file to be synced.
I'm aware of the "read 1 byte/line from a file to sync it" approach. The problem I am seeing, however, is that I cannot see files that were created on node2 before node1 was started. If I cannot see them, I cannot read them to sync them.
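For reference, a minimal sketch of that trick as a script, assuming the volume is mounted at /mnt/glusterfs (the path is a placeholder; adjust it to your setup). Walking the tree re-reads the directory entries, and reading one byte of each regular file makes AFR compare versions and self-heal that file:

```shell
#!/bin/sh
# MOUNT is a hypothetical GlusterFS client mount point; override as needed.
MOUNT=${MOUNT:-/mnt/glusterfs}

# Re-read all directory entries first, then read the first byte of every
# regular file so AFR checks each one and syncs it if the copies differ.
ls -lR "$MOUNT" > /dev/null
find "$MOUNT" -type f -print0 | xargs -0 -r -I{} head -c1 {} > /dev/null
```

Of course this only reaches files the client can actually see, which is exactly the problem described above.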
I assume this is because my underlying FS (Reiser4) has issues with xattrs.
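One quick way to test that theory, assuming the attr tools (setfattr/getfattr) are installed: AFR keeps its versioning metadata in extended attributes on the backend export, so if a trivial user xattr cannot be stored on the Reiser4 partition, AFR cannot work there either. The directory path below is a placeholder:

```shell
#!/bin/sh
# DIR should be the backend export directory (placeholder path).
DIR=${DIR:-/data/export}
F="$DIR/.xattr-test.$$"

touch "$F"
# If the filesystem can't store a trivial user xattr, it certainly can't
# store AFR's metadata xattrs either.
if setfattr -n user.afrtest -v ok "$F" && getfattr -n user.afrtest "$F" > /dev/null; then
    echo "xattrs OK"
else
    echo "xattrs NOT supported"
fi
rm -f "$F"
```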
You don't have to AFR from the client side, but you can. You can also do it on the server side, or even both. Part of the beauty of glusterfs is the simple building blocks - you can set it up any number of ways. Personally I don't think n-fold increases in client bandwidth for mirroring are all that bad. How many "mirrors" do you really need?? :-)
Fair. How do I configure server-server mirroring?
The server AFR's to other servers, then unifies the AFR'd volumes, then exports them. The clients mount the export from any given server using round robin dns or something similar (probably will be deprecated once the HA translator is available).
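Roughly, each server's volume spec would contain something like the fragment below (1.3-style vol-file syntax; the hostnames, paths and volume names are placeholders, and the cluster/unify layer mentioned above is omitted for brevity). Each server exports its local brick mirrored against the other server's brick:

```
# Local backend storage (placeholder path).
volume posix
  type storage/posix
  option directory /data/export
end-volume

# This server acting as a client of the other server's raw brick.
volume server2-brick
  type protocol/client
  option transport-type tcp/client
  option remote-host server2
  option remote-subvolume posix
end-volume

# Mirror the local brick with server2's brick.
volume afr
  type cluster/afr
  subvolumes posix server2-brick
end-volume

# Export the mirrored volume to the clients.
volume server
  type protocol/server
  option transport-type tcp/server
  option auth.ip.afr.allow *
  subvolumes afr
end-volume
```

server2's spec would be the mirror image, pointing its protocol/client volume back at server1.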
You mean, have servers AFR as clients, then re-export the AFR-ed volume again? GlusterFS on top of GlusterFS?
That way the client needs only N*1 bandwidth (but the servers need N* (num of AFR's)). So if you only need to keep 2x copies of your data, you never need more than 2x the bandwidth. And no matter what cluster filesystem you use, I can't think of a way to get 2x the files without 2x the writes.
Sure, I accept that. I was just asking if there was a way to make the additional writes server-side, because servers are few and clients are many, so multiplying the server-side bandwidth will generally cost less than multiplying every client's bandwidth.
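To make the arithmetic concrete (the figures are hypothetical): with 2-way mirroring, client-side AFR doubles every client's uplink traffic, while server-side AFR leaves the client's uplink alone and moves the extra copy onto the server-to-server link:

```shell
#!/bin/sh
# Hypothetical figures: W MB written by one client, R-way mirroring.
W=100
R=2

echo "client-side AFR: client uplink carries $((W * R)) MB"
echo "server-side AFR: client uplink carries $W MB, servers relay $((W * (R - 1))) MB"
```

Either way the cluster as a whole still performs R writes; the question is only which links carry them.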
There is no fencing and no quorum - the cluster is essentially stateless, which is really great because if you build it right then you can't really have a situation where split brain is possible (ok, VERY, very unlikely).
I can see that it's less of an issue than block-level split-brain, because this would at most lead to the odd file getting corrupted, whereas block-level split-brain would destroy the entire FS very quickly.
All clients connect on the same port, so if you AFR on the client side, say, then it's tough to imagine how one client would be able to write to a server while another client would think it was down, and yet would still have access to another server on the same network and could write to it. Of course if you don't consider these issues at build time, it is possible to set yourself up for disaster in certain situations. But that's the case with anything cluster related... All in all I think it's a tremendous filesystem tool.
I agree, but I don't think split-brain conditions are as rare or as preventable as you are implying. Whenever there is more than one server, split-brain is possible. Especially if, for example, you want two mirrored servers and each is a client of the mirrored cluster pair. If the connection between the servers fails, each server can still see its own mirrored copy and keeps working, thus causing a split-brain. Filesystems like GFS implement quorum and fencing to prevent exactly this situation.
So, although a split-brain is less terminal than with GFS (file corruption rather than file system corruption), it is still possible.
Gordan