Re: HA replica

On 02/12/2016 12:08 PM, Mike Stump wrote:
Ok. I’m a new user, I want to make an array with 10 machines. I want to be able to suffer the loss of any one machine. I don’t mind wasting 50% of the disk space to do this. I don’t want to suffer split brain. I want the array to support both read and write access to data. How do I achieve that?

What is your acceptable annual downtime (typically outlined in an SLA or OLA)? That's a bit of information you should have when you're engineering a system.

Split-brain happens when your replication has been partitioned and writes have occurred in such a way that no valid copy can be discerned. For the sake of example, we're going to use a very simple file entitled "file.txt" with the contents of "The quick brown fox jumped over the lazy yellow dog." It exists on a replicated volume with no protection on a network where a server and client are in the west wing, and the replica server and another client are in the east wing. Somewhere in the middle, someone pulls the plug on the router. The west client can see the west server and the east client can see the east server.

The west client updates file.txt changing the word "brown" to "red". The east client updates the same file.txt and changes the word "brown" to "white".

The router recovers and the two servers try to synchronize any files that were changed. They both had changes to file.txt. Which one was right?

There's no way to determine that from the information given. That's split-brain.
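To make the failure concrete, here is a minimal sketch of the scenario above. It is not how Gluster tracks changes internally; the Replica class, the dirty flag, and the write/reconcile helpers are names I made up for illustration. Each copy remembers whether it accepted a write its peer never saw, and if both did, nothing can pick a winner.

```python
from dataclasses import dataclass

@dataclass
class Replica:
    name: str
    content: str
    dirty: bool = False  # True = holds a write the peer never received

def write(replica, new_content, peer=None):
    """Apply a write; replicate it if the peer is reachable (peer given)."""
    replica.content = new_content
    if peer is not None:
        peer.content = new_content
    else:
        replica.dirty = True  # partitioned: remember the peer is behind

def reconcile(a, b):
    """Try to bring both copies back in sync after the network recovers."""
    if a.dirty and b.dirty:
        return "split-brain"  # both changed independently, no valid copy
    if a.dirty:
        b.content, a.dirty = a.content, False
        return f"healed from {a.name}"
    if b.dirty:
        a.content, b.dirty = b.content, False
        return f"healed from {b.name}"
    return "in sync"

ORIGINAL = "The quick brown fox jumped over the lazy yellow dog."
west = Replica("west", ORIGINAL)
east = Replica("east", ORIGINAL)

# The router is down, so neither write reaches the other server.
write(west, ORIGINAL.replace("brown", "red"))
write(east, ORIGINAL.replace("brown", "white"))
result = reconcile(west, east)
```

If only one side had written during the outage, reconcile would simply copy that side's content over. It is the pair of independent writes that leaves nothing to decide with.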

How can you combat split-brain?

One solution is quorum. Have enough replicas that comparisons can be made. If two servers are in the west and only one is in the east, and they have the ability to determine quorum, the east server will not allow writes during the network split. It can tell that writing is not safe: if all three voted on which change was right, the two in the west would win and the east's data would be lost. The two in the west see that one server is lost, but they still have quorum. They allow the data to remain available, knowing that the out-of-quorum server is safe from changes.
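The rule each server applies can be sketched as a strict-majority test. This is a generic illustration, not Gluster's actual quorum code, and has_quorum is a hypothetical helper name. A server counts itself plus every peer it can still reach and compares that with the total number of replicas.

```python
def has_quorum(visible_servers, total_servers):
    """True if this side of a partition holds a strict majority.

    visible_servers counts the server itself plus the peers it can reach.
    """
    return visible_servers > total_servers // 2

# Three replicas: two in the west, one in the east, router unplugged.
west_may_write = has_quorum(visible_servers=2, total_servers=3)
east_may_write = has_quorum(visible_servers=1, total_servers=3)

# Two replicas only: each side sees just itself, so neither has a majority.
pair_may_write = has_quorum(visible_servers=1, total_servers=2)
```

The last case is why a plain two-way replica cannot solve this by majority alone: with an even split nobody wins the vote, so you either block writes on both sides or accept the risk of split-brain.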

Gluster has the ability to have a minimally participating quorum participant called an arbiter. Let's make the west client an arbiter. The net split happens. There are only two data replicas, one in the west and the other in the east. The arbiter can see the west server but not the east. The east server can see neither the west server nor the arbiter. The east loses quorum, but the west, seeing the arbiter, still has quorum and remains available with the safe understanding that the east server, not having quorum, will not accept writes.
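Here is a small sketch of that arbiter scenario, again as a model rather than Gluster's implementation. The node names and the writable_servers helper are assumptions of mine. The arbiter gets a vote but holds no file data, so it only ever tips the balance toward whichever data server it can still see.

```python
VOTERS = {"west", "east", "arbiter"}   # everyone who counts toward quorum
DATA_SERVERS = {"west", "east"}        # only these store file contents

def writable_servers(partitions):
    """Given groups of nodes that can still see each other, return the
    data servers that keep quorum and may therefore accept writes."""
    writable = set()
    for group in partitions:
        votes = len(group & VOTERS)
        if votes > len(VOTERS) // 2:       # strict majority of all voters
            writable |= group & DATA_SERVERS
    return writable

# Net split: the arbiter sits in the west wing with the west server.
during_split = writable_servers([{"west", "arbiter"}, {"east"}])

# Healthy network: everyone sees everyone.
healthy = writable_servers([{"west", "east", "arbiter"}])

# Worst case: all three isolated from each other, nobody may write.
all_isolated = writable_servers([{"west"}, {"east"}, {"arbiter"}])
```

At most one group can hold a majority of three votes, which is what guarantees that two sides never accept conflicting writes at the same time.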

So with your 10 servers you could have a "replica 3 arbiter 1" volume, with one of the three replicas in each set being an arbiter. The arbiter would use space only for file names and metadata, not for actual data. If I were doing it, I would probably do it like so:

    gluster volume create myvol replica 3 arbiter 1 server1:/brick1 server2:/brick1 server3:/arbiter \
    server3:/brick1 server4:/brick1 server5:/arbiter etc.

Notice how there's both a data directory (/brick1) and an arbiter directory (/arbiter) on servers 3, 5, 7... which allows the data "waste" that you're asking for while mostly allowing the availability you seek. I say mostly because if your network partitions, something's got to give or you will lose data. There's absolutely no way for disconnected systems to coordinate binary changes to each other with today's technology.
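If you would rather generate that brick list than type it, the chaining pattern can be sketched as below. The chained_bricks function is a hypothetical helper, and two things in it are my assumptions rather than part of the command above: that the servers are named server1 through serverN, and that the arbiter for the last pair wraps around to server1. Check the result against your own hostnames and layout before passing it to gluster volume create.

```python
def chained_bricks(num_servers):
    """Build the brick arguments for a chained "replica 3 arbiter 1" layout.

    Each consecutive pair of servers holds the data bricks; the arbiter
    brick lives on the first server of the next pair. Wrapping the last
    arbiter back to server1 is an assumption, not taken from the original.
    """
    if num_servers < 4 or num_servers % 2:
        raise ValueError("need an even number of servers, at least 4")
    bricks = []
    for first in range(1, num_servers, 2):
        second = first + 1
        arbiter = (second % num_servers) + 1  # next pair's first server
        bricks += [
            f"server{first}:/brick1",
            f"server{second}:/brick1",
            f"server{arbiter}:/arbiter",
        ]
    return bricks

bricks = chained_bricks(10)
brick_args = " ".join(bricks)
```

With 10 servers this gives five replica sets, every server carries one data brick, and no arbiter shares a server with the two data bricks it arbitrates for.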

Perhaps, one day, we will have quantum tunneling networks with superimposed particles able to teleport data without the need of networks, but that's not today. When that is available, I expect rainbows and unicorns to be available as well.