Re: Avoiding Split Brains

Yes, you need to avoid split brain on a two-node replica-2 setup. You
can just add a third node with no bricks to serve as the arbiter, and
set the quorum ratio to 51%, as sketched below.
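
For example, a minimal sketch (the hostname gfs3 and the volume name
myvol are placeholders; the options are the standard server-quorum
settings):

    # Probe the third, brickless node into the trusted pool
    gluster peer probe gfs3

    # Enable server-side quorum enforcement for the volume
    gluster volume set myvol cluster.server-quorum-type server

    # Require >50% of peers to be up; this is a cluster-wide option,
    # hence 'all' rather than a volume name
    gluster volume set all cluster.server-quorum-ratio 51%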

If you set quorum to 51% and do not have more than 2 nodes, then when
one goes down your gluster volume stops accepting I/O (with
server-side quorum the bricks are killed outright, so mounts go
unavailable; with client-side quorum writes are rejected, so the
volume is effectively read-only). If you run VMs on top of this, you
usually end up with paused/frozen VMs until the volume becomes
available again.
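
Client-side quorum is the other half of this: with it, writes are
refused rather than risking split brain when a replica drops out. A
sketch (myvol is again a placeholder):

    # 'auto' permits writes only while more than half of each replica
    # set's bricks are up; with replica 2 the first brick breaks ties
    gluster volume set myvol cluster.quorum-type auto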

These are RH-specific docs, but they may help:

https://access.redhat.com/documentation/en-US/Red_Hat_Storage/2.0/html/Administration_Guide/sect-User_Guide-Managing_Volumes-Quorum.html

https://access.redhat.com/documentation/en-US/Red_Hat_Storage/3/html/Administration_Guide/sect-Managing_Split-brain.html
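
To see which entries are genuinely split-brained (as opposed to merely
pending heal), and to map a gfid from the heal output back to a real
path, something like this works (myvol and /brick are placeholders;
the gfid is taken from your output below):

    # List only entries in actual split brain
    gluster volume heal myvol info split-brain

    # For regular files, each gfid is a hard link under .glusterfs on
    # the brick, at .glusterfs/<first 2 hex chars>/<next 2>/<full gfid>;
    # find the real file by matching inodes
    find /brick -samefile \
        /brick/.glusterfs/85/89/85893940-63a8-4fa3-bf83-9e894fe852c7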

The first time I hit split brain in testing, I found this blog post very useful:

https://joejulian.name/blog/fixing-split-brain-with-glusterfs-33/
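
The short version of the fix described there: on the server holding
the copy you want to discard (and only there), remove both the file
and its gfid hard link from the brick, then let self-heal copy the
good version back. A sketch with placeholder paths:

    # On the bad brick only (never on both!)
    rm /brick/path/to/bad/file
    rm /brick/.glusterfs/85/89/85893940-63a8-4fa3-bf83-9e894fe852c7

    # Trigger a full heal so the good copy is replicated back
    gluster volume heal myvol full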

HTH,

Diego

On Fri, Oct 30, 2015 at 8:46 AM, Iain Milne <glusterfs@xxxxxxxxxxx> wrote:
> Anyone?
>
>> -----Original Message-----
>> From: gluster-users-bounces@xxxxxxxxxxx [mailto:gluster-users-
>> bounces@xxxxxxxxxxx] On Behalf Of Iain Milne
>> Sent: 21 October 2015 09:23
>> To: gluster-users@xxxxxxxxxxx
>> Subject:  Avoiding Split Brains
>>
>> Hi all,
>>
>> We've been running a distributed setup for 3 years with no issues.
>> Recently we switched to a 2-server, replicated setup (soon to be 4
>> servers) and keep encountering what I assume are split-brain
>> situations, e.g.:
>>
>>     Brick server1:/brick
>>     <gfid:85893940-63a8-4fa3-bf83-9e894fe852c7>
>>     <gfid:8b325ef9-a8d2-4088-a8ae-c73f4b9390fc>
>>     <gfid:ed815f9b-9a97-4c21-86a1-da203b023cda>
>>     <gfid:7fdbd6da-b09d-4eaf-a99b-2fbe889d2c5f>
>>     ...
>>     Number of entries: 217
>>
>>     Brick server2:/brick
>>     Number of entries: 0
>>
>> a) What does this mean?
>> b) How do I go about fixing it?
>>
>> And perhaps more importantly, how do I avoid this happening in the
>> future? Not once since moving to replication has either of the two
>> servers been offline or unavailable (to my knowledge).
>>
>> Is some sort of server/client quorum needed (that I admit I don't
>> fully understand)? While high availability would be nice to have,
>> it's not essential; robustness of the data is.
>>
>> Thanks
>>
>> Iain
>>
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users


