But in the vast majority of cases I'm not seeing specific paths to
split-brained files. All I get is a big list of GFIDs with one or two
human-readable paths dotted in there (that weren't there when I first
posted a week ago). How do I go from a GFID to a file I can identify?

  gluster volume heal <vol-name> info

  Brick server1:/brick
  <gfid:85893940-63a8-4fa3-bf83-9e894fe852c7>
  <gfid:8b325ef9-a8d2-4088-a8ae-c73f4b9390fc>
  <gfid:ed815f9b-9a97-4c21-86a1-da203b023cda>
  /some/path/to/a/known/file   <- that only seems to exist on one server
  <gfid:7fdbd6da-b09d-4eaf-a99b-2fbe889d2c5f>
  ...
  Number of entries: 217

  Brick server2:/brick
  Number of entries: 0

and

  gluster volume heal <vol-name> info split-brain

  Brick server1:/brick
  Number of entries in split-brain: 0

  Brick server2:/brick
  Number of entries in split-brain: 0

??
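In case it helps frame the question: from what I can piece together, every
GFID should have a matching entry under .glusterfs on the brick itself
(first two characters of the GFID, then the next two, then the full GFID).
Assuming I've understood that correctly, something like this ought to turn
one of the GFIDs above back into a real path on server1 - though I haven't
actually tried it yet, so corrections welcome:

  # the GFID's entry on the brick: a hard link for regular files,
  # a symlink for directories
  ls -l /brick/.glusterfs/85/89/85893940-63a8-4fa3-bf83-9e894fe852c7

  # for a regular file, find the path sharing that inode,
  # ignoring the .glusterfs entry itself
  find /brick -samefile \
      /brick/.glusterfs/85/89/85893940-63a8-4fa3-bf83-9e894fe852c7 \
      -not -path '*/.glusterfs/*'

And presumably, if that .glusterfs entry has a link count of 1, the file no
longer exists at any normal path on that brick - is that the right way to
read it?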
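Separately, going back to Diego's quorum suggestion further down the
thread: once the extra servers are in place, am I right that it comes down
to something like the options below? This is just my reading of the Red Hat
docs he linked, and "server3" is only a placeholder for whatever the extra
peer ends up being called.

  # add the extra node to the pool ("server3" is a placeholder name;
  # it doesn't need to host a brick)
  gluster peer probe server3

  # server-side quorum: stop bricks on a node that can't see at least
  # 51% of the trusted pool
  gluster volume set <vol-name> cluster.server-quorum-type server
  gluster volume set all cluster.server-quorum-ratio 51%

  # client-side quorum: only allow writes while a majority of the
  # replica set is reachable
  gluster volume set <vol-name> cluster.quorum-type auto

As far as I can tell, the server-side settings shut bricks down when a node
loses sight of most of the pool, while cluster.quorum-type is what actually
blocks client writes during a split - happy to be corrected on that.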
> -----Original Message-----
> From: Diego Remolina [mailto:dijuremo@xxxxxxxxx]
> Sent: 30 October 2015 14:29
> To: Iain Milne
> Cc: gluster-users@xxxxxxxxxxx List
> Subject: Re: Avoiding Split Brains
>
> Read carefully the blog from Joe Julian; it tells you how to identify
> and clear the files in split brain. Make sure you have good backups
> prior to erasing anything.
>
> https://joejulian.name/blog/fixing-split-brain-with-glusterfs-33/
>
> He even provides a script.
>
> Diego
>
> On Fri, Oct 30, 2015 at 10:08 AM, Iain Milne <glusterfs@xxxxxxxxxxx> wrote:
> > Ok, thanks - that certainly helps (a lot!), but what about all these
> > gfid files? Are they files in split-brain or something else? The links
> > don't cover dealing with anything like this :-(
> >
> > My impression is that maybe they're files that haven't replicated
> > and/or haven't been self-healed, for whatever reason...
> >
> >> -----Original Message-----
> >> From: Diego Remolina [mailto:dijuremo@xxxxxxxxx]
> >> Sent: 30 October 2015 12:58
> >> To: Iain Milne
> >> Cc: gluster-users@xxxxxxxxxxx List
> >> Subject: Re: Avoiding Split Brains
> >>
> >> Yes, you need to avoid split brain on a two-node replica=2 setup. You
> >> can just add a third node with no bricks which serves as the arbiter
> >> and set quorum to 51%.
> >>
> >> If you set quorum to 51% and do not have more than 2 nodes, then when
> >> one goes down all your gluster mounts become unavailable (or is it
> >> just read-only?). If you run VMs on top of this then you usually end
> >> up with paused/frozen VMs until the volume becomes available again.
> >>
> >> These are RH-specific docs, but may help:
> >>
> >> https://access.redhat.com/documentation/en-US/Red_Hat_Storage/2.0/html/Administration_Guide/sect-User_Guide-Managing_Volumes-Quorum.html
> >>
> >> https://access.redhat.com/documentation/en-US/Red_Hat_Storage/3/html/Administration_Guide/sect-Managing_Split-brain.html
> >>
> >> The first time I hit split brain in testing, I found this blog very
> >> useful:
> >>
> >> https://joejulian.name/blog/fixing-split-brain-with-glusterfs-33/
> >>
> >> HTH,
> >>
> >> Diego
> >>
> >> On Fri, Oct 30, 2015 at 8:46 AM, Iain Milne <glusterfs@xxxxxxxxxxx> wrote:
> >> > Anyone?
> >> >
> >> >> -----Original Message-----
> >> >> From: gluster-users-bounces@xxxxxxxxxxx
> >> >> [mailto:gluster-users-bounces@xxxxxxxxxxx] On Behalf Of Iain Milne
> >> >> Sent: 21 October 2015 09:23
> >> >> To: gluster-users@xxxxxxxxxxx
> >> >> Subject: Avoiding Split Brains
> >> >>
> >> >> Hi all,
> >> >>
> >> >> We've been running a distributed setup for 3 years with no issues.
> >> >> Recently we switched to a 2-server, replicated setup (soon to be 4
> >> >> servers) and keep encountering what I assume are split-brain
> >> >> situations, eg:
> >> >>
> >> >> Brick server1:/brick
> >> >> <gfid:85893940-63a8-4fa3-bf83-9e894fe852c7>
> >> >> <gfid:8b325ef9-a8d2-4088-a8ae-c73f4b9390fc>
> >> >> <gfid:ed815f9b-9a97-4c21-86a1-da203b023cda>
> >> >> <gfid:7fdbd6da-b09d-4eaf-a99b-2fbe889d2c5f>
> >> >> ...
> >> >> Number of entries: 217
> >> >>
> >> >> Brick server2:/brick
> >> >> Number of entries: 0
> >> >>
> >> >> a) What does this mean?
> >> >> b) How do I go about fixing it?
> >> >>
> >> >> And perhaps more importantly, how do I avoid this happening in the
> >> >> future? Not once since moving to replication has either of the two
> >> >> servers been offline or unavailable (to my knowledge).
> >> >>
> >> >> Is some sort of server/client quorum needed (that I admit I don't
> >> >> fully understand)? While high-availability would be nice to have,
> >> >> it's not essential - robustness of the data is.
> >> >>
> >> >> Thanks
> >> >>
> >> >> Iain

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users