Re: ... i was able to produce a split brain...

Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx> · Wed, 04 Feb 2015 12:28:17 +0530

On 02/03/2015 10:42 PM, Ted Miller wrote:

On 1/31/2015 12:47 AM, Pranith Kumar Karampuri wrote:

On 01/30/2015 06:28 PM, Jeff Darcy wrote:
Pranith and I had a discussion regarding this issue, and here is what
we have in mind right now.

We plan to provide the user commands to execute from mount so that he
can access the files in split-brain. This way he can choose which copy
is to be used as source. The user will have to perform a set of
getfattrs and setfattrs (on virtual xattrs) to decide which child to
choose as source and inform AFR with his decision.

A) To know the split-brain status :
getfattr -n trusted.afr.split-brain-status <path-to-file>

This will provide user with the following details -
1) Whether the file is in metadata split-brain
2) Whether the file is in data split-brain

It will also list the name of afr-children to choose from. Something
like :
Option0: client-0
Option1: client-1

We also tell the user how to view the metadata/data info, e.g. stat to
get the metadata.

B) Now the user has to choose one of the options (client-x/client-y..)
to inspect the files.
e.g., setfattr -n trusted.afr.split-brain-choice -v client-0 <path-to-file>
We save the read-child info in inode-ctx in order to provide the user
access to the file in split-brain from that child. Once the user
inspects the file, he proceeds to do the same from the other child of
replica pair and makes an informed decision.

C) Once the above steps are done, AFR is to be informed with the final
choice for source. This is achieved by -
(say the fresh copy is in client-0)
e.g., setfattr -n trusted.afr.split-brain-heal-finalize -v client-0 <path-to-file>
This child will be chosen as source and split-brain resolution will be
done.
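Steps A) to C) above could be wrapped in a few shell helpers, roughly
as sketched below. This is only an illustration: the xattr names are
the ones proposed in this mail and may not match what finally ships,
and the run/sb_* helper names and the DRY_RUN switch are made up for
the example.

```shell
#!/bin/sh
# Sketch of the proposed mount-side workflow. The xattr names come from
# the proposal in this thread; the helper names are hypothetical.

# Run a command, or only print it when DRY_RUN=1 (hypothetical switch).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# A) Ask AFR for the split-brain status of a file on the mount.
sb_status() {
    run getfattr -n trusted.afr.split-brain-status "$1"
}

# B) Pick the child (e.g. client-0) that should serve reads for now,
#    so that copy can be inspected with stat, cat, etc.
sb_choose() {
    run setfattr -n trusted.afr.split-brain-choice -v "$2" "$1"
}

# C) Tell AFR which child is the final source for the heal.
sb_finalize() {
    run setfattr -n trusted.afr.split-brain-heal-finalize -v "$2" "$1"
}
```

With DRY_RUN=1 set, "sb_choose <path-to-file> client-0" only prints the
setfattr command it would run, which is handy for checking the choice
before touching the file.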

May I suggest another possible way to get around the difficulty in
determining which of the files is the one to keep?

What if each of the files were to be renamed by appending the name of 
the brick-host that it lives on?
For example, in a replica 2 system:
brick-1: data1
host-1: host1
brick-2: data1
host-2: host2
file name: hair-pulling.txt

After running the script/command to resolve split-brain, the file
system would have two files:
hair-pulling.txt__host-1_data1
hair-pulling.txt__host-2_data1

This doesn't seem so bad either. I will need to give it more thought to
see if there are any problems.

The user would then delete the unwanted file and rename the wanted
file back to hair-pulling.txt.
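The naming scheme suggested here could look something like the sketch
below. The function names are invented for the example, and it assumes
the "__<host>_<brick>" suffix is the last "__" in the name; nothing
like this exists in gluster today.

```shell
#!/bin/sh
# Hypothetical helpers for the "<file>__<host>_<brick>" naming idea.

# Build the per-copy name for one replica.
split_name() {
    # $1 = file name, $2 = host, $3 = brick
    printf '%s__%s_%s\n' "$1" "$2" "$3"
}

# Recover the original name by stripping the trailing "__<host>_<brick>".
original_name() {
    printf '%s\n' "${1%__*}"
}
```

After deleting the unwanted copy, the kept one would be renamed back
with something like: mv "$kept" "$(original_name "$kept")"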

The only problem would come with a very large file with a large number 
of replicas (like the replica 5 system I am working with). You might 
run out of space for all the copies.

Otherwise, this seems (to me) to present a user-friendly way to do 
this.  If the user has doubts (and disk space), user can choose to 
keep the rejected file around for a while, "just in case" it happens 
to have something useful in it that is missing from the "accepted" file.

****************************************************************
That brought another thought to mind (have not had reason to try it):
How does gluster cope if you go behind its back and rename a 
"rejected" file?  For instance, in my example above, what if I go 
directly on the brick and rename the host-2 copy of the file to 
hair-pulling.txt-dud?  The ideal scenario would seem to be that if 
user does a heal it would treat the copy as new file, see no dupe for 
hair-pulling.txt, and create a new dupe on host-2.  Since 
hair-pulling.txt-dud is also a new file, a dupe would be created on 
host-1.  User could then access files, verify correctness, and then 
delete hair-pulling.txt-dud.

*****************************************************************

This one won't work because of the reason Joe gave about gfid-hardlinks.

A not-officially-sanctioned way that I dealt with a split-brain a few 
versions back:
1. decided I wanted to keep file on host-2
2. log onto host-2
3. cp /brick/data1/hair-pulling.txt /gluster/data1/hair-pulling.txt-dud
4. rm /brick/data1/hair-pulling.txt
5. follow some Joe Julian blog stuff to delete the "invisible fork" of 
file
6. gluster volume heal data1 all
I believe that this did work for me at that time.  I have not had to 
do it on a recent gluster version.

This would work. You can check the document written by Ravi for this in
the official tree:
https://github.com/gluster/glusterfs/blob/master/doc/debugging/split-brain.md
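The "invisible fork" in step 5 above is, as far as I understand it, the
gfid hard link that each brick keeps under its .glusterfs directory. A
rough sketch of how its path can be derived from the file's gfid is
below. The function name and the gfid value in the usage note are made
up for illustration; please check the path on your own brick before
removing anything.

```shell
#!/bin/sh
# Map a hex gfid (as printed on the brick by
#   getfattr -n trusted.gfid -e hex <file-on-brick>
# ) to the path of its hard link under <brick>/.glusterfs.
# Assumed layout: .glusterfs/<hex 1-2>/<hex 3-4>/<gfid in UUID form>
gfid_link_path() {
    gl_brick=$1
    gl_hex=${2#0x}    # drop the leading 0x, if present
    gl_d1=$(printf '%s' "$gl_hex" | cut -c1-2)
    gl_d2=$(printf '%s' "$gl_hex" | cut -c3-4)
    gl_a=$(printf '%s' "$gl_hex" | cut -c1-8)
    gl_b=$(printf '%s' "$gl_hex" | cut -c9-12)
    gl_c=$(printf '%s' "$gl_hex" | cut -c13-16)
    gl_d=$(printf '%s' "$gl_hex" | cut -c17-20)
    gl_e=$(printf '%s' "$gl_hex" | cut -c21-32)
    printf '%s/.glusterfs/%s/%s/%s-%s-%s-%s-%s\n' \
        "$gl_brick" "$gl_d1" "$gl_d2" \
        "$gl_a" "$gl_b" "$gl_c" "$gl_d" "$gl_e"
}
```

For a (made-up) gfid of 0x307a5c9efddd4e7c96e94fd4bcdcbd1b on brick
/brick/data1 this gives
/brick/data1/.glusterfs/30/7a/307a5c9e-fddd-4e7c-96e9-4fd4bcdcbd1b,
which is the link that has to go along with the file itself.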

Pranith

Ted Miller
Elkhart, IN

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users
