Re: Interesting split-brain...

Hi, I did maintenance on the 2 bricks that we have; I added RAM. One of the bricks was down for about 30 minutes and the other one for about 10 minutes. Between the shutdowns, I only gave Gluster a few minutes to heal. I know that many files were still not in sync when I shut down the second brick.

The rest is speculation. I know that one of the users was trying to share his zsh history file between multiple Docker containers. He tried to use the same file and also tried using a directory to hold multiple history files.

My guess is that while the first node was down, he created the directory. When I rebooted the first brick and shut down the second one, I most likely did not give the 2 bricks enough time to heal. Then he created the file on the second node. When I rebooted the second brick, Gluster was not able to recover.

Would a third brick have solved this situation? I am not entirely sure.
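
If I understand the quorum logic correctly, a third copy (or an arbiter brick, which stores only file names and metadata) would have kept a majority of bricks online during each shutdown, so writes could not have landed on a single isolated copy. A minimal sketch of converting our replica 2 volume to an arbiter setup, assuming a hypothetical third host 192.168.186.13 with a brick at /mnt/DATA/arbiter (both are placeholders, not real hosts of ours):

# Convert the existing replica 2 volume to replica 3 with an arbiter (supported since Gluster 3.8)
gluster volume add-brick data01 replica 3 arbiter 1 192.168.186.13:/mnt/DATA/arbiter

# Trigger a full heal so the arbiter gets populated, then check the new layout
gluster volume heal data01 full
gluster volume info data01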

On Thu, Jun 15, 2017 at 1:43 AM, Mohammed Rafi K C <rkavunga@xxxxxxxxxx> wrote:

Can you please explain how we ended up in this scenario? I think that will help us understand this scenario better, and why Gluster recommends a replica 3 or arbiter volume.

Regards

Rafi KC


On 06/15/2017 10:46 AM, Karthik Subrahmanya wrote:
Hi Ludwig,

There is no automated way to resolve gfid split-brains with a type mismatch; you have to do it manually by following the steps in [1]. In case of a type mismatch it is recommended to resolve it manually, but for a plain gfid mismatch, in 3.11 we have a way to resolve it using the *favorite-child-policy* option.
Since the file is not important, you can go with deleting it.
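
To illustrate (a rough sketch only, please adapt and double-check before running): the policy is a normal volume option, and the manual fix for your type mismatch is to remove the unwanted copy plus its gfid link under .glusterfs directly on the brick, then let self-heal repair the entry. The gfid path below is derived from the getfattr output in your mail.

# Gfid-mismatch-only case, Gluster 3.11+: let AFR pick a winner automatically
gluster volume set data01 cluster.favorite-child-policy mtime   # or size / ctime / majority

# Type-mismatch case (yours), manual: on the brick whose copy you want to discard
# (e.g. brick 2, where .zsh_history is the plain file), remove the entry and its
# gfid hardlink, then trigger a heal so the surviving copy is replicated back.
rm -f /mnt/DATA/data/abc/.zsh_history
rm -f /mnt/DATA/data/.glusterfs/a7/0a/a70ae9af-887a-4a37-875f-5c7c81ebc803
gluster volume heal data01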
HTH,
Karthik

On Thu, Jun 15, 2017 at 8:23 AM, Ludwig Gamache <ludwig@xxxxxxxxxxxxx> wrote:
I am new to Gluster but already like it. I did maintenance last week where I shut down both nodes (one after the other). I had many files that needed to be healed after that. Everything worked well, except for 1 file: it is in split-brain, with 2 different GFIDs. I read the documentation, but it only covers the cases where the GFID is the same on both bricks. BTW, I am running Gluster 3.10.

Here are some details...

[root@NAS-01 .glusterfs]# gluster volume heal data01 info

Brick 192.168.186.11:/mnt/DATA/data

/abc/.zsh_history 

/abc - Is in split-brain


Status: Connected

Number of entries: 2


Brick 192.168.186.12:/mnt/DATA/data

/abc - Is in split-brain


/abc/.zsh_history 

Status: Connected

Number of entries: 2


On brick 1:

[root@NAS-01 abc]# ls -lart

total 75

drwxr-xr-x.  2 root  root  2 Jun  8 13:26 .zsh_history

drwxr-xr-x.  3 12078 root  3 Jun 12 11:36 .

drwxrwxrwt. 17 root  root 17 Jun 12 12:20 ..


On brick 2:

[root@DC-MTL-NAS-02 abc]# ls -lart

total 66

-rw-rw-r--.  2 12078 12078 1085 Jun 12 04:42 .zsh_history

drwxr-xr-x.  2 12078 root     3 Jun 12 10:36 .

drwxrwxrwt. 17 root  root    17 Jun 12 11:20 ..


Notice that on one brick, it is a file and on the other one it is a directory.

On brick 1:

[root@NAS-01 abc]# getfattr -d -m . -e hex /mnt/DATA/data/abc/.zsh_history

getfattr: Removing leading '/' from absolute path names

# file: mnt/DATA/data/abc/.zsh_history

security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000

trusted.afr.data01-client-0=0x000000000000000000000000

trusted.afr.data01-client-1=0x000000000000000200000000

trusted.gfid=0xdee43407139d41f091d13e106a51f262

trusted.glusterfs.dht=0x000000010000000000000000ffffffff


On brick 2:

[root@NAS-02 abc]# getfattr -d -m . -e hex /mnt/DATA/data/abc/.zsh_history

getfattr: Removing leading '/' from absolute path names

# file: mnt/DATA/data/abc/.zsh_history

security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000

trusted.afr.data01-client-0=0x000000170000000200000000

trusted.afr.data01-client-1=0x000000000000000000000000

trusted.bit-rot.version=0x060000000000000059397acd0005dadd

trusted.gfid=0xa70ae9af887a4a37875f5c7c81ebc803
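
From what I read in the docs, each trusted.afr value is three 4-byte counters (data / metadata / entry operations still pending against the other copy), and client-0/client-1 normally map to the first and second brick of the volume, so the two copies appear to be blaming each other:

# trusted.afr.<volume>-client-N = 0x <data> <metadata> <entry>, 8 hex digits each
# brick 1: trusted.afr.data01-client-1 = 0x00000000 00000002 00000000  -> blames brick 2
# brick 2: trusted.afr.data01-client-0 = 0x00000017 00000002 00000000  -> blames brick 1
# Both sides accuse the other, and the gfids/types differ, so self-heal cannot pick a source.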


Any recommendation on how to recover from that? BTW, the file is not important and I could easily get rid of it without impact, if that makes for an easy solution.

Regards,

--
Ludwig Gamache






--
Ludwig Gamache
IT Director - Element AI
4200 St-Laurent, suite 1200
514-704-0564
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users
