Re: Monitoring and solving split-brain

On 10/14/2015 11:08 PM, Игорь Бирюлин wrote:
Thanks for the detailed description.
Do you have plans to add GFID split-brain resolution to 'gluster volume heal VOLNAME split-brain ...'?
Not at the moment.
What is the main difference between GFID split-brain and data split-brain? On the nodes this file differs completely in content and size, or is that not 'data' in the GlusterFS sense?

A GFID is unique to a file (something akin to an inode number) and is assigned when the file is created. Data split-brain occurs when a file with the same GFID exists on both bricks but its content differs (e.g. one write succeeded only on brick1 and another write only on brick2).
GFID split-brain occurs when the same file is created twice (say an application does an open() with O_CREAT) and each creation succeeds on only one brick, so the two bricks hold files with the same name but different GFIDs.
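
To make the distinction concrete, here is a minimal Python sketch that compares the trusted.gfid values reported by getfattr on two bricks. The function names are hypothetical and are not part of any gluster tool; the sample values are the ones quoted further down in this thread.

```python
import uuid


def gfid_from_xattr(hex_value: str) -> uuid.UUID:
    """Turn a trusted.gfid value printed by `getfattr -e hex` into a UUID."""
    digits = hex_value[2:] if hex_value.lower().startswith("0x") else hex_value
    return uuid.UUID(hex=digits)


def is_gfid_split_brain(gfid_brick_a: str, gfid_brick_b: str) -> bool:
    """True when the same path carries different GFIDs on the two bricks."""
    return gfid_from_xattr(gfid_brick_a) != gfid_from_xattr(gfid_brick_b)


if __name__ == "__main__":
    first = "0xb95ad06e786a44e5ba71af661982071f"   # first node
    second = "0x69aaeee6624b400aaa46b5c6166c014c"  # second node
    print(gfid_from_xattr(first), gfid_from_xattr(second))
    print("GFID split-brain:", is_gfid_split_brain(first, second))
```

If the GFIDs match but the contents differ, the file would instead be a candidate for data split-brain, which is the case the heal commands are meant to handle.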
Best regards,
Igor



2015-10-14 20:13 GMT+03:00 Ravishankar N <ravishankar@xxxxxxxxxx>:


On 10/14/2015 10:05 PM, Игорь Бирюлин wrote:
Thanks for your reply.

If I list the file at the mount point (/repo):
# ls /repo/xxx/keyrings/debian-keyring.gpg
ls: cannot access /repo/xxx/keyrings/debian-keyring.gpg: Input/output error
#
In the log /var/log/glusterfs/repo.log I see:
[2015-10-14 16:27:36.006815] W [MSGID: 108008] [afr-self-heal-name.c:359:afr_selfheal_name_gfid_mismatch_check] 0-repofiles-replicate-0: GFID mismatch for <gfid:4a99bf9d-7423-47d9-a09d-fabaa333eccf>/debian-keyring.gpg 69aaeee6-624b-400a-aa46-b5c6166c014c on repofiles-client-1 and b95ad06e-786a-44e5-ba71-af661982071f on repofiles-client-0

So the file has ended up in GFID split-brain (the trusted.gfid value differs between the two bricks, as seen in your output below), which cannot be handled by the split-brain resolution commands. These commands can only resolve data and metadata split-brain. I'm afraid you'll need to manually delete one copy of the file, together with its .glusterfs hardlink, from the brick that holds it. I'm not sure why the parent directory was not listed in the 'gluster v heal VOLNAME info split-brain' output.
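
As a rough aid for the manual cleanup described above, the following Python sketch computes where the .glusterfs hardlink for a given GFID would live on a brick. It assumes the usual brick layout for regular files, `<brick>/.glusterfs/<first two hex digits>/<next two hex digits>/<gfid>`; the function name is hypothetical. It only builds a path and deletes nothing, so the result should be checked (for example by comparing inode numbers with `ls -li`) before anything is removed.

```python
from pathlib import PurePosixPath


def gfid_hardlink_path(brick_root: str, gfid: str) -> PurePosixPath:
    """Build the assumed .glusterfs hardlink path for a GFID on one brick."""
    gfid = gfid.lower()
    # The first two and the next two hex digits name the two directory levels.
    return PurePosixPath(brick_root) / ".glusterfs" / gfid[0:2] / gfid[2:4] / gfid


if __name__ == "__main__":
    # GFID reported for repofiles-client-1 in the log line above.
    print(gfid_hardlink_path("/storage/gluster_brick_repofiles",
                             "69aaeee6-624b-400a-aa46-b5c6166c014c"))
```

Which of the two copies to discard remains a decision for the administrator; the sketch does not choose one.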

[2015-10-14 16:27:36.008996] W [fuse-bridge.c:451:fuse_entry_cbk] 0-glusterfs-fuse: 65961: LOOKUP() /xxx/keyrings/debian-keyring.gpg => -1 (Input/output error)

On the first node getfattr returns:
# getfattr -d -m . -e hex /storage/gluster_brick_repofiles/xxx/keyrings/debian-keyring.gpg
getfattr: Removing leading '/' from absolute path names
# file: storage/gluster_brick_repofiles/xxx/keyrings/debian-keyring.gpg
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.repofiles-client-1=0x000000020000000100000000
trusted.bit-rot.version=0x020000000000000055fdf0910003b37b
trusted.gfid=0xb95ad06e786a44e5ba71af661982071f
# ls -l /storage/gluster_brick_repofiles/xxx/keyrings/debian-keyring.gpg
-rw-r--r-- 2 root root 3456271 Oct 13 19:00 /storage/gluster_brick_repofiles/xxx/keyrings/debian-keyring.gpg
#

On the second node getfattr returns:
# getfattr -d -m . -e hex /storage/gluster_brick_repofiles/xxx/keyrings/debian-keyring.gpg
getfattr: Removing leading '/' from absolute path names
# file: storage/gluster_brick_repofiles/xxx/keyrings/debian-keyring.gpg
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.repofiles-client-0=0x000000000000000000000000
trusted.bit-rot.version=0x020000000000000055f97b57000dc3c6
trusted.gfid=0x69aaeee6624b400aaa46b5c6166c014c
# ls -l /storage/gluster_brick_repofiles/xxx/keyrings/debian-keyring.gpg
-rw-r--r-- 2 root root 3450346 Oct  9 16:22 /storage/gluster_brick_repofiles/xxx/keyrings/debian-keyring.gpg
#
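
For readers trying to interpret the trusted.afr.* values in the two listings above, here is a small Python sketch that decodes them. It assumes the AFR changelog xattr is 12 bytes holding three big-endian 32-bit counters for pending data, metadata and entry operations, in that order; the type and function names are hypothetical.

```python
import struct
from typing import NamedTuple


class PendingCounts(NamedTuple):
    data: int
    metadata: int
    entry: int


def decode_afr_changelog(hex_value: str) -> PendingCounts:
    """Decode a trusted.afr.* value as printed by `getfattr -e hex`."""
    digits = hex_value[2:] if hex_value.lower().startswith("0x") else hex_value
    raw = bytes.fromhex(digits)
    if len(raw) != 12:
        raise ValueError("expected 12 bytes, got %d" % len(raw))
    # Assumed layout: three unsigned 32-bit big-endian counters.
    return PendingCounts(*struct.unpack(">III", raw))


if __name__ == "__main__":
    # First node, trusted.afr.repofiles-client-1
    print(decode_afr_changelog("0x000000020000000100000000"))
    # Second node, trusted.afr.repofiles-client-0
    print(decode_afr_changelog("0x000000000000000000000000"))
```

Under that assumption, the first node records pending data and metadata operations against repofiles-client-1, while the second node records nothing against repofiles-client-0.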

Best regards,
Igor






2015-10-14 19:14 GMT+03:00 Ravishankar N <ravishankar@xxxxxxxxxx>:


On 10/14/2015 07:02 PM, Игорь Бирюлин wrote:
Hello,
Today I found a split-brain in my 2-node replica set. The 'ls' command started returning 'Input/output error'.

What does the mount log (/var/log/glusterfs/<path-to-mount>.log) say when you get this error?

Can you run getfattr as root for the file from *both* bricks and share the result?
`getfattr -d -m . -e hex /storage/gluster_brick_repofiles/xxx/keyrings/debian-keyring.gpg`

Thanks.
Ravi



But the command 'gluster v heal VOLNAME info split-brain' does not show the problem files:
# gluster v heal repofiles info split-brain
Brick dist-int-master03.xxx:/storage/gluster_brick_repofiles
Number of entries in split-brain: 0

Brick dist-int-master04.xxx:/storage/gluster_brick_repofiles
Number of entries in split-brain: 0
#
In the output of 'gluster v heal VOLNAME info' I see the problem files (/xxx/keyrings/debian-keyring.gpg, /repos.json), but without split-brain markers:
# gluster v heal repofiles info
Brick dist-int-master03.xxx:/storage/gluster_brick_repofiles
/xxx/keyrings/debian-keyring.gpg
<gfid:09ec49c9-911a-4b83-abe8-080fe79e7c69>
<gfid:35c51b11-a7fb-496d-9e88-6d5a54fda7da>
/repos.json
<gfid:4f5cb2b5-30e2-43b0-a935-cfc42af883bf>
<gfid:9d2fc354-37c0-47a7-b9f3-379504cba797>
<gfid:cd86a246-9fc4-47d2-bb4d-67566677f77a>
<gfid:b932eed0-07e9-45c5-943e-7478e9f654b4>
<gfid:28bf2ffe-948c-4c7d-bce6-966242338581>
<gfid:ee5659ae-1335-42c5-a852-790387b4213b>
<gfid:fdfb6b8c-3c04-435a-b8d3-8d8341b66409>
Number of entries: 11

Brick dist-int-master04.xxx:/storage/gluster_brick_repofiles
Number of entries: 0
#
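
Since 'info split-brain' reported zero entries here while plain 'info' listed eleven, a monitoring check may want to look at the per-brick entry counts of both commands. The Python sketch below parses output in the text format shown above; the function name is hypothetical, and the parser assumes only the "Brick ..." and "Number of entries..." lines seen in this thread.

```python
import re
from typing import Dict

_COUNT_RE = re.compile(r"Number of entries(?: in split-brain)?:\s*(\d+)$")


def parse_heal_info(output: str) -> Dict[str, int]:
    """Map each brick to the entry count reported in heal-info style output."""
    counts: Dict[str, int] = {}
    brick = None
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("Brick "):
            brick = line[len("Brick "):]
            continue
        match = _COUNT_RE.match(line)
        if match and brick is not None:
            counts[brick] = int(match.group(1))
    return counts


if __name__ == "__main__":
    sample = """Brick dist-int-master03.xxx:/storage/gluster_brick_repofiles
/repos.json
Number of entries: 11

Brick dist-int-master04.xxx:/storage/gluster_brick_repofiles
Number of entries: 0
"""
    pending = {b: n for b, n in parse_heal_info(sample).items() if n > 0}
    print("bricks with pending heals:", pending)
```

A check built on this could raise an alert when any brick keeps a non-zero count for longer than a chosen period, rather than relying on the split-brain listing alone.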

I couldn't resolve the split-brain with the new standard command:
# gluster v heal repofiles  split-brain bigger-file /repos.json
Lookup failed on /repos.json:Input/output error
Volume heal failed.
#

Additional info:
# gluster v info
 Volume Name: repofiles
 Type: Replicate
 Volume ID: 4b0e2a74-f1ca-4fe7-8518-23919e1b5fa0
 Status: Started
 Number of Bricks: 1 x 2 = 2
 Transport-type: tcp
 Bricks:
 Brick1: dist-int-master03.xxx:/storage/gluster_brick_repofiles
 Brick2: dist-int-master04.xxx:/storage/gluster_brick_repofiles
 Options Reconfigured:
 performance.readdir-ahead: on
 client.event-threads: 4
 server.event-threads: 4
 cluster.lookup-optimize: on
# cat /etc/issue
Ubuntu 14.04.3 LTS \n \l
# dpkg -l | grep glusterfs
ii  glusterfs-client                        3.7.5-ubuntu1~trusty1                amd64        clustered file-system (client package)
ii  glusterfs-common                        3.7.5-ubuntu1~trusty1                amd64        GlusterFS common libraries and translator modules
ii  glusterfs-server                        3.7.5-ubuntu1~trusty1                amd64        clustered file-system (server package)
#

I have 2 questions:
1. Why doesn't 'gluster v heal VOLNAME info split-brain' show the actual split-brain? Why don't I see markers like 'possible in split-brain' in 'gluster v heal VOLNAME info'?
How can I monitor my gluster installation if these commands don't show problems?
2. Why doesn't 'gluster volume heal VOLNAME split-brain bigger-file FILE' resolve the split-brain? I understand that I can resolve it by removing files from a brick, but I wanted to use this killer feature.

Best regards,
Igor


_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users





