Hey,

To give some more context around the initial incident: these systems are hosted in AWS. The gluster brick for each instance is a separate volume from the root volume. A couple of nights ago, prod-gluster01 experienced massively high read IOPS on the root volume that we are unable to account for (> 200,000 IOPS, when it usually sits between 0 and 100 IOPS). The box became inaccessible as a result and, after approximately 40 minutes with no sign of the IOPS reducing, was rebooted through the AWS console.

The GFID mismatch problems appeared after that. There were initially ~50 impacted files, but I've fixed all but 1 of them now, which I'm leaving broken intentionally for further testing if required.

If you don't mind, could you have a look over the information below and identify anything that looks like a problem, since obviously we did have a bunch of GFID mismatched files, which based on your email shouldn't happen. I've included everything I can think of, but if there is something else you would like to see, please let me know.

# gluster volume info gv0

Volume Name: gv0
Type: Replicate
Volume ID: 0ec7c49d-811c-4d4d-a3a9-e4ea9e83000c
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: prod-gluster01.fqdn.com:/export/glus_brick0/brick
Brick2: prod-gluster02.fqdn.com:/export/glus_brick0/brick
Brick3: prod-gluster03.fqdn.com:/export/glus_brick0/brick (arbiter)
Options Reconfigured:
cluster.favorite-child-policy: none
nfs.disable: on
performance.readdir-ahead: on
client.event-threads: 7
server.event-threads: 3
performance.cache-size: 256MB

cluster.favorite-child-policy is set to none because I reverted the change to majority when it didn't make any difference.

[root@prod-gluster01 glusterfs]# getfattr -d -m . -e hex /export/glus_brick0/brick/home/user/.viminfo
getfattr: Removing leading '/' from absolute path names
# file: export/glus_brick0/brick/home/user/.viminfo
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x0200000000000000585756be00024333
trusted.gfid=0x1b86a5a76e884f40be583fa33aa9a576

[root@prod-gluster02 glusterfs]# getfattr -d -m . -e hex /export/glus_brick0/brick/home/user/.viminfo
getfattr: Removing leading '/' from absolute path names
# file: export/glus_brick0/brick/home/user/.viminfo
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.gv0-client-0=0x000000020000000100000000
trusted.bit-rot.version=0x020000000000000058593aac000661fa
trusted.gfid=0x4931b10977f34496a7cdf8f23809c372

[root@prod-gluster03 glusterfs]# getfattr -d -m . -e hex /export/glus_brick0/brick/home/user/.viminfo
getfattr: Removing leading '/' from absolute path names
# file: export/glus_brick0/brick/home/user/.viminfo
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.gv0-client-0=0x000000020000000100000000
trusted.bit-rot.version=0x020000000000000058585ed6000f2077
trusted.gfid=0x4931b10977f34496a7cdf8f23809c372

Just in case it's useful, here is the getfattr for the parent directory:

[root@prod-gluster01 glusterfs]# getfattr -d -m . -e hex /export/glus_brick0/brick/home/user
getfattr: Removing leading '/' from absolute path names
# file: export/glus_brick0/brick/home/user
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.gv0-client-1=0x000000000000000000000000
trusted.afr.gv0-client-2=0x000000000000000000000000
trusted.gfid=0x0a49de7ee4f04aae9fc8a88378e0d193
trusted.glusterfs.dht=0x000000010000000000000000ffffffff

[root@prod-gluster02 glusterfs]# getfattr -d -m . -e hex /export/glus_brick0/brick/home/user
getfattr: Removing leading '/' from absolute path names
# file: export/glus_brick0/brick/home/user
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.gv0-client-0=0x000000000000000000000000
trusted.afr.gv0-client-2=0x000000000000000000000000
trusted.gfid=0x0a49de7ee4f04aae9fc8a88378e0d193
trusted.glusterfs.dht=0x000000010000000000000000ffffffff

[root@prod-gluster03 glusterfs]# getfattr -d -m . -e hex /export/glus_brick0/brick/home/user
getfattr: Removing leading '/' from absolute path names
# file: export/glus_brick0/brick/home/user
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.gv0-client-0=0x000000000000000000000000
trusted.afr.gv0-client-1=0x000000000000000000000000
trusted.afr.gv0-client-2=0x000000000000000000000000
trusted.gfid=0x0a49de7ee4f04aae9fc8a88378e0d193
trusted.glusterfs.dht=0x000000010000000000000000ffffffff

[root@prod-gluster01 bricks]# gluster volume heal gv0 info
Brick prod-gluster01.fqdn.com:/export/glus_brick0/brick
Status: Connected
Number of entries: 0

Brick prod-gluster02.fqdn.com:/export/glus_brick0/brick
<gfid:4931b109-77f3-4496-a7cd-f8f23809c372>
Status: Connected
Number of entries: 1

Brick prod-gluster03.fqdn.com:/export/glus_brick0/brick
<gfid:4931b109-77f3-4496-a7cd-f8f23809c372>
Status: Connected
Number of entries: 1

[root@prod-gluster01 bricks]# gluster volume heal gv0 info split-brain
Brick prod-gluster01.fqdn.com:/export/glus_brick0/brick
Status: Connected
Number of entries in split-brain: 0

Brick prod-gluster02.fqdn.com:/export/glus_brick0/brick
Status: Connected
Number of entries in split-brain: 0

Brick prod-gluster03.fqdn.com:/export/glus_brick0/brick
Status: Connected
Number of entries in split-brain: 0

Clients show this in the gluster.log:

[2017-01-04 03:13:40.863695] W [MSGID: 108008] [afr-self-heal-name.c:354:afr_selfheal_name_gfid_mismatch_check] 0-gv0-replicate-0: GFID mismatch for <gfid:0a49de7e-e4f0-4aae-9fc8-a88378e0d193>/.viminfo 4931b109-77f3-4496-a7cd-f8f23809c372 on gv0-client-1 and 1b86a5a7-6e88-4f40-be58-3fa33aa9a576 on gv0-client-0
[2017-01-04 03:13:40.867853] W [fuse-bridge.c:471:fuse_entry_cbk] 0-glusterfs-fuse: 13067223: LOOKUP() /home/user/.viminfo => -1 (Input/output error)

There's no mention of either of the GFIDs for the .viminfo file in /var/log/gluster/*.log or in /var/log/gluster/brick/export-glus_brick0-brick.log.

Thank you very much for your time,
Michael Ward

From: Ravishankar N [mailto:ravishankar@xxxxxxxxxx]
On 01/04/2017 06:27 AM, Michael Ward wrote:
For resolving gfid-split-brains, there is no automated way or favorite-child policy. When you say 2 data + 1 arbiter, you are using an actual arbiter volume, right (as opposed to a replica 2 volume + a dummy node, which some people refer to as an arbiter for server-quorum)? gfid-split-brains should not occur on either replica-3 or arbiter volumes with the steps you described.
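
For anyone comparing the getfattr dumps in this thread by hand, here is a minimal Python sketch (not part of the original exchange) that turns the hex values into something easier to read. The function names are hypothetical. It assumes that a trusted.afr.* value is three big-endian 32-bit counters (pending data, metadata and entry operations, in that order) and that a brick keeps a link for each file under .glusterfs/<first two hex digits>/<next two>/<gfid>; treat both as assumptions to check against your Gluster version.

```python
import posixpath
import struct
import uuid


def _strip_0x(hex_value):
    """Drop the leading '0x' that `getfattr -e hex` prints."""
    return hex_value[2:] if hex_value.lower().startswith("0x") else hex_value


def gfid_to_uuid(hex_value):
    """Convert a trusted.gfid hex dump to the dashed form used in logs and heal info."""
    return str(uuid.UUID(hex=_strip_0x(hex_value)))


def decode_afr_changelog(hex_value):
    """Split a trusted.afr.* value into its pending-operation counters.

    Assumed layout: 12 bytes, three big-endian unsigned 32-bit integers
    in the order data, metadata, entry.
    """
    raw = bytes.fromhex(_strip_0x(hex_value))
    if len(raw) != 12:
        raise ValueError("expected a 12-byte changelog value, got %d bytes" % len(raw))
    data, metadata, entry = struct.unpack(">III", raw)
    return {"data": data, "metadata": metadata, "entry": entry}


def gfid_backend_path(brick_root, hex_value):
    """Build the assumed .glusterfs link path for a GFID on a given brick."""
    gfid = gfid_to_uuid(hex_value)
    return posixpath.join(brick_root, ".glusterfs", gfid[0:2], gfid[2:4], gfid)


if __name__ == "__main__":
    # Values copied from the getfattr output for /home/user/.viminfo in this thread.
    brick = "/export/glus_brick0/brick"
    print(gfid_to_uuid("0x1b86a5a76e884f40be583fa33aa9a576"))        # prod-gluster01
    print(gfid_to_uuid("0x4931b10977f34496a7cdf8f23809c372"))        # prod-gluster02/03
    print(decode_afr_changelog("0x000000020000000100000000"))        # afr.gv0-client-0
    print(gfid_backend_path(brick, "0x4931b10977f34496a7cdf8f23809c372"))
```

Under those assumptions, the trusted.afr.gv0-client-0 value on prod-gluster02 and prod-gluster03 decodes to 2 pending data operations and 1 pending metadata operation against the first brick, while prod-gluster01 carries a different GFID from the other two. That matches the client log line reporting one GFID on gv0-client-1 and another on gv0-client-0.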
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users