Hey,

To give some more context around the initial incident: these systems
are hosted in AWS. The gluster brick for each instance is on a separate
volume from the root volume. On prod-gluster01 a couple of nights ago
we experienced massively high read IOPS on the root volume that we are
unable to account for (> 200,000 IOPS, when it usually sits between 0
and 100 IOPS). The box became inaccessible as a result and, after
approximately 40 minutes with no sign of the IOPS reducing, it was
rebooted through the AWS console.

The GFID mismatch problems appeared after that. There were initially
~50 impacted files, but I've fixed all but 1 of them now, which I'm
leaving broken intentionally for further testing if required.

If you don't mind, could you have a look over the information below
and identify anything that looks like a problem, since we obviously
did have a bunch of GFID-mismatched files, which based on your email
shouldn't happen.

I've included everything I can think of, but if there is something
else you would like to see, please let me know.
# gluster volume info gv0

Volume Name: gv0
Type: Replicate
Volume ID: 0ec7c49d-811c-4d4d-a3a9-e4ea9e83000c
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: prod-gluster01.fqdn.com:/export/glus_brick0/brick
Brick2: prod-gluster02.fqdn.com:/export/glus_brick0/brick
Brick3: prod-gluster03.fqdn.com:/export/glus_brick0/brick (arbiter)
Options Reconfigured:
cluster.favorite-child-policy: none
nfs.disable: on
performance.readdir-ahead: on
client.event-threads: 7
server.event-threads: 3
performance.cache-size: 256MB
cluster.favorite-child-policy is set to none because I reverted my
change to majority when it made no difference.
[root@prod-gluster01 glusterfs]# getfattr -d -m . -e hex /export/glus_brick0/brick/home/user/.viminfo
getfattr: Removing leading '/' from absolute path names
# file: export/glus_brick0/brick/home/user/.viminfo
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x0200000000000000585756be00024333
trusted.gfid=0x1b86a5a76e884f40be583fa33aa9a576
[root@prod-gluster02 glusterfs]# getfattr -d -m . -e hex /export/glus_brick0/brick/home/user/.viminfo
getfattr: Removing leading '/' from absolute path names
# file: export/glus_brick0/brick/home/user/.viminfo
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.gv0-client-0=0x000000020000000100000000
trusted.bit-rot.version=0x020000000000000058593aac000661fa
trusted.gfid=0x4931b10977f34496a7cdf8f23809c372
[root@prod-gluster03 glusterfs]# getfattr -d -m . -e hex /export/glus_brick0/brick/home/user/.viminfo
getfattr: Removing leading '/' from absolute path names
# file: export/glus_brick0/brick/home/user/.viminfo
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.gv0-client-0=0x000000020000000100000000
trusted.bit-rot.version=0x020000000000000058585ed6000f2077
trusted.gfid=0x4931b10977f34496a7cdf8f23809c372
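As an aid to reading the trusted.afr.* values above: as far as I understand it, each value packs three 32-bit big-endian pending counters in the order data, metadata, entry. Here is a minimal Python sketch that decodes the value seen on prod-gluster02 and prod-gluster03. The helper name decode_afr is my own and is not part of any gluster tooling.

```python
import struct


def decode_afr(hex_value):
    """Split a trusted.afr.* value (as printed by getfattr -e hex)
    into its data / metadata / entry pending counters."""
    text = hex_value[2:] if hex_value.lower().startswith("0x") else hex_value
    raw = bytes.fromhex(text)
    if len(raw) != 12:
        raise ValueError("expected 12 bytes, got %d" % len(raw))
    # Three unsigned 32-bit integers, big-endian (network byte order).
    data, metadata, entry = struct.unpack(">III", raw)
    return {"data": data, "metadata": metadata, "entry": entry}


# Value of trusted.afr.gv0-client-0 on prod-gluster02 and prod-gluster03.
pending = decode_afr("0x000000020000000100000000")
print(pending)
```

Read that way, both prod-gluster02 and the arbiter record pending data and metadata operations against gv0-client-0, while prod-gluster01 carries no trusted.afr.gv0-client-* attribute for this file at all.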
Just in case it's useful, here is the getfattr for the parent directory:
[root@prod-gluster01 glusterfs]# getfattr -d -m . -e hex /export/glus_brick0/brick/home/user
getfattr: Removing leading '/' from absolute path names
# file: export/glus_brick0/brick/home/user
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.gv0-client-1=0x000000000000000000000000
trusted.afr.gv0-client-2=0x000000000000000000000000
trusted.gfid=0x0a49de7ee4f04aae9fc8a88378e0d193
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
[root@prod-gluster02 glusterfs]# getfattr -d -m . -e hex /export/glus_brick0/brick/home/user
getfattr: Removing leading '/' from absolute path names
# file: export/glus_brick0/brick/home/user
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.gv0-client-0=0x000000000000000000000000
trusted.afr.gv0-client-2=0x000000000000000000000000
trusted.gfid=0x0a49de7ee4f04aae9fc8a88378e0d193
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
[root@prod-gluster03 glusterfs]# getfattr -d -m . -e hex /export/glus_brick0/brick/home/user
getfattr: Removing leading '/' from absolute path names
# file: export/glus_brick0/brick/home/user
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.gv0-client-0=0x000000000000000000000000
trusted.afr.gv0-client-1=0x000000000000000000000000
trusted.afr.gv0-client-2=0x000000000000000000000000
trusted.gfid=0x0a49de7ee4f04aae9fc8a88378e0d193
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
[root@prod-gluster01 bricks]# gluster volume heal gv0 info
Brick prod-gluster01.fqdn.com:/export/glus_brick0/brick
Status: Connected
Number of entries: 0

Brick prod-gluster02.fqdn.com:/export/glus_brick0/brick
<gfid:4931b109-77f3-4496-a7cd-f8f23809c372>
Status: Connected
Number of entries: 1

Brick prod-gluster03.fqdn.com:/export/glus_brick0/brick
<gfid:4931b109-77f3-4496-a7cd-f8f23809c372>
Status: Connected
Number of entries: 1
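In case it helps with locating the entry reported by heal info on the bricks: my understanding is that each brick keeps a link for every GFID under .glusterfs/<first two hex characters>/<next two>/<full gfid>. A small Python sketch that builds that path follows. The function name gfid_backend_path is hypothetical, used only for illustration.

```python
import posixpath


def gfid_backend_path(brick, gfid):
    """Return the expected .glusterfs link path for a GFID on a brick."""
    gfid = gfid.lower()
    # The first two and next two hex characters form the two directory levels.
    return posixpath.join(brick, ".glusterfs", gfid[0:2], gfid[2:4], gfid)


path = gfid_backend_path(
    "/export/glus_brick0/brick",
    "4931b109-77f3-4496-a7cd-f8f23809c372",
)
print(path)
```

Running stat or getfattr on that path on prod-gluster02 and prod-gluster03 should show the same file as home/user/.viminfo on those bricks.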
[root@prod-gluster01 bricks]# gluster volume heal gv0 info split-brain
Brick prod-gluster01.fqdn.com:/export/glus_brick0/brick
Status: Connected
Number of entries in split-brain: 0

Brick prod-gluster02.fqdn.com:/export/glus_brick0/brick
Status: Connected
Number of entries in split-brain: 0

Brick prod-gluster03.fqdn.com:/export/glus_brick0/brick
Status: Connected
Number of entries in split-brain: 0
Clients show this in the gluster.log:

[2017-01-04 03:13:40.863695] W [MSGID: 108008] [afr-self-heal-name.c:354:afr_selfheal_name_gfid_mismatch_check] 0-gv0-replicate-0: GFID mismatch for <gfid:0a49de7e-e4f0-4aae-9fc8-a88378e0d193>/.viminfo 4931b109-77f3-4496-a7cd-f8f23809c372 on gv0-client-1 and 1b86a5a7-6e88-4f40-be58-3fa33aa9a576 on gv0-client-0
[2017-01-04 03:13:40.867853] W [fuse-bridge.c:471:fuse_entry_cbk] 0-glusterfs-fuse: 13067223: LOOKUP() /home/user/.viminfo => -1 (Input/output error)
There's no mention of either of the GFIDs for the .viminfo file in
/var/log/gluster/*.log or in the
/var/log/gluster/brick/export-glus_brick0-brick.log file.
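To cross-check the client log against the brick xattrs, the hex trusted.gfid values can be converted to the dashed form the log uses and grouped by brick. This is only a rough Python sketch using the values pasted earlier in this mail. The helper names gfid_from_xattr and group_by_gfid are my own.

```python
import uuid


def gfid_from_xattr(hex_value):
    """Convert a trusted.gfid value (getfattr -e hex) to dashed UUID form."""
    text = hex_value[2:] if hex_value.lower().startswith("0x") else hex_value
    return str(uuid.UUID(hex=text))


def group_by_gfid(gfids_by_brick):
    """Map each distinct GFID to the bricks holding it.
    More than one key means the bricks disagree."""
    groups = {}
    for brick, hex_value in gfids_by_brick.items():
        groups.setdefault(gfid_from_xattr(hex_value), []).append(brick)
    return groups


# trusted.gfid of home/user/.viminfo as reported on each brick.
bricks = {
    "prod-gluster01": "0x1b86a5a76e884f40be583fa33aa9a576",
    "prod-gluster02": "0x4931b10977f34496a7cdf8f23809c372",
    "prod-gluster03": "0x4931b10977f34496a7cdf8f23809c372",
}
groups = group_by_gfid(bricks)
for gfid, names in groups.items():
    print(gfid, names)
```

The two resulting GFIDs are the same pair the client reports in the mismatch warning, with prod-gluster01 (gv0-client-0) as the odd one out.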