Sharding problem - multiple shard copies with mismatching gfids

"Ian Halliday" <ihalliday@xxxxxxxxxx> · Sun, 25 Mar 2018 19:39:28 +0000

Hello all,

We are having a rather interesting problem with one of our VM storage systems. The GlusterFS client is throwing errors relating to GFID mismatches. We traced this down to multiple shards being present on the gluster nodes, with different gfids.

Hypervisor gluster mount log:

[2018-03-25 18:54:19.261733] E [MSGID: 133010] [shard.c:1724:shard_common_lookup_shards_cbk] 0-ovirt-zone1-shard: Lookup on shard 7 failed. Base file gfid = 87137cac-49eb-492a-8f33-8e33470d8cb7 [Stale file handle]
The message "W [MSGID: 109009] [dht-common.c:2162:dht_lookup_linkfile_cbk] 0-ovirt-zone1-dht: /.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7: gfid different on data file on ovirt-zone1-replicate-3, gfid local = 00000000-0000-0000-0000-000000000000, gfid node = 57c6fcdf-52bb-4f7a-aea4-02f0dc81ff56 " repeated 2 times between [2018-03-25 18:54:19.253748] and [2018-03-25 18:54:19.263576]
[2018-03-25 18:54:19.264349] W [MSGID: 109009] [dht-common.c:1901:dht_lookup_everywhere_cbk] 0-ovirt-zone1-dht: /.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7: gfid differs on subvolume ovirt-zone1-replicate-3, gfid local = fdf0813b-718a-4616-a51b-6999ebba9ec3, gfid node = 57c6fcdf-52bb-4f7a-aea4-02f0dc81ff56

On the storage nodes, we found this:

[root@n1 gluster]# find -name 87137cac-49eb-492a-8f33-8e33470d8cb7.7
./brick2/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7
./brick4/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7

[root@n1 gluster]# ls -lh ./brick2/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7
---------T. 2 root root 0 Mar 25 13:55 ./brick2/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7
[root@n1 gluster]# ls -lh ./brick4/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7
-rw-rw----. 2 root root 3.8G Mar 25 13:55 ./brick4/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7

[root@n1 gluster]# getfattr -d -m . -e hex ./brick2/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7
# file: brick2/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.gfid=0xfdf0813b718a4616a51b6999ebba9ec3
trusted.glusterfs.dht.linkto=0x6f766972742d3335302d7a6f6e65312d7265706c69636174652d3300

[root@n1 gluster]# getfattr -d -m . -e hex ./brick4/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7
# file: brick4/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x020000000000000059914190000ce672
trusted.gfid=0x57c6fcdf52bb4f7aaea402f0dc81ff56

I'm wondering how they got created in the first place, and if anyone has any insight on how to fix it?

Storage nodes:
[root@n1 gluster]# gluster --version
glusterfs 4.0.0

[root@n1 gluster]# gluster volume info

Volume Name: ovirt-350-zone1
Type: Distributed-Replicate
Volume ID: 106738ed-9951-4270-822e-63c9bcd0a20e
Status: Started
Snapshot Count: 0
Number of Bricks: 7 x (2 + 1) = 21
Transport-type: tcp
Bricks:
Brick1: 10.0.6.100:/gluster/brick1/brick
Brick2: 10.0.6.101:/gluster/brick1/brick
Brick3: 10.0.6.102:/gluster/arbrick1/brick (arbiter)
Brick4: 10.0.6.100:/gluster/brick2/brick
Brick5: 10.0.6.101:/gluster/brick2/brick
Brick6: 10.0.6.102:/gluster/arbrick2/brick (arbiter)
Brick7: 10.0.6.100:/gluster/brick3/brick
Brick8: 10.0.6.101:/gluster/brick3/brick
Brick9: 10.0.6.102:/gluster/arbrick3/brick (arbiter)
Brick10: 10.0.6.100:/gluster/brick4/brick
Brick11: 10.0.6.101:/gluster/brick4/brick
Brick12: 10.0.6.102:/gluster/arbrick4/brick (arbiter)
Brick13: 10.0.6.100:/gluster/brick5/brick
Brick14: 10.0.6.101:/gluster/brick5/brick
Brick15: 10.0.6.102:/gluster/arbrick5/brick (arbiter)
Brick16: 10.0.6.100:/gluster/brick6/brick
Brick17: 10.0.6.101:/gluster/brick6/brick
Brick18: 10.0.6.102:/gluster/arbrick6/brick (arbiter)
Brick19: 10.0.6.100:/gluster/brick7/brick
Brick20: 10.0.6.101:/gluster/brick7/brick
Brick21: 10.0.6.102:/gluster/arbrick7/brick (arbiter)
Options Reconfigured:
cluster.min-free-disk: 50GB
performance.strict-write-ordering: off
performance.strict-o-direct: off
nfs.disable: off
performance.readdir-ahead: on
transport.address-family: inet
performance.cache-size: 1GB
features.shard: on
features.shard-block-size: 5GB
server.event-threads: 8
server.outstanding-rpc-limit: 128
storage.owner-uid: 36
storage.owner-gid: 36
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: on
cluster.eager-lock: enable
network.remote-dio: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
cluster.data-self-heal-algorithm: full
performance.flush-behind: off
performance.write-behind-window-size: 8MB
client.event-threads: 8
server.allow-insecure: on

Client version: 
[root@kvm573 ~]# gluster --version
glusterfs 3.12.5

Thanks!

- Ian

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users