The gfid mismatch here is between the shard and its "link-to" file, whose creation happens at a layer below the shard translator on the stack.
Adding DHT devs to take a look.
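
For context, the link-to file is the zero-byte, sticky-bit entry ("---------T" in Ian's ls output) that DHT leaves on one subvolume as a pointer to the subvolume that actually holds the data; the target is recorded in the trusted.glusterfs.dht.linkto xattr. A rough way to spot such files on a brick, as a sketch with standard tools (the path below is taken from Ian's output):

    # zero-byte files with the sticky bit set are candidate DHT link-to files
    find /gluster/brick2/brick/.shard -type f -size 0 -perm -1000
    # confirm by reading the linkto xattr
    getfattr -n trusted.glusterfs.dht.linkto -e hex \
        /gluster/brick2/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7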
On Mon, Mar 26, 2018 at 1:09 AM, Ian Halliday <ihalliday@xxxxxxxxxx> wrote:
Hello all,

We are having a rather interesting problem with one of our VM storage systems. The GlusterFS client is throwing errors relating to GFID mismatches. We traced this down to multiple shards being present on the gluster nodes, with different gfids.

Hypervisor gluster mount log:

[2018-03-25 18:54:19.261733] E [MSGID: 133010] [shard.c:1724:shard_common_lookup_shards_cbk] 0-ovirt-zone1-shard: Lookup on shard 7 failed. Base file gfid = 87137cac-49eb-492a-8f33-8e33470d8cb7 [Stale file handle]
The message "W [MSGID: 109009] [dht-common.c:2162:dht_lookup_linkfile_cbk] 0-ovirt-zone1-dht: /.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7: gfid different on data file on ovirt-zone1-replicate-3, gfid local = 00000000-0000-0000-0000-000000000000, gfid node = 57c6fcdf-52bb-4f7a-aea4-02f0dc81ff56" repeated 2 times between [2018-03-25 18:54:19.253748] and [2018-03-25 18:54:19.263576]
[2018-03-25 18:54:19.264349] W [MSGID: 109009] [dht-common.c:1901:dht_lookup_everywhere_cbk] 0-ovirt-zone1-dht: /.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7: gfid differs on subvolume ovirt-zone1-replicate-3, gfid local = fdf0813b-718a-4616-a51b-6999ebba9ec3, gfid node = 57c6fcdf-52bb-4f7a-aea4-02f0dc81ff56

On the storage nodes, we found this:

[root@n1 gluster]# find -name 87137cac-49eb-492a-8f33-8e33470d8cb7.7
./brick2/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7
./brick4/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7

[root@n1 gluster]# ls -lh ./brick2/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7
---------T. 2 root root 0 Mar 25 13:55 ./brick2/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7

[root@n1 gluster]# ls -lh ./brick4/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7
-rw-rw----. 2 root root 3.8G Mar 25 13:55 ./brick4/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7

[root@n1 gluster]# getfattr -d -m . -e hex ./brick2/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7
# file: brick2/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.gfid=0xfdf0813b718a4616a51b6999ebba9ec3
trusted.glusterfs.dht.linkto=0x6f766972742d3335302d7a6f6e65312d7265706c69636174652d3300
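
(For reference, the trusted.glusterfs.dht.linkto value is just a hex-encoded, NUL-terminated subvolume name, so it can be decoded with xxd to see where DHT expects the data to live:

    echo 6f766972742d3335302d7a6f6e65312d7265706c69636174652d3300 | xxd -r -p

which prints "ovirt-350-zone1-replicate-3", i.e. the same replicate-3 subvolume the mount-log warnings refer to.)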
[root@n1 gluster]# getfattr -d -m . -e hex ./brick4/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7
# file: brick4/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x020000000000000059914190000ce672
trusted.gfid=0x57c6fcdf52bb4f7aaea402f0dc81ff56

I'm wondering how they got created in the first place, and if anyone has any insight on how to fix it?

Storage nodes:

[root@n1 gluster]# gluster --version
glusterfs 4.0.0

[root@n1 gluster]# gluster volume info

Volume Name: ovirt-350-zone1
Type: Distributed-Replicate
Volume ID: 106738ed-9951-4270-822e-63c9bcd0a20e
Status: Started
Snapshot Count: 0
Number of Bricks: 7 x (2 + 1) = 21
Transport-type: tcp
Bricks:
Brick1: 10.0.6.100:/gluster/brick1/brick
Brick2: 10.0.6.101:/gluster/brick1/brick
Brick3: 10.0.6.102:/gluster/arbrick1/brick (arbiter)
Brick4: 10.0.6.100:/gluster/brick2/brick
Brick5: 10.0.6.101:/gluster/brick2/brick
Brick6: 10.0.6.102:/gluster/arbrick2/brick (arbiter)
Brick7: 10.0.6.100:/gluster/brick3/brick
Brick8: 10.0.6.101:/gluster/brick3/brick
Brick9: 10.0.6.102:/gluster/arbrick3/brick (arbiter)
Brick10: 10.0.6.100:/gluster/brick4/brick
Brick11: 10.0.6.101:/gluster/brick4/brick
Brick12: 10.0.6.102:/gluster/arbrick4/brick (arbiter)
Brick13: 10.0.6.100:/gluster/brick5/brick
Brick14: 10.0.6.101:/gluster/brick5/brick
Brick15: 10.0.6.102:/gluster/arbrick5/brick (arbiter)
Brick16: 10.0.6.100:/gluster/brick6/brick
Brick17: 10.0.6.101:/gluster/brick6/brick
Brick18: 10.0.6.102:/gluster/arbrick6/brick (arbiter)
Brick19: 10.0.6.100:/gluster/brick7/brick
Brick20: 10.0.6.101:/gluster/brick7/brick
Brick21: 10.0.6.102:/gluster/arbrick7/brick (arbiter)
Options Reconfigured:
cluster.min-free-disk: 50GB
performance.strict-write-ordering: off
performance.strict-o-direct: off
nfs.disable: off
performance.readdir-ahead: on
transport.address-family: inet
performance.cache-size: 1GB
features.shard: on
features.shard-block-size: 5GB
server.event-threads: 8
server.outstanding-rpc-limit: 128
storage.owner-uid: 36
storage.owner-gid: 36
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: on
cluster.eager-lock: enable
network.remote-dio: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
cluster.data-self-heal-algorithm: full
performance.flush-behind: off
performance.write-behind-window-size: 8MB
client.event-threads: 8
server.allow-insecure: on

Client version:

[root@kvm573 ~]# gluster --version
glusterfs 3.12.5

Thanks!

- Ian
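
A note while the DHT devs take a look: in similar reports the workaround for a stale link-to file has been to remove the zero-byte sticky-bit file (and its .glusterfs hardlink) from the backend bricks and let the next lookup recreate it. The following is only a sketch based on the gfid and paths in the output above, not an official recommendation for this case; please back up the image and wait for the DHT folks to confirm before deleting anything on the bricks. Roughly, on every brick of the replica set where the stale zero-byte copy exists (brick2 on n1 here; the path would differ on its replica pair and on the arbiter, e.g. arbrick2):

    # the stale linkto file itself
    rm /gluster/brick2/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7
    # its hardlink under .glusterfs, derived from the linkto file's own gfid (fdf0813b-...)
    rm /gluster/brick2/brick/.glusterfs/fd/f0/fdf0813b-718a-4616-a51b-6999ebba9ec3

Afterwards, accessing the VM image from the hypervisor mount should trigger a fresh lookup of shard 7 so DHT can recreate the link-to file with the correct gfid.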
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users