On Mon, Mar 26, 2018 at 12:40 PM, Krutika Dhananjay <kdhananj@xxxxxxxxxx> wrote:
The gfid mismatch here is between the shard and its "link-to" file, the creation of which happens at a layer below that of the shard translator on the stack. Adding DHT devs to take a look.
Thanks Krutika. I assume shard doesn't do any dentry operations like rename, link, or unlink on the path of the file (not the gfid-handle-based path) internally while managing shards. Can you confirm? If it does do these operations, which fops does it use?
@Ian,
I can suggest the following way to fix the problem:
1. Since one of the files listed is a DHT linkto file, I am assuming there is only one shard of the file. If not, please list out the gfids of the other shards and don't proceed with the healing procedure.
2. If the gfids of all shards happen to be the same and only the linkto file has a different gfid, please proceed to step 3. Otherwise abort the healing procedure.
3. If cluster.lookup-optimize is set to true, abort the healing procedure.
4. Delete the linkto file (the file with permissions ---------T and xattr trusted.glusterfs.dht.linkto) and do a lookup on the file from the mount point after turning off readdirplus [1].
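The checks in the list above can be sketched as a small bash helper. This is only an illustration under stated assumptions: the function names (gfid_from_hex, gfid_handle_path, precheck) are hypothetical, the gfid values are expected in the hex form printed by getfattr -e hex, and nothing here deletes anything. The handle path assumes the usual brick layout in which a regular file also has a hard link under .glusterfs named after its gfid (the link count of 2 in the ls output of the original report is consistent with that), so that link would presumably need to go along with the linkto file.

```bash
#!/usr/bin/env bash
# Sketch only: pre-checks before removing a DHT linkto file by hand.
# All function names are hypothetical; nothing here modifies a brick.

# "0xfdf0813b718a4616a51b6999ebba9ec3" -> "fdf0813b-718a-4616-a51b-6999ebba9ec3"
gfid_from_hex() {
    local h=${1#0x}
    printf '%s-%s-%s-%s-%s\n' \
        "${h:0:8}" "${h:8:4}" "${h:12:4}" "${h:16:4}" "${h:20:12}"
}

# Path of the gfid hard link a brick keeps under .glusterfs
# (assumed layout: <brick>/.glusterfs/<first 2 hex>/<next 2 hex>/<gfid>).
gfid_handle_path() {
    local brick=$1 gfid
    gfid=$(gfid_from_hex "$2")
    printf '%s/.glusterfs/%s/%s/%s\n' "$brick" "${gfid:0:2}" "${gfid:2:2}" "$gfid"
}

# precheck LOOKUP_OPTIMIZE LINKTO_GFID DATA_GFID...
# Prints "proceed" or "abort: <reason>", following the steps in this mail.
precheck() {
    local lookup_optimize=$1 linkto=$2
    shift 2
    if [ "$#" -lt 1 ]; then
        echo "abort: no data file gfid given"
        return 1
    fi
    local first=$1 g
    for g in "$@"; do
        if [ "$g" != "$first" ]; then
            echo "abort: data file copies disagree on gfid"
            return 1
        fi
    done
    if [ "$linkto" = "$first" ]; then
        echo "abort: linkto gfid matches the data file, nothing to fix"
        return 1
    fi
    case $lookup_optimize in
        on|true|enable|yes|1)
            echo "abort: cluster.lookup-optimize is enabled"
            return 1
            ;;
    esac
    echo "proceed"
}
```

With the values from the original report, precheck off 0xfdf0813b718a4616a51b6999ebba9ec3 0x57c6fcdf52bb4f7aaea402f0dc81ff56 prints "proceed", and gfid_handle_path /gluster/brick2/brick 0xfdf0813b718a4616a51b6999ebba9ec3 gives the .glusterfs path to inspect for the linkto file's handle. The current option value can be read with gluster volume get <volname> cluster.lookup-optimize.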
As for how we ended up in this situation, can you explain what the I/O pattern on this file is? For example, are there lots of entry operations like rename, link, unlink etc. on the file? There have been known races in rename/lookup-heal-creating-linkto where the linkto and data file end up with different gfids. [2] fixes some of these cases.
regards,
Raghavendra
-Krutika

On Mon, Mar 26, 2018 at 1:09 AM, Ian Halliday <ihalliday@xxxxxxxxxx> wrote:

Hello all,

We are having a rather interesting problem with one of our VM storage systems. The GlusterFS client is throwing errors relating to GFID mismatches. We traced this down to multiple shards being present on the gluster nodes, with different gfids.

Hypervisor gluster mount log:

[2018-03-25 18:54:19.261733] E [MSGID: 133010] [shard.c:1724:shard_common_lookup_shards_cbk] 0-ovirt-zone1-shard: Lookup on shard 7 failed. Base file gfid = 87137cac-49eb-492a-8f33-8e33470d8cb7 [Stale file handle]
The message "W [MSGID: 109009] [dht-common.c:2162:dht_lookup_linkfile_cbk] 0-ovirt-zone1-dht: /.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7: gfid different on data file on ovirt-zone1-replicate-3, gfid local = 00000000-0000-0000-0000-000000000000, gfid node = 57c6fcdf-52bb-4f7a-aea4-02f0dc81ff56 " repeated 2 times between [2018-03-25 18:54:19.253748] and [2018-03-25 18:54:19.263576]
[2018-03-25 18:54:19.264349] W [MSGID: 109009] [dht-common.c:1901:dht_lookup_everywhere_cbk] 0-ovirt-zone1-dht: /.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7: gfid differs on subvolume ovirt-zone1-replicate-3, gfid local = fdf0813b-718a-4616-a51b-6999ebba9ec3, gfid node = 57c6fcdf-52bb-4f7a-aea4-02f0dc81ff56

On the storage nodes, we found this:

[root@n1 gluster]# find -name 87137cac-49eb-492a-8f33-8e33470d8cb7.7
./brick2/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7
./brick4/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7

[root@n1 gluster]# ls -lh ./brick2/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7
---------T. 2 root root 0 Mar 25 13:55 ./brick2/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7
[root@n1 gluster]# ls -lh ./brick4/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7
-rw-rw----. 2 root root 3.8G Mar 25 13:55 ./brick4/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7

[root@n1 gluster]# getfattr -d -m . -e hex ./brick2/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7
# file: brick2/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.gfid=0xfdf0813b718a4616a51b6999ebba9ec3
trusted.glusterfs.dht.linkto=0x6f766972742d3335302d7a6f6e65312d7265706c69636174652d3300

[root@n1 gluster]# getfattr -d -m . -e hex ./brick4/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7
# file: brick4/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x020000000000000059914190000ce672
trusted.gfid=0x57c6fcdf52bb4f7aaea402f0dc81ff56

I'm wondering how they got created in the first place, and if anyone has any insight on how to fix it?

Storage nodes:

[root@n1 gluster]# gluster --version
glusterfs 4.0.0

[root@n1 gluster]# gluster volume info

Volume Name: ovirt-350-zone1
Type: Distributed-Replicate
Volume ID: 106738ed-9951-4270-822e-63c9bcd0a20e
Status: Started
Snapshot Count: 0
Number of Bricks: 7 x (2 + 1) = 21
Transport-type: tcp
Bricks:
Brick1: 10.0.6.100:/gluster/brick1/brick
Brick2: 10.0.6.101:/gluster/brick1/brick
Brick3: 10.0.6.102:/gluster/arbrick1/brick (arbiter)
Brick4: 10.0.6.100:/gluster/brick2/brick
Brick5: 10.0.6.101:/gluster/brick2/brick
Brick6: 10.0.6.102:/gluster/arbrick2/brick (arbiter)
Brick7: 10.0.6.100:/gluster/brick3/brick
Brick8: 10.0.6.101:/gluster/brick3/brick
Brick9: 10.0.6.102:/gluster/arbrick3/brick (arbiter)
Brick10: 10.0.6.100:/gluster/brick4/brick
Brick11: 10.0.6.101:/gluster/brick4/brick
Brick12: 10.0.6.102:/gluster/arbrick4/brick (arbiter)
Brick13: 10.0.6.100:/gluster/brick5/brick
Brick14: 10.0.6.101:/gluster/brick5/brick
Brick15: 10.0.6.102:/gluster/arbrick5/brick (arbiter)
Brick16: 10.0.6.100:/gluster/brick6/brick
Brick17: 10.0.6.101:/gluster/brick6/brick
Brick18: 10.0.6.102:/gluster/arbrick6/brick (arbiter)
Brick19: 10.0.6.100:/gluster/brick7/brick
Brick20: 10.0.6.101:/gluster/brick7/brick
Brick21: 10.0.6.102:/gluster/arbrick7/brick (arbiter)
Options Reconfigured:
cluster.min-free-disk: 50GB
performance.strict-write-ordering: off
performance.strict-o-direct: off
nfs.disable: off
performance.readdir-ahead: on
transport.address-family: inet
performance.cache-size: 1GB
features.shard: on
features.shard-block-size: 5GB
server.event-threads: 8
server.outstanding-rpc-limit: 128
storage.owner-uid: 36
storage.owner-gid: 36
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: on
cluster.eager-lock: enable
network.remote-dio: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
cluster.data-self-heal-algorithm: full
performance.flush-behind: off
performance.write-behind-window-size: 8MB
client.event-threads: 8
server.allow-insecure: on

Client version:

[root@kvm573 ~]# gluster --version
glusterfs 3.12.5

Thanks!
- Ian
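As a reading aid for the getfattr output above, the hex-encoded string xattrs can be decoded back to text with a few lines of bash. This is a minimal sketch, not part of the original report; the function name hex_to_text is hypothetical, and it assumes the value is a NUL-terminated ASCII string as printed by getfattr -e hex. Running getfattr with -e text instead should show the same strings directly.

```bash
#!/usr/bin/env bash
# Sketch only: decode a hex xattr value (as printed by `getfattr -e hex`)
# into text. hex_to_text is a hypothetical helper name.
hex_to_text() {
    local h=${1#0x} out="" i
    for ((i = 0; i < ${#h}; i += 2)); do
        # Skip NUL bytes (the string terminator); bash strings cannot hold them.
        [ "${h:i:2}" = "00" ] && continue
        # %b makes printf interpret the \xHH escape for this byte.
        out+=$(printf '%b' "\\x${h:i:2}")
    done
    printf '%s\n' "$out"
}

# The trusted.glusterfs.dht.linkto value from the brick2 copy of the shard:
hex_to_text 0x6f766972742d3335302d7a6f6e65312d7265706c69636174652d3300
# -> ovirt-350-zone1-replicate-3
```

The decoded value names the subvolume that the linkto file points at, which is the replicate-3 subvolume the DHT warnings in the mount log complain about.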
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users