Re: [Stale file handle] in shard volume

Hi Krutika,

I think the main problem is that the shard file exists in 2 sub-volumes, one copy being valid and one being stale.
For example:
sub-volume-1:
node-1: a38d64bc-a28b-4ee1-a0bb-f919e7a1022c.101487[stale]
node-2: a38d64bc-a28b-4ee1-a0bb-f919e7a1022c.101487[stale]
node-3: a38d64bc-a28b-4ee1-a0bb-f919e7a1022c.101487[stale]
sub-volume-2:
node-4: a38d64bc-a28b-4ee1-a0bb-f919e7a1022c.101487[good]
node-5: a38d64bc-a28b-4ee1-a0bb-f919e7a1022c.101487[good]
node-6: a38d64bc-a28b-4ee1-a0bb-f919e7a1022c.101487[good]

This is more or less exactly what you described here: https://lists.gluster.org/pipermail/gluster-users/2018-March/033785.html
The VMs getting paused is, I think, purely a side effect.

The issue only seems to surface on volumes with an arbiter brick and sharding enabled.
So I suspect something is going wrong, or went wrong at some point, at the sharding translator layer.
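For what it's worth, this is roughly how such a shard could be checked on every node to see which copies hold the real data and which are the dht linkto copies (just a sketch: the node names are the placeholders from my example above, the brick path is the one from the ovirt-kube logs and differs per node, and it assumes passwordless ssh):

SHARD=/data/gfs/bricks/bricka/ovirt-kube/.shard/a38d64bc-a28b-4ee1-a0bb-f919e7a1022c.101487   # adjust path per node
for node in node-1 node-2 node-3 node-4 node-5 node-6; do
  echo "== $node =="
  ssh "$node" "stat -c '%A %s %n' $SHARD; getfattr -n trusted.glusterfs.dht.linkto -e text $SHARD 2>/dev/null"
done
# stale copies show up as ---------T, 0 bytes, with the linkto xattr set;
# good copies are regular files without that xattr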

I think the log lines you're interested in are these:
[2019-01-02 02:33:44.433169] I [MSGID: 113030] [posix.c:2171:posix_unlink] 0-ovirt-kube-posix: open-fd-key-status: 0 for /data/gfs/bricks/bricka/ovirt-kube/.shard/a38d64bc-a28b-4ee1-a0bb-f919e7a1022c.101487
[2019-01-02 02:33:44.433188] I [MSGID: 113031] [posix.c:2084:posix_skip_non_linkto_unlink] 0-posix: linkto_xattr status: 0 for /data/gfs/bricks/bricka/ovirt-kube/.shard/a38d64bc-a28b-4ee1-a0bb-f919e7a1022c.101487
[2019-01-02 02:33:44.475027] I [MSGID: 113030] [posix.c:2171:posix_unlink] 0-ovirt-kube-posix: open-fd-key-status: 0 for /data/gfs/bricks/bricka/ovirt-kube/.shard/a38d64bc-a28b-4ee1-a0bb-f919e7a1022c.101488
[2019-01-02 02:33:44.475059] I [MSGID: 113031] [posix.c:2084:posix_skip_non_linkto_unlink] 0-posix: linkto_xattr status: 0 for /data/gfs/bricks/bricka/ovirt-kube/.shard/a38d64bc-a28b-4ee1-a0bb-f919e7a1022c.101488
[2019-01-02 02:35:36.394536] I [MSGID: 115036] [server.c:535:server_rpc_notify] 0-ovirt-kube-server: disconnecting connection from lease-10.dc01.adsolutions-22506-2018/12/24-04:03:32:698336-ovirt-kube-client-2-0-0
[2019-01-02 02:35:36.394800] I [MSGID: 101055] [client_t.c:443:gf_client_unref] 0-ovirt-kube-server: Shutting down connection lease-10.dc01.adsolutions-22506-2018/12/24-04:03:32:698336-ovirt-kube-client-2-0-0
This is from around the time the aforementioned machine paused. I've also attached the other logs; unfortunately I cannot access the logs of one machine, but if you need those I can gather them later.
If you need more samples or info please let me know.

Thanks Olaf


On Mon, 14 Jan 2019 at 08:16, Krutika Dhananjay <kdhananj@xxxxxxxxxx> wrote:
Hi,

So the main issue is that certain VMs seem to be pausing? Did I understand that right?
Could you share the gluster-mount logs around the time the pause was seen? And the brick logs too please?

As for the ESTALE errors, the real cause of the pauses can be determined from the errors/warnings logged by fuse. The mere occurrence of ESTALE errors against shard functions in the logs doesn't necessarily indicate that they are the reason for the pause. Also, in this instance, it seems the ESTALE errors are propagated by the lower translators (DHT? protocol/client? or even the bricks?) and shard is merely logging them.
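For example, something along these lines against the fuse mount log should surface what fuse actually logged around the pause window (just a sketch; the log file name is taken from the attachment list in this thread, and the grep patterns from the log lines quoted elsewhere in it):

grep -E '\] (W|E) \[' /var/log/glusterfs/rhev-data-center-mnt-glusterSD-10.32.9.20_ovirt-kube.log \
  | grep -E 'fuse-bridge|shard|afr' \
  | grep '2019-01-02 02:3'
# warnings/errors from fuse-bridge (e.g. fuse_attr_cbk returning an error to the application)
# are typically the ones relevant to the VM pause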

-Krutika


On Sun, Jan 13, 2019 at 10:11 PM Olaf Buitelaar <olaf.buitelaar@xxxxxxxxx> wrote:
@Krutika if you need any further information, please let me know.

Thanks Olaf

On Fri, 4 Jan 2019 at 07:51, Nithya Balachandran <nbalacha@xxxxxxxxxx> wrote:
Adding Krutika.

On Wed, 2 Jan 2019 at 20:56, Olaf Buitelaar <olaf.buitelaar@xxxxxxxxx> wrote:
Hi Nithya,

Thank you for your reply.

The VMs using the gluster volumes keep getting paused/stopped with errors like these:
[2019-01-02 02:33:44.469132] E [MSGID: 133010] [shard.c:1724:shard_common_lookup_shards_cbk] 0-ovirt-kube-shard: Lookup on shard 101487 failed. Base file gfid = a38d64bc-a28b-4ee1-a0bb-f919e7a1022c [Stale file handle]
[2019-01-02 02:33:44.563288] E [MSGID: 133010] [shard.c:1724:shard_common_lookup_shards_cbk] 0-ovirt-kube-shard: Lookup on shard 101488 failed. Base file gfid = a38d64bc-a28b-4ee1-a0bb-f919e7a1022c [Stale file handle]

Krutika, can you take a look at this?
 

What I'm trying to find out is whether I can purge all possible stale file handles from all gluster volumes (and hopefully find a way to prevent this in the future), so the VMs can run stably again.
For this I need to know when the "shard_common_lookup_shards_cbk" function considers a file stale.
The statement "Stale file handle errors show up when a file with a specified gfid is not found" doesn't seem to cover it all; as I've shown in earlier mails, the shard file and the .glusterfs/xx/xx/uuid file both exist and have the same inode.
If the criteria I'm using aren't correct, could you please tell me which criteria I should use to determine whether a file is stale or not?
These criteria are just based on observations I made while moving the stale files manually. After removing them I was able to start the VM again... until, unfortunately, some time later it hangs on another stale shard file.

Thanks Olaf

On Wed, 2 Jan 2019 at 14:20, Nithya Balachandran <nbalacha@xxxxxxxxxx> wrote:


On Mon, 31 Dec 2018 at 01:27, Olaf Buitelaar <olaf.buitelaar@xxxxxxxxx> wrote:
Dear All,

Up till now, a select group of VMs still seems to produce new stale files and keeps getting paused because of this.
I have not updated gluster recently; however, I did change the op-version from 31200 to 31202 about a week before this issue arose.
Looking at the .shard directory, I've found 100,000+ files so far sharing the same characteristics as a stale file:
they all have the sticky bit set (file permissions ---------T), are 0 KB in size, and have the trusted.glusterfs.dht.linkto attribute.

These are internal files used by gluster, and their presence does not necessarily mean they are stale. They "point" to data files which may be on different bricks (same name, gfid, etc., but no linkto xattr and no ---------T permissions).
 
These files range from long ago (the beginning of the year) till now, which makes me suspect this was lying dormant for some time and somehow recently surfaced.
Checking other sub-volumes, they also contain 0 KB files in the .shard directory, but those don't have the sticky bit or the linkto attribute.

Does anybody else experience this issue? Could this be a bug or an environmental issue?
These are most likely valid files; please do not delete them without double-checking.
 
Stale file handle errors show up when a file with a specified gfid is not found. You will need to debug the files for which you see this error by checking the bricks to see if they actually exist.
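For example, something like this on each brick of the relevant replica set (a sketch, using the gfid and brick path from the log lines quoted above):

GFID=a38d64bc-a28b-4ee1-a0bb-f919e7a1022c
BRICK=/data/gfs/bricks/bricka/ovirt-kube        # adjust per node
# the base file, addressed by its gfid under .glusterfs/<first two hex chars>/<next two>/
ls -l "$BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID"
# the shard named in the error message, plus its xattrs
ls -l "$BRICK/.shard/$GFID.101487"
getfattr -d -m . -e hex "$BRICK/.shard/$GFID.101487"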

Also, I wonder whether there is any tool or gluster command to clean up all stale file handles.
Otherwise I'm planning to write a simple bash script which iterates over the .shard dir, checks each file against the above-mentioned criteria, and (re)moves the file and the corresponding .glusterfs file.
If there are other criteria needed to identify a stale file handle, I would like to hear them.
That is, if this is a viable and safe operation to do at all, of course.
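A rough sketch of the identification part could look like this (listing only, not removing anything, and the brick path of course needs adjusting per node):

BRICK=/data/gfs/bricks/bricka/ovirt-kube        # adjust per node/brick
find "$BRICK/.shard" -maxdepth 1 -type f -size 0 -perm 1000 | while read -r f; do
  # only report 0-byte ---------T files that also carry the dht linkto xattr
  if getfattr -n trusted.glusterfs.dht.linkto "$f" >/dev/null 2>&1; then
    gfid=$(getfattr -n trusted.gfid -e hex "$f" 2>/dev/null | awk -F= '/trusted.gfid=/{print $2}')
    echo "candidate linkto file: $f (gfid $gfid)"
  fi
done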

Thanks Olaf



On Thu, 20 Dec 2018 at 13:43, Olaf Buitelaar <olaf.buitelaar@xxxxxxxxx> wrote:
Dear All,

I figured it out; it appears to be the exact same issue as described here: https://lists.gluster.org/pipermail/gluster-users/2018-March/033785.html
Another sub-volume also had the shard file, only there it was 0 bytes and had the dht.linkto attribute.

For reference:
[root@lease-04 ovirt-backbone-2]# getfattr -d -m . -e hex .shard/f28cabcb-d169-41fc-a633-9bef4c4a8e40.51500
# file: .shard/f28cabcb-d169-41fc-a633-9bef4c4a8e40.51500
security.selinux=0x73797374656d5f753a6f626a6563745f723a6574635f72756e74696d655f743a733000
trusted.gfid=0x298147e49f9748b2baf1c8fff897244d
trusted.gfid2path.b48064c78d7a85c9=0x62653331383633382d653861302d346336642d393737642d3761393337616138343830362f66323863616263622d643136392d343166632d613633332d3962656634633461386534302e3531353030
trusted.glusterfs.dht.linkto=0x6f766972742d6261636b626f6e652d322d7265706c69636174652d3100

[root@lease-04 ovirt-backbone-2]# getfattr -d -m . -e hex .glusterfs/29/81/298147e4-9f97-48b2-baf1-c8fff897244d
# file: .glusterfs/29/81/298147e4-9f97-48b2-baf1-c8fff897244d
security.selinux=0x73797374656d5f753a6f626a6563745f723a6574635f72756e74696d655f743a733000
trusted.gfid=0x298147e49f9748b2baf1c8fff897244d
trusted.gfid2path.b48064c78d7a85c9=0x62653331383633382d653861302d346336642d393737642d3761393337616138343830362f66323863616263622d643136392d343166632d613633332d3962656634633461386534302e3531353030
trusted.glusterfs.dht.linkto=0x6f766972742d6261636b626f6e652d322d7265706c69636174652d3100

[root@lease-04 ovirt-backbone-2]# stat .glusterfs/29/81/298147e4-9f97-48b2-baf1-c8fff897244d
  File: ‘.glusterfs/29/81/298147e4-9f97-48b2-baf1-c8fff897244d’
  Size: 0               Blocks: 0          IO Block: 4096   regular empty file
Device: fd01h/64769d    Inode: 1918631406  Links: 2
Access: (1000/---------T)  Uid: (    0/    root)   Gid: (    0/    root)
Context: system_u:object_r:etc_runtime_t:s0
Access: 2018-12-17 21:43:36.405735296 +0000
Modify: 2018-12-17 21:43:36.405735296 +0000
Change: 2018-12-17 21:43:36.405735296 +0000
 Birth: -
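For completeness, the linkto value is just the hex-encoded name of the sub-volume the file points to; asking getfattr for a text encoding makes it readable (a sketch; the comment shows what the hex value above decodes to):

getfattr -n trusted.glusterfs.dht.linkto -e text .shard/f28cabcb-d169-41fc-a633-9bef4c4a8e40.51500
# the hex value above decodes to "ovirt-backbone-2-replicate-1" (plus a trailing NUL byte),
# i.e. this 0-byte copy points at the replica sub-volume where DHT expects the actual data to live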

Removing the shard file and the corresponding .glusterfs file from each node resolved the issue.

I also found this thread: https://lists.gluster.org/pipermail/gluster-users/2018-December/035460.html
Maybe they are suffering from the same issue.

Best Olaf


On Wed, 19 Dec 2018 at 21:56, Olaf Buitelaar <olaf.buitelaar@xxxxxxxxx> wrote:
Dear All,

It appears I have a stale file handle in one of the volumes, on 2 files. These files are qemu images (1 raw and 1 qcow2).
I'll just focus on 1 file, since the situation for the other seems to be the same.

The VM gets paused more or less directly after being booted, with this error:
[2018-12-18 14:05:05.275713] E [MSGID: 133010] [shard.c:1724:shard_common_lookup_shards_cbk] 0-ovirt-backbone-2-shard: Lookup on shard 51500 failed. Base file gfid = f28cabcb-d169-41fc-a633-9bef4c4a8e40 [Stale file handle]

Investigating the shard:

#on the arbiter node:

[root@lease-05 ovirt-backbone-2]# getfattr -n glusterfs.gfid.string /mnt/ovirt-backbone-2/b1c2c949-aef4-4aec-999b-b179efeef732/images/f6ac9660-a84e-469e-a17c-c6dbc538af4b/d6b09501-5b79-4c92-bf10-2ed3bda1b425
getfattr: Removing leading '/' from absolute path names
# file: mnt/ovirt-backbone-2/b1c2c949-aef4-4aec-999b-b179efeef732/images/f6ac9660-a84e-469e-a17c-c6dbc538af4b/d6b09501-5b79-4c92-bf10-2ed3bda1b425
glusterfs.gfid.string="f28cabcb-d169-41fc-a633-9bef4c4a8e40"

[root@lease-05 ovirt-backbone-2]# getfattr -d -m . -e hex .shard/f28cabcb-d169-41fc-a633-9bef4c4a8e40.51500
# file: .shard/f28cabcb-d169-41fc-a633-9bef4c4a8e40.51500
security.selinux=0x73797374656d5f753a6f626a6563745f723a6574635f72756e74696d655f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.gfid=0x1f86b4328ec6424699aa48cc6d7b5da0
trusted.gfid2path.b48064c78d7a85c9=0x62653331383633382d653861302d346336642d393737642d3761393337616138343830362f66323863616263622d643136392d343166632d613633332d3962656634633461386534302e3531353030

[root@lease-05 ovirt-backbone-2]# getfattr -d -m . -e hex .glusterfs/1f/86/1f86b432-8ec6-4246-99aa-48cc6d7b5da0
# file: .glusterfs/1f/86/1f86b432-8ec6-4246-99aa-48cc6d7b5da0
security.selinux=0x73797374656d5f753a6f626a6563745f723a6574635f72756e74696d655f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.gfid=0x1f86b4328ec6424699aa48cc6d7b5da0
trusted.gfid2path.b48064c78d7a85c9=0x62653331383633382d653861302d346336642d393737642d3761393337616138343830362f66323863616263622d643136392d343166632d613633332d3962656634633461386534302e3531353030

[root@lease-05 ovirt-backbone-2]# stat .glusterfs/1f/86/1f86b432-8ec6-4246-99aa-48cc6d7b5da0
  File: ‘.glusterfs/1f/86/1f86b432-8ec6-4246-99aa-48cc6d7b5da0’
  Size: 0               Blocks: 0          IO Block: 4096   regular empty file
Device: fd01h/64769d    Inode: 537277306   Links: 2
Access: (0660/-rw-rw----)  Uid: (    0/    root)   Gid: (    0/    root)
Context: system_u:object_r:etc_runtime_t:s0
Access: 2018-12-17 21:43:36.361984810 +0000
Modify: 2018-12-17 21:43:36.361984810 +0000
Change: 2018-12-18 20:55:29.908647417 +0000
 Birth: -

[root@lease-05 ovirt-backbone-2]# find . -inum 537277306
./.glusterfs/1f/86/1f86b432-8ec6-4246-99aa-48cc6d7b5da0
./.shard/f28cabcb-d169-41fc-a633-9bef4c4a8e40.51500

#on the data nodes:

[root@lease-08 ~]# getfattr -n glusterfs.gfid.string /mnt/ovirt-backbone-2/b1c2c949-aef4-4aec-999b-b179efeef732/images/f6ac9660-a84e-469e-a17c-c6dbc538af4b/d6b09501-5b79-4c92-bf10-2ed3bda1b425
getfattr: Removing leading '/' from absolute path names
# file: mnt/ovirt-backbone-2/b1c2c949-aef4-4aec-999b-b179efeef732/images/f6ac9660-a84e-469e-a17c-c6dbc538af4b/d6b09501-5b79-4c92-bf10-2ed3bda1b425
glusterfs.gfid.string="f28cabcb-d169-41fc-a633-9bef4c4a8e40"

[root@lease-08 ovirt-backbone-2]# getfattr -d -m . -e hex .shard/f28cabcb-d169-41fc-a633-9bef4c4a8e40.51500
# file: .shard/f28cabcb-d169-41fc-a633-9bef4c4a8e40.51500
security.selinux=0x73797374656d5f753a6f626a6563745f723a6574635f72756e74696d655f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.gfid=0x1f86b4328ec6424699aa48cc6d7b5da0
trusted.gfid2path.b48064c78d7a85c9=0x62653331383633382d653861302d346336642d393737642d3761393337616138343830362f66323863616263622d643136392d343166632d613633332d3962656634633461386534302e3531353030

[root@lease-08 ovirt-backbone-2]# getfattr -d -m . -e hex .glusterfs/1f/86/1f86b432-8ec6-4246-99aa-48cc6d7b5da0
# file: .glusterfs/1f/86/1f86b432-8ec6-4246-99aa-48cc6d7b5da0
security.selinux=0x73797374656d5f753a6f626a6563745f723a6574635f72756e74696d655f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.gfid=0x1f86b4328ec6424699aa48cc6d7b5da0
trusted.gfid2path.b48064c78d7a85c9=0x62653331383633382d653861302d346336642d393737642d3761393337616138343830362f66323863616263622d643136392d343166632d613633332d3962656634633461386534302e3531353030

[root@lease-08 ovirt-backbone-2]# stat .glusterfs/1f/86/1f86b432-8ec6-4246-99aa-48cc6d7b5da0
  File: ‘.glusterfs/1f/86/1f86b432-8ec6-4246-99aa-48cc6d7b5da0’
  Size: 2166784         Blocks: 4128       IO Block: 4096   regular file
Device: fd03h/64771d    Inode: 12893624759  Links: 3
Access: (0660/-rw-rw----)  Uid: (    0/    root)   Gid: (    0/    root)
Context: system_u:object_r:etc_runtime_t:s0
Access: 2018-12-18 18:52:38.070776585 +0000
Modify: 2018-12-17 21:43:36.388054443 +0000
Change: 2018-12-18 21:01:47.810506528 +0000
 Birth: -

[root@lease-08 ovirt-backbone-2]# find . -inum 12893624759
./.glusterfs/1f/86/1f86b432-8ec6-4246-99aa-48cc6d7b5da0
./.shard/f28cabcb-d169-41fc-a633-9bef4c4a8e40.51500

========================

[root@lease-11 ovirt-backbone-2]# getfattr -n glusterfs.gfid.string /mnt/ovirt-backbone-2/b1c2c949-aef4-4aec-999b-b179efeef732/images/f6ac9660-a84e-469e-a17c-c6dbc538af4b/d6b09501-5b79-4c92-bf10-2ed3bda1b425
getfattr: Removing leading '/' from absolute path names
# file: mnt/ovirt-backbone-2/b1c2c949-aef4-4aec-999b-b179efeef732/images/f6ac9660-a84e-469e-a17c-c6dbc538af4b/d6b09501-5b79-4c92-bf10-2ed3bda1b425
glusterfs.gfid.string="f28cabcb-d169-41fc-a633-9bef4c4a8e40"

[root@lease-11 ovirt-backbone-2]#  getfattr -d -m . -e hex .shard/f28cabcb-d169-41fc-a633-9bef4c4a8e40.51500
# file: .shard/f28cabcb-d169-41fc-a633-9bef4c4a8e40.51500
security.selinux=0x73797374656d5f753a6f626a6563745f723a6574635f72756e74696d655f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.gfid=0x1f86b4328ec6424699aa48cc6d7b5da0
trusted.gfid2path.b48064c78d7a85c9=0x62653331383633382d653861302d346336642d393737642d3761393337616138343830362f66323863616263622d643136392d343166632d613633332d3962656634633461386534302e3531353030

[root@lease-11 ovirt-backbone-2]# getfattr -d -m . -e hex .glusterfs/1f/86/1f86b432-8ec6-4246-99aa-48cc6d7b5da0
# file: .glusterfs/1f/86/1f86b432-8ec6-4246-99aa-48cc6d7b5da0
security.selinux=0x73797374656d5f753a6f626a6563745f723a6574635f72756e74696d655f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.gfid=0x1f86b4328ec6424699aa48cc6d7b5da0
trusted.gfid2path.b48064c78d7a85c9=0x62653331383633382d653861302d346336642d393737642d3761393337616138343830362f66323863616263622d643136392d343166632d613633332d3962656634633461386534302e3531353030

[root@lease-11 ovirt-backbone-2]# stat .glusterfs/1f/86/1f86b432-8ec6-4246-99aa-48cc6d7b5da0
  File: ‘.glusterfs/1f/86/1f86b432-8ec6-4246-99aa-48cc6d7b5da0’
  Size: 2166784         Blocks: 4128       IO Block: 4096   regular file
Device: fd03h/64771d    Inode: 12956094809  Links: 3
Access: (0660/-rw-rw----)  Uid: (    0/    root)   Gid: (    0/    root)
Context: system_u:object_r:etc_runtime_t:s0
Access: 2018-12-18 20:11:53.595208449 +0000
Modify: 2018-12-17 21:43:36.391580259 +0000
Change: 2018-12-18 19:19:25.888055392 +0000
 Birth: -

[root@lease-11 ovirt-backbone-2]# find . -inum 12956094809
./.glusterfs/1f/86/1f86b432-8ec6-4246-99aa-48cc6d7b5da0
./.shard/f28cabcb-d169-41fc-a633-9bef4c4a8e40.51500

================

I don't really see any inconsistencies, except for the dates in the stat output. However, that is only because I tried moving the file out of the volume to force a heal, which does happen on the data nodes but not on the arbiter node. Before that, the dates were the same as well.
I've also compared the file ./.shard/f28cabcb-d169-41fc-a633-9bef4c4a8e40.51500 on the 2 nodes and they are exactly the same.
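For reference, such a comparison can be done on both data bricks with a checksum plus a full xattr dump, roughly like this (a sketch, run from the brick root on each data node):

md5sum .shard/f28cabcb-d169-41fc-a633-9bef4c4a8e40.51500
getfattr -d -m . -e hex .shard/f28cabcb-d169-41fc-a633-9bef4c4a8e40.51500
stat -c '%A %s' .shard/f28cabcb-d169-41fc-a633-9bef4c4a8e40.51500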

Things I've further tried:
- gluster v heal ovirt-backbone-2 full => gluster v heal ovirt-backbone-2 info reports 0 entries on all nodes

- stopping glusterd and glusterfsd on each node, pausing around 40 seconds and starting them again, one node at a time, waiting for the heal to recover before moving on to the next node

- forcing a heal by stopping glusterd on a node and performing these steps:
mkdir /mnt/ovirt-backbone-2/trigger
rmdir /mnt/ovirt-backbone-2/trigger
setfattr -n trusted.non-existent-key -v abc /mnt/ovirt-backbone-2/
setfattr -x trusted.non-existent-key /mnt/ovirt-backbone-2/
start glusterd

- gluster volume rebalance ovirt-backbone-2 start => success

What's further interesting is that, according to the mount log, the volume is in split-brain:
[2018-12-18 10:06:04.606870] E [MSGID: 108008] [afr-read-txn.c:90:afr_read_txn_refresh_done] 0-ovirt-backbone-2-replicate-2: Failing FSTAT on gfid 2a57d87d-fe49-4034-919b-fdb79531bf68: split-brain observed. [Input/output error]
[2018-12-18 10:06:04.606908] E [MSGID: 133014] [shard.c:1248:shard_common_stat_cbk] 0-ovirt-backbone-2-shard: stat failed: 2a57d87d-fe49-4034-919b-fdb79531bf68 [Input/output error]
[2018-12-18 10:06:04.606927] W [fuse-bridge.c:871:fuse_attr_cbk] 0-glusterfs-fuse: 428090: FSTAT() /b1c2c949-aef4-4aec-999b-b179efeef732/dom_md/ids => -1 (Input/output error)
[2018-12-18 10:06:05.107729] E [MSGID: 108008] [afr-read-txn.c:90:afr_read_txn_refresh_done] 0-ovirt-backbone-2-replicate-2: Failing FSTAT on gfid 2a57d87d-fe49-4034-919b-fdb79531bf68: split-brain observed. [Input/output error]
[2018-12-18 10:06:05.107770] E [MSGID: 133014] [shard.c:1248:shard_common_stat_cbk] 0-ovirt-backbone-2-shard: stat failed: 2a57d87d-fe49-4034-919b-fdb79531bf68 [Input/output error]
[2018-12-18 10:06:05.107791] W [fuse-bridge.c:871:fuse_attr_cbk] 0-glusterfs-fuse: 428091: FSTAT() /b1c2c949-aef4-4aec-999b-b179efeef732/dom_md/ids => -1 (Input/output error)
[2018-12-18 10:06:05.537244] I [MSGID: 108006] [afr-common.c:5494:afr_local_init] 0-ovirt-backbone-2-replicate-1: no subvolumes up
[2018-12-18 10:06:05.538523] E [MSGID: 108008] [afr-read-txn.c:90:afr_read_txn_refresh_done] 0-ovirt-backbone-2-replicate-2: Failing STAT on gfid 00000000-0000-0000-0000-000000000001: split-brain observed. [Input/output error]
[2018-12-18 10:06:05.538685] I [MSGID: 108006] [afr-common.c:5494:afr_local_init] 0-ovirt-backbone-2-replicate-1: no subvolumes up
[2018-12-18 10:06:05.538794] I [MSGID: 108006] [afr-common.c:5494:afr_local_init] 0-ovirt-backbone-2-replicate-1: no subvolumes up
[2018-12-18 10:06:05.539342] I [MSGID: 109063] [dht-layout.c:716:dht_layout_normalize] 0-ovirt-backbone-2-dht: Found anomalies in /b1c2c949-aef4-4aec-999b-b179efeef732 (gfid = 8c8598ce-1a52-418e-a7b4-435fee34bae8). Holes=2 overlaps=0
[2018-12-18 10:06:05.539372] W [MSGID: 109005] [dht-selfheal.c:2158:dht_selfheal_directory] 0-ovirt-backbone-2-dht: Directory selfheal failed: 2 subvolumes down.Not fixing. path = /b1c2c949-aef4-4aec-999b-b179efeef732, gfid = 8c8598ce-1a52-418e-a7b4-435fee34bae8
[2018-12-18 10:06:05.539694] I [MSGID: 108006] [afr-common.c:5494:afr_local_init] 0-ovirt-backbone-2-replicate-1: no subvolumes up
[2018-12-18 10:06:05.540652] I [MSGID: 108006] [afr-common.c:5494:afr_local_init] 0-ovirt-backbone-2-replicate-1: no subvolumes up
[2018-12-18 10:06:05.608612] E [MSGID: 108008] [afr-read-txn.c:90:afr_read_txn_refresh_done] 0-ovirt-backbone-2-replicate-2: Failing FSTAT on gfid 2a57d87d-fe49-4034-919b-fdb79531bf68: split-brain observed. [Input/output error]
[2018-12-18 10:06:05.608657] E [MSGID: 133014] [shard.c:1248:shard_common_stat_cbk] 0-ovirt-backbone-2-shard: stat failed: 2a57d87d-fe49-4034-919b-fdb79531bf68 [Input/output error]
[2018-12-18 10:06:05.608672] W [fuse-bridge.c:871:fuse_attr_cbk] 0-glusterfs-fuse: 428096: FSTAT() /b1c2c949-aef4-4aec-999b-b179efeef732/dom_md/ids => -1 (Input/output error)
[2018-12-18 10:06:06.109339] E [MSGID: 108008] [afr-read-txn.c:90:afr_read_txn_refresh_done] 0-ovirt-backbone-2-replicate-2: Failing FSTAT on gfid 2a57d87d-fe49-4034-919b-fdb79531bf68: split-brain observed. [Input/output error]
[2018-12-18 10:06:06.109378] E [MSGID: 133014] [shard.c:1248:shard_common_stat_cbk] 0-ovirt-backbone-2-shard: stat failed: 2a57d87d-fe49-4034-919b-fdb79531bf68 [Input/output error]
[2018-12-18 10:06:06.109399] W [fuse-bridge.c:871:fuse_attr_cbk] 0-glusterfs-fuse: 428097: FSTAT() /b1c2c949-aef4-4aec-999b-b179efeef732/dom_md/ids => -1 (Input/output error)

# note: I am able to see /b1c2c949-aef4-4aec-999b-b179efeef732/dom_md/ids
[root@lease-11 ovirt-backbone-2]# stat /mnt/ovirt-backbone-2/b1c2c949-aef4-4aec-999b-b179efeef732/dom_md/ids
  File: ‘/mnt/ovirt-backbone-2/b1c2c949-aef4-4aec-999b-b179efeef732/dom_md/ids’
  Size: 1048576         Blocks: 2048       IO Block: 131072 regular file
Device: 41h/65d Inode: 10492258721813610344  Links: 1
Access: (0660/-rw-rw----)  Uid: (   36/    vdsm)   Gid: (   36/     kvm)
Context: system_u:object_r:fusefs_t:s0
Access: 2018-12-19 20:07:39.917573869 +0000
Modify: 2018-12-19 20:07:39.928573917 +0000
Change: 2018-12-19 20:07:39.929573921 +0000
 Birth: -

However, checking "gluster v heal ovirt-backbone-2 info split-brain"
reports no entries.

I've also tried mounting the qemu image, and this works fine; I'm able to see all of its contents:
 losetup /dev/loop0 /mnt/ovirt-backbone-2/b1c2c949-aef4-4aec-999b-b179efeef732/images/f6ac9660-a84e-469e-a17c-c6dbc538af4b/d6b09501-5b79-4c92-bf10-2ed3bda1b425
 kpartx -a /dev/loop0
 vgscan
 vgchange -ay slave-data
 mkdir /mnt/slv01
 mount /dev/mapper/slave--data-lvol0 /mnt/slv01/

Possible causes for this issue:
1. The machine "lease-11" suffered from a faulty RAM module (ECC), which halted the machine and caused an invalid state. (This machine also hosts other volumes, with similar configurations, which report no issues.)
2. After the RAM module was replaced, the VM using the backing qemu image was restored from a backup (the backup was file-based, within the VM, on a different directory), because some files had been corrupted. The backup/recovery obviously causes extra IO, possibly introducing race conditions? The machine did run for about 12h without issues, and about 36h in total.
3. Since only the client (maybe only gfapi?) reports errors, something is broken there?

The volume info:
root@lease-06 ~# gluster v info ovirt-backbone-2

Volume Name: ovirt-backbone-2
Type: Distributed-Replicate
Volume ID: 85702d35-62c8-4c8c-930d-46f455a8af28
Status: Started
Snapshot Count: 0
Number of Bricks: 3 x (2 + 1) = 9
Transport-type: tcp
Bricks:
Brick1: 10.32.9.7:/data/gfs/bricks/brick1/ovirt-backbone-2
Brick2: 10.32.9.3:/data/gfs/bricks/brick1/ovirt-backbone-2
Brick3: 10.32.9.4:/data/gfs/bricks/bricka/ovirt-backbone-2 (arbiter)
Brick4: 10.32.9.8:/data0/gfs/bricks/brick1/ovirt-backbone-2
Brick5: 10.32.9.21:/data0/gfs/bricks/brick1/ovirt-backbone-2
Brick6: 10.32.9.5:/data/gfs/bricks/bricka/ovirt-backbone-2 (arbiter)
Brick7: 10.32.9.9:/data0/gfs/bricks/brick1/ovirt-backbone-2
Brick8: 10.32.9.20:/data0/gfs/bricks/brick1/ovirt-backbone-2
Brick9: 10.32.9.6:/data/gfs/bricks/bricka/ovirt-backbone-2 (arbiter)
Options Reconfigured:
nfs.disable: on
transport.address-family: inet
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.low-prio-threads: 32
network.remote-dio: enable
cluster.eager-lock: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 10000
features.shard: on
user.cifs: off
storage.owner-uid: 36
storage.owner-gid: 36
features.shard-block-size: 64MB
performance.write-behind-window-size: 512MB
performance.cache-size: 384MB
cluster.brick-multiplex: on

The volume status:
root@lease-06 ~# gluster v status ovirt-backbone-2
Status of volume: ovirt-backbone-2
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.32.9.7:/data/gfs/bricks/brick1/ovi
rt-backbone-2                               49152     0          Y       7727
Brick 10.32.9.3:/data/gfs/bricks/brick1/ovi
rt-backbone-2                               49152     0          Y       12620
Brick 10.32.9.4:/data/gfs/bricks/bricka/ovi
rt-backbone-2                               49152     0          Y       8794
Brick 10.32.9.8:/data0/gfs/bricks/brick1/ov
irt-backbone-2                              49161     0          Y       22333
Brick 10.32.9.21:/data0/gfs/bricks/brick1/o
virt-backbone-2                             49152     0          Y       15030
Brick 10.32.9.5:/data/gfs/bricks/bricka/ovi
rt-backbone-2                               49166     0          Y       24592
Brick 10.32.9.9:/data0/gfs/bricks/brick1/ov
irt-backbone-2                              49153     0          Y       20148
Brick 10.32.9.20:/data0/gfs/bricks/brick1/o
virt-backbone-2                             49154     0          Y       15413
Brick 10.32.9.6:/data/gfs/bricks/bricka/ovi
rt-backbone-2                               49152     0          Y       43120
Self-heal Daemon on localhost               N/A       N/A        Y       44587
Self-heal Daemon on 10.201.0.2              N/A       N/A        Y       8401
Self-heal Daemon on 10.201.0.5              N/A       N/A        Y       11038
Self-heal Daemon on 10.201.0.8              N/A       N/A        Y       9513
Self-heal Daemon on 10.32.9.4               N/A       N/A        Y       23736
Self-heal Daemon on 10.32.9.20              N/A       N/A        Y       2738
Self-heal Daemon on 10.32.9.3               N/A       N/A        Y       25598
Self-heal Daemon on 10.32.9.5               N/A       N/A        Y       511
Self-heal Daemon on 10.32.9.9               N/A       N/A        Y       23357
Self-heal Daemon on 10.32.9.8               N/A       N/A        Y       15225
Self-heal Daemon on 10.32.9.7               N/A       N/A        Y       25781
Self-heal Daemon on 10.32.9.21              N/A       N/A        Y       5034

Task Status of Volume ovirt-backbone-2
------------------------------------------------------------------------------
Task                 : Rebalance
ID                   : 6dfbac43-0125-4568-9ac3-a2c453faaa3d
Status               : completed

The gluster version is 3.12.15 and cluster.op-version=31202.

========================

It would be nice to know whether it is possible to mark the files as not stale, or whether I should investigate other things.
Or should we consider this volume lost?
Also, checking the code at https://github.com/gluster/glusterfs/blob/master/xlators/features/shard/src/shard.c, it seems the functions have shifted quite a bit (line 1724 vs. 2243), so maybe it is fixed in a later version?
Any thoughts are welcome.

Thanks Olaf









Attachment: data-gfs-bricks-bricka-ovirt-kube.log-20190106.gz
Description: GNU Zip compressed data

Attachment: l7-data-gfs-bricks-brick1-ovirt-kube.log-20190106.gz
Description: GNU Zip compressed data

Attachment: l10-data-gfs-bricks-brick1-ovirt-kube.log-20190106.gz
Description: GNU Zip compressed data

Attachment: l5-data-gfs-bricks-bricka-ovirt-kube.log-20190106.gz
Description: GNU Zip compressed data

Attachment: l8-data-gfs-bricks-brick1-ovirt-kube.log-20190106.gz
Description: GNU Zip compressed data

Attachment: rhev-data-center-mnt-glusterSD-10.32.9.20_ovirt-kube.log-20190106.gz
Description: GNU Zip compressed data

Attachment: l11-data-gfs-bricks-brick1-ovirt-kube.log-20190106.gz
Description: GNU Zip compressed data

Attachment: l11-data-gfs-bricks-bricka-ovirt-kube.log-20190106.gz
Description: GNU Zip compressed data

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users
