Re: GlusterFS 9.3 - Replicate Volume (2 Bricks / 1 Arbiter) - Self-healing does not always work

Ravishankar N <ravishankar.n@xxxxxxxxxxx> · Sun, 31 Oct 2021 12:05:23 +0530

On Sat, Oct 30, 2021 at 10:47 PM Strahil Nikolov <hunter86_bg@xxxxxxxxx> wrote:
Hi,
Based on the output, it seems that for some reason the file was created locally but not on the 2nd brick or the arbiter, which is strange for a 'replica 3 arbiter 1' (a.k.a. replica 2 arbiter 1) volume.

It seems that cluster.eager-lock is enabled as per the virt group: https://github.com/gluster/glusterfs/blob/devel/extras/group-virt.example

@Ravi,

Do you think that it should not be enabled by default in the virt group?

It should be enabled alright, but we have noticed some issues with stale locks (https://github.com/gluster/glusterfs/issues/ {2198, 2211, 2027}) that could prevent self-heal (or any other I/O that takes a blocking lock) from happening. But the problem here is different, as you noticed. Thorsten needs to find the actual file corresponding to this gfid (`find -samefile`) and check its file size, hard-link count, etc. If it is a zero-byte file, then it should be safe to just delete the file and its hardlink from the brick.
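A minimal sketch of that lookup, assuming bash with GNU find/stat on the brick host and the brick root /data/glusterfs used in this thread. The helper names below are hypothetical, not part of GlusterFS; the only thing taken from Gluster itself is the .glusterfs/<first 2 hex chars>/<next 2 hex chars>/<gfid> layout visible in the getfattr output.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not GlusterFS tools) for mapping a gfid to real paths on ONE brick.
# Assumes GNU find (-samefile) and GNU stat (-c).

# Print the backend path of a gfid: <brick>/.glusterfs/<aa>/<bb>/<gfid>
gfid_backend_path() {
  local brick=$1 gfid=$2
  printf '%s/.glusterfs/%s/%s/%s\n' "$brick" "${gfid:0:2}" "${gfid:2:2}" "$gfid"
}

# List every hardlink of the gfid file outside .glusterfs (the user-visible name).
gfid_to_paths() {
  local brick=$1 gfid=$2 backend
  backend=$(gfid_backend_path "$brick" "$gfid")
  # Prune the .glusterfs tree so only the "real" path(s) are printed.
  find "$brick" -path "$brick/.glusterfs" -prune -o -samefile "$backend" -print
}

# Show size in bytes and hard-link count of the gfid file.
gfid_stat() {
  local backend
  backend=$(gfid_backend_path "$1" "$2")
  stat -c 'size=%s links=%h' "$backend"
}
```

On pve01 this would be run as `gfid_stat /data/glusterfs 26c5396c-86ff-408d-9cda-106acd2b0768` followed by `gfid_to_paths` with the same arguments. For a regular file one would normally expect links=2 (gfid link plus real name); links=1 with no path printed would suggest the gfid entry has no user-visible name on that brick.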

Regards,
Ravi

Best Regards,
Strahil Nikolov

On Sat, Oct 30, 2021 at 16:14, Thorsten Walk <darkiop@xxxxxxxxx> wrote:

  Hi Ravi & Strahil, thanks a lot for your answer!
The file in the path .glusterfs/26/c5/.. only exists on node1 (=pve01). On node2 (pve02) and the arbiter (freya), the file does not exist:

┬[14:35:48] [ssh:root@pve01(192.168.1.50): ~ (700)]
╰─># getfattr -d -m. -e hex  /data/glusterfs/.glusterfs/26/c5/26c5396c-86ff-408d-9cda-106acd2b0768
getfattr: Removing leading '/' from absolute path names
# file: data/glusterfs/.glusterfs/26/c5/26c5396c-86ff-408d-9cda-106acd2b0768
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.glusterfs-1-volume-client-1=0x000000010000000100000000
trusted.afr.glusterfs-1-volume-client-2=0x000000010000000100000000
trusted.gfid=0x26c5396c86ff408d9cda106acd2b0768
trusted.glusterfs.mdata=0x01000000000000000000000000617880a3000000003b2f011700000000617880a3000000003b2f011700000000617880a3000000003983a635

┬[14:36:49] [ssh:root@pve02(192.168.1.51): /data/glusterfs/.glusterfs/26/c5 (700)]
╰─># ll
drwx------ root root   6B 3 days ago   ./
drwx------ root root 8.0K 6 hours ago  ../

┬[14:36:58] [ssh:root@freya(192.168.1.40): /data/glusterfs/.glusterfs/26/c5 (700)]
╰─># ll
drwx------ root root   6B 3 days ago   ./
drwx------ root root 8.0K 3 hours ago  ../

After this, I disabled the option you mentioned:

gluster volume set glusterfs-1-volume cluster.eager-lock off

After that, I manually triggered another heal, unfortunately without success.
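For reference, a sketch of the manual heal steps as standard gluster CLI calls. The `GLUSTER` variable and the `heal_and_report` function are assumptions of this example (an override hook, e.g. `GLUSTER=echo` for a dry run), not gluster features.

```shell
#!/usr/bin/env bash
# Assumed override hook: set GLUSTER=echo to print the commands instead of running them.
GLUSTER=${GLUSTER:-gluster}

# Hypothetical wrapper around the manual heal steps discussed in this thread.
heal_and_report() {
  local vol=$1
  # Show the current value of the option that was switched off above.
  "$GLUSTER" volume get "$vol" cluster.eager-lock
  # Trigger an index heal of the entries currently pending heal.
  "$GLUSTER" volume heal "$vol"
  # List entries that still need healing, per brick.
  "$GLUSTER" volume heal "$vol" info
}
```

Usage: `heal_and_report glusterfs-1-volume`. If the index heal does not pick the entry up, `gluster volume heal <vol> full` starts a full crawl instead, though it does not address a stale lock of the kind Ravi describes.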

@Strahil: For your idea with https://docs.gluster.org/en/latest/Troubleshooting/gfid-to-path/ I need more time; maybe I can try it tomorrow. I'll be in touch.

Thanks again and best regards,
Thorsten

________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users