Re: GlusterFS 9.3 - Replicate Volume (2 Bricks / 1 Arbiter) - Self-healing does not always work

Hello all,

I have now rebuilt my cluster and am still in the process of bringing it back into operation. If the error occurs again, I will get back to you.

I would like to switch directly to GlusterFS 10. My two Intel NUCs are running Proxmox 7.1, so GlusterFS 10 is not an issue - there is a Debian repo for it.

My arbiter (a Raspberry Pi) is also running Debian Bullseye, but I couldn't find a GlusterFS 10 repo for ARM. Can I run the arbiter on v9 together with v10, or is it better to stay on v9 everywhere?
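For what it's worth, whether mixed 9/10 peers can coexist is governed by the cluster op-version; a quick check would be (a sketch, assuming the standard gluster CLI on one of the nodes):

# gluster volume get all cluster.op-version
# gluster volume get all cluster.max-op-version

The op-version can only be raised as far as the oldest peer supports, so a v9 arbiter would keep the cluster from being bumped to the GlusterFS 10 op-version until it is upgraded as well.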

Thanks & Regards,
Thorsten

On Fri, Nov 5, 2021 at 20:46, Strahil Nikolov <hunter86_bg@xxxxxxxxx> wrote:
You can mount the volume via the aux-gfid-mount option:

# mount -t glusterfs -o aux-gfid-mount vm1:test /mnt/testvol

And then obtain the path:

getfattr -n trusted.glusterfs.pathinfo -e text /mnt/testvol/.gfid/<GFID>
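Applied to this thread, a minimal sketch (assuming the volume name glusterfs-1-volume from the gvi output below and one of the GFIDs reported on 192.168.1.51) would look like:

# mount -t glusterfs -o aux-gfid-mount 192.168.1.51:glusterfs-1-volume /mnt/testvol
# getfattr -n trusted.glusterfs.pathinfo -e text /mnt/testvol/.gfid/ade6f31c-b80b-457e-a054-6ca1548d9cd3

The pathinfo xattr then reports which brick paths back that GFID.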

Best Regards,
Strahil Nikolov


On Fri, Nov 5, 2021 at 19:29, Thorsten Walk wrote:
Hi Guys,

I pushed some VMs to the GlusterFS storage this week and ran them there. For a maintenance task, I moved these VMs to Proxmox-Node-2 and took Node-1 offline for a short time.
After moving them back to Node-1, some orphaned files were left behind (see attachment). In the logs I can't find anything about the GFIDs :)


┬[15:36:51] [ssh:root@pve02(192.168.1.51): /home/darkiop (755)]
╰─># gvi

Cluster:
         Status: Healthy                 GlusterFS: 9.3
         Nodes: 3/3                      Volumes: 1/1

Volumes:

glusterfs-1-volume
                Replicate          Started (UP) - 3/3 Bricks Up  - (Arbiter Volume)
                                   Capacity: (17.89% used) 83.00 GiB/466.00 GiB (used/total)
                                   Self-Heal:
                                      192.168.1.51:/data/glusterfs (4 File(s) to heal).
                                   Bricks:
                                      Distribute Group 1:
                                         192.168.1.50:/data/glusterfs   (Online)
                                         192.168.1.51:/data/glusterfs   (Online)
                                         192.168.1.40:/data/glusterfs   (Online)


Brick 192.168.1.50:/data/glusterfs
Status: Connected
Number of entries: 0

Brick 192.168.1.51:/data/glusterfs
<gfid:ade6f31c-b80b-457e-a054-6ca1548d9cd3>
<gfid:39365c96-296b-4270-9cdb-1b751e40ad86>
<gfid:54774d44-26a7-4954-a657-6e4fa79f2b97>
<gfid:d5a8ae04-7301-4876-8d32-37fcd6093977>
Status: Connected
Number of entries: 4

Brick 192.168.1.40:/data/glusterfs
Status: Connected
Number of entries: 0
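For reference, the gvi summary above appears to wrap the standard heal status; the equivalent plain CLI calls (a sketch, using the volume name from the output) would be:

# gluster volume heal glusterfs-1-volume info
# gluster volume heal glusterfs-1-volume

The second command triggers an index heal of the pending entries; gluster volume heal glusterfs-1-volume full would crawl the whole brick instead.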


┬[15:37:03] [ssh:root@pve02(192.168.1.51): /home/darkiop (755)]
╰─># cat /data/glusterfs/.glusterfs/ad/e6/ade6f31c-b80b-457e-a054-6ca1548d9cd3
22962


┬[15:37:13] [ssh:root@pve02(192.168.1.51): /home/darkiop (755)]
╰─># grep -ir 'ade6f31c-b80b-457e-a054-6ca1548d9cd3' /var/log/glusterfs/*.log
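Since the grep over /var/log/glusterfs/*.log comes back empty, it may be worth widening the search to the brick and self-heal daemon logs as well (a sketch, assuming the default log locations):

# grep -ir 'ade6f31c-b80b-457e-a054-6ca1548d9cd3' /var/log/glusterfs/glustershd.log /var/log/glusterfs/bricks/*.log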

On Mon, Nov 1, 2021 at 07:51, Thorsten Walk <darkiop@xxxxxxxxx> wrote:
After deleting the file, the heal info output is clean.

>Not sure why you ended up in this situation (maybe unlink partially failed on this brick?)

Neither am I; this was a completely fresh setup with 1-2 VMs and 1-2 Proxmox LXC templates. I let it run for a few days, and at some point it ended up in the state described above. I will continue to monitor it and start filling the bricks with data.
Thanks for your help!

On Mon, Nov 1, 2021 at 02:54, Ravishankar N <ravishankar.n@xxxxxxxxxxx> wrote:


On Mon, Nov 1, 2021 at 12:02 AM Thorsten Walk <darkiop@xxxxxxxxx> wrote:
Hi Ravi, the file exists only on pve01, and only once:

┬[19:22:10] [ssh:root@pve01(192.168.1.50): ~ (700)]
╰─># stat /data/glusterfs/.glusterfs/26/c5/26c5396c-86ff-408d-9cda-106acd2b0768
  File: /data/glusterfs/.glusterfs/26/c5/26c5396c-86ff-408d-9cda-106acd2b0768
  Size: 6               Blocks: 8          IO Block: 4096   regular file
Device: fd12h/64786d    Inode: 528         Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2021-10-30 14:34:50.385893588 +0200
Modify: 2021-10-27 00:26:43.988756557 +0200
Change: 2021-10-27 00:26:43.988756557 +0200
 Birth: -

┬[19:24:41] [ssh:root@pve01(192.168.1.50): ~ (700)]
╰─># ls -l /data/glusterfs/.glusterfs/26/c5/26c5396c-86ff-408d-9cda-106acd2b0768
.rw-r--r-- root root 6B 4 days ago  /data/glusterfs/.glusterfs/26/c5/26c5396c-86ff-408d-9cda-106acd2b0768

┬[19:24:54] [ssh:root@pve01(192.168.1.50): ~ (700)]
╰─># cat /data/glusterfs/.glusterfs/26/c5/26c5396c-86ff-408d-9cda-106acd2b0768
28084

Hi Thorsten, you can delete the file. From the file size and contents, it looks like it belongs to ovirt sanlock. Not sure why you ended up in this situation (maybe unlink partially failed on this brick?). You can check the mount, brick, and self-heal daemon logs for this gfid to see if you find related error/warning messages.
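One additional check that may help here: for regular files, the entry under .glusterfs/xx/yy/ on a brick is a hard link to the real file, so the stat output above showing Links: 1 suggests the named file was already unlinked and only the GFID link remained. A sketch for resolving such a GFID to its brick path directly on pve01 (path taken from the stat above):

# find /data/glusterfs -samefile /data/glusterfs/.glusterfs/26/c5/26c5396c-86ff-408d-9cda-106acd2b0768 -not -path '*/.glusterfs/*'

With a link count of 1 this find will come back empty, which is consistent with the suspicion of a partially failed unlink.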

-Ravi
________



Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users
