Recovering files "lost" during a rebalance on a Dispersed 3+1

Jeremy Davis-Turak <jeremy@xxxxxxxxxxxx> · Tue, 12 Sep 2023 09:53:25 -0600

Hello,
We are running glusterfs 6.6 on Ubuntu.

We have a Gluster storage system that is a few years old. Four VMs run a Dispersed (NOT replicated) volume in a 3+1 configuration.

Performance is generally well tuned for our needs, but a problem arose the last time we added bricks: we attempted a rebalance, which was reported as failed. From the mounted POSIX view of the file system, many files now report a size of 0 bytes, which they shouldn’t.

We’ve attempted all kinds of heal and other operations to no avail. I finally figured out how to find the gfid of the affected files, and I found where Gluster thought the shards were located. They were indeed 0 bytes. However, I was able to find shards with the same gfid located on other bricks.
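For anyone following along, the gfid lookup can be sketched as below. This assumes the usual brick layout, where each gfid has an entry at `.glusterfs/<first two hex chars>/<next two>/<full gfid>` under the brick root, and that the gfid was read on a brick with something like `getfattr -n trusted.gfid -e hex <file>`. The brick path and gfid in the example are made up for illustration.

```python
import os
import uuid


def gfid_backend_path(brick_root: str, gfid: str) -> str:
    """Return the expected .glusterfs path for a gfid on one brick.

    Accepts the gfid either as a dashed UUID or as the hex string printed
    by `getfattr -e hex` (with or without the leading 0x).
    """
    text = gfid.strip()
    if text.lower().startswith("0x"):
        text = text[2:]
    # uuid.UUID validates the value and normalises it to dashed lower case.
    canonical = str(uuid.UUID(text))
    return os.path.join(
        brick_root, ".glusterfs", canonical[:2], canonical[2:4], canonical
    )


if __name__ == "__main__":
    # Hypothetical brick root and gfid, not taken from the real volume.
    print(gfid_backend_path("/data/brick1", "0x3e2f1c9a7b4d4e0fa1b2c3d4e5f60718"))
```

Running the same function against each brick root on each server gives the paths to compare.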

So, I think that when the rebalance failed, the system somehow kept thinking that the files should exist in the NEW brick location instead of the one that actually has content. For one file I did try deleting the 0-byte shards, but the system still reports the file as 0 bytes, which means it didn’t fall back to the other shards with the same gfid. Is it possible to manually move shards from one brick to another? I'm clearly tinkering with things that aren't meant to be tinkered with, but I don't fully understand how GlusterFS functions under the hood.
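A read-only way to see which bricks hold real content for a given gfid is sketched below. It only inspects files and changes nothing. The brick roots are hypothetical, the script would have to be run on each server for its local bricks, and the gfid must be in dashed form. One caveat, stated as an assumption about this case: on distributed volumes, a 0-byte entry whose permission bits are only the sticky bit (shown as `---------T` by `ls -l`) is normally a DHT link file pointing at another subvolume rather than lost data, which is why the sketch also reports the mode.

```python
import os
import stat


def survey_fragments(brick_roots, gfid):
    """Report what each local brick holds for one gfid.

    Returns a dict mapping brick root -> (size_in_bytes, permission_bits),
    or None for bricks that have no entry for this gfid.
    """
    report = {}
    for root in brick_roots:
        path = os.path.join(root, ".glusterfs", gfid[:2], gfid[2:4], gfid)
        try:
            st = os.lstat(path)  # lstat: do not follow symlinks
        except FileNotFoundError:
            report[root] = None
            continue
        report[root] = (st.st_size, stat.S_IMODE(st.st_mode))
    return report


if __name__ == "__main__":
    # Hypothetical brick roots and gfid, for illustration only.
    bricks = ["/data/brick1", "/data/brick2"]
    gfid = "3e2f1c9a-7b4d-4e0f-a1b2-c3d4e5f60718"
    for root, info in survey_fragments(bricks, gfid).items():
        if info is None:
            print(f"{root}: no entry")
        else:
            size, mode = info
            sticky_only = size == 0 and mode == stat.S_ISVTX
            print(f"{root}: {size} bytes, mode {mode:04o}, link-file-like={sticky_only}")
```

Collecting this output from all four servers before touching anything gives a record of where the non-empty copies live.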

We’re at a loss as to how to fix this, and I haven’t had luck finding anyone who can help. We have quite a few files that we would like to recover, so it’s important that we figure out how to do so.

Thanks,

Jeremy