Re: Rebuilding a failed cluster

Arno Karner <arnokarner@xxxxxxxxx> · Sat, 12 Aug 2023 21:25:31 -0500

You must have been running a really old version of glusterfs; 2-node systems haven't been supported for a few major releases now. If you want n-1 reliability you need at least a 4-node system.
On the bright side, set up your new gluster system with appropriate storage. Gluster doesn't do anything fancy with the data, it's all metadata magic, so trying to get a new modern glusterfs system to adopt your old bricks isn't worth the effort.
This is one of my bricks that I keep audio files on:
# z /gfss/brkaudio/
drwxr-xr-x. 8 root root 93 Dec 31  1969 /gfss/brkaudio/audio

# z /gfss/brkaudio/audio/
drwxr-xr-x. 5 root root 39 Dec 27  2019 /gfss/brkaudio/audio/music
drwxr-xr-x. 4 root root 28 Dec 27  2019 /gfss/brkaudio/audio/speech
drwxr-xr-x. 2 root root  6 Oct 15  2016 /gfss/brkaudio/audio/words
drwxrwxrwt. 2 root root  6 Jul 28  2020 /gfss/brkaudio/audio/work

This is where the metadata magic is; it's all based on inode number:
# ls -ald /gfss/brkaudio/audio/.*
drwxr-xr-x.   8 root root   93 Dec 31  1969 /gfss/brkaudio/audio/.
drwxr-xr-x.   3 root root   19 Dec 20  2019 /gfss/brkaudio/audio/..
drw-------. 263 root root 8192 Jul 18  2021 /gfss/brkaudio/audio/.glusterfs
drwxr-xr-x.   3 root root   25 Dec 27  2019 /gfss/brkaudio/audio/.trashcan
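
For anyone poking around in there: as far as I know, every regular file on a brick has a hard link under .glusterfs named after its GFID, filed under two directory levels taken from the first four hex digits of the GFID. That is why the inode number matters. A small sketch of that mapping follows. The function name gfid_to_path is made up for illustration, and the GFID in the usage comment is just the one from the log quoted further down, not a file on this brick.

```shell
# Hypothetical helper: print where a GFID's hard link should live on a brick.
# Assumes the usual .glusterfs/<first 2 hex>/<next 2 hex>/<full gfid> layout.
gfid_to_path() {
    brick=$1
    gfid=$2
    a=$(printf '%s' "$gfid" | cut -c1-2)   # first two hex digits
    b=$(printf '%s' "$gfid" | cut -c3-4)   # next two hex digits
    printf '%s/.glusterfs/%s/%s/%s\n' "$brick" "$a" "$b" "$gfid"
}

# Usage (illustrative values only):
#   gfid_to_path /gfss/brkaudio/audio 076a511d-3721-4231-ba3b-5c4cbdbd7f5d
```

Once you have that path, GNU find's -samefile test (find BRICK -samefile PATH) should list the regular name that shares the same inode.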

The way to recreate it: flip a coin, pick your best bricks, copy the data to the new gluster volumes, and let it replicate. Then write a script with find to compare the second brick's data against the current new gluster data and figure out the problems.
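
A rough sketch of that find-based compare, assuming a replicated volume where each brick holds whole files (on a dispersed volume the bricks hold encoded fragments, so a byte compare against the mount would not match). The function name compare_brick and the MISSING/DIFFERS labels are my own invention, and it assumes file names without embedded newlines.

```shell
# Hypothetical helper: compare every regular file on an old brick against
# the same relative path on the mounted new volume.
# Prints "MISSING <path>" or "DIFFERS <path>", sorted; silent if all match.
compare_brick() {
    brick=$1    # old brick directory
    mount=$2    # mount point of the new gluster volume

    # List files relative to the brick, skipping gluster's own directories.
    ( cd "$brick" || exit 1
      find . \( -name .glusterfs -o -name .trashcan \) -prune -o -type f -print ) |
    while IFS= read -r rel; do
        rel=${rel#./}
        if [ ! -e "$mount/$rel" ]; then
            echo "MISSING $rel"
        elif ! cmp -s "$brick/$rel" "$mount/$rel"; then
            echo "DIFFERS $rel"
        fi
    done | sort
}
```

Run it as something like: compare_brick /path/to/old/brick /mnt/newvolume (both paths are placeholders), then go through the output by hand.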

Been there and done that.

On Sat, 2023-08-12 at 00:46 -0400, Richard Betel wrote:
I had a small cluster with a disperse 3 volume. 2 nodes had hardware failures and no longer boot, and I don't have replacement hardware for them (it's an old board called a PC-duino). However, I do have their intact root filesystems and the disks the bricks are on. 

So I need to rebuild the cluster on all new host hardware. Does anyone have any suggestions on how to go about doing this? I've built 3 VMs to be a new test cluster, but if I copy over a file from the 3 nodes and try to read it, I can't, and I get errors in /var/log/glusterfs/foo.log:
[2023-08-12 03:50:47.638134 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:2561:client4_0_lookup_cbk] 0-gv-client-0: remote operation failed. [{path=/helmetpart.scad}, {gfid=00000000-0000-0000-0000-000000000000}, {errno=61}, {error=No data available}]
[2023-08-12 03:50:49.834859 +0000] E [MSGID: 122066] [ec-common.c:1301:ec_prepare_update_cbk] 0-gv-disperse-0: Unable to get config xattr. FOP : 'FXATTROP' failed on gfid 076a511d-3721-4231-ba3b-5c4cbdbd7f5d. Parent FOP: READ [No data available]
[2023-08-12 03:50:49.834930 +0000] W [fuse-bridge.c:2994:fuse_readv_cbk] 0-glusterfs-fuse: 39: READ => -1 gfid=076a511d-3721-4231-ba3b-5c4cbdbd7f5d fd=0x7fbc9c001a98 (No data available)

So obviously, I need to copy over more stuff from the original cluster. If I force the 3 nodes and the volume to have the same UUIDs, will that be enough?
________

Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users
