Re: Rebuilding a failed cluster

You must have been running a really old version of GlusterFS; 2-node systems haven't been supported for a few major releases now. If you want n-1 reliability, you need at least a 4-node system.
On the bright side, you can set up your new Gluster system with appropriate storage. Gluster doesn't do anything fancy with the data, it's all metadata magic, so trying to get a new, modern GlusterFS system to adopt your old bricks isn't worth the effort.
This is one of my bricks that I keep audio files on:
# z /gfss/brkaudio/
drwxr-xr-x. 8 root root 93 Dec 31 1969 /gfss/brkaudio/audio

# z /gfss/brkaudio/audio/
drwxr-xr-x. 5 root root 39 Dec 27 2019 /gfss/brkaudio/audio/music
drwxr-xr-x. 4 root root 28 Dec 27 2019 /gfss/brkaudio/audio/speech
drwxr-xr-x. 2 root root 6 Oct 15 2016 /gfss/brkaudio/audio/words
drwxrwxrwt. 2 root root 6 Jul 28 2020 /gfss/brkaudio/audio/work

This is where the metadata magic is; it's all based on inode number:
# ls -ald /gfss/brkaudio/audio/.*
drwxr-xr-x. 8 root root 93 Dec 31 1969 /gfss/brkaudio/audio/.
drwxr-xr-x. 3 root root 19 Dec 20 2019 /gfss/brkaudio/audio/..
drw-------. 263 root root 8192 Jul 18 2021 /gfss/brkaudio/audio/.glusterfs
drwxr-xr-x. 3 root root 25 Dec 27 2019 /gfss/brkaudio/audio/.trashcan
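
To make the inode point concrete: as far as I understand it, each regular file on a brick has a hard link under .glusterfs named after its GFID, so the two paths share one inode. A rough sketch for looking that up follows. The function name gfid_links is made up for illustration, and -samefile assumes GNU or BSD find (it is not POSIX).

```shell
# Sketch: list the .glusterfs entries that share an inode with a brick file.
# gfid_links is a hypothetical helper name, not a Gluster tool.
gfid_links() {
    brick=$1   # brick root, e.g. /gfss/brkaudio/audio
    file=$2    # path to a regular file inside that brick

    # -samefile matches any path that is a hard link to the same inode.
    find "$brick/.glusterfs" -samefile "$file" -print
}
```

Run it as root against the brick itself, not the mounted volume, since .glusterfs is only visible on the brick.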

The way to recreate it: flip a coin, pick your best bricks, copy the data to the new Gluster volumes, and let it replicate. Then write a script with find to compare the second brick's data against the current new Gluster data and figure out the problems.
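
A minimal sketch of that compare script, under some assumptions: the old brick comes from a replicated volume, so files sit on it whole (a disperse brick holds encoded fragments and would not compare cleanly), the new volume is mounted somewhere (the mount path is whatever you pass in), and no file names contain newlines. The function name compare_brick is made up for illustration.

```shell
# Sketch: report files on an old brick that are missing from, or differ
# from, the copy on the mounted new volume. compare_brick is a
# hypothetical helper name.
compare_brick() {
    brick=$1   # old brick root, e.g. /gfss/brkaudio/audio
    mount=$2   # mount point of the new volume (assumed path)

    (
        cd "$brick" || exit 1
        # Skip Gluster's internal directories; list regular files only.
        find . \( -path ./.glusterfs -o -path ./.trashcan \) -prune \
            -o -type f -print
    ) | LC_ALL=C sort | while IFS= read -r rel; do
        rel=${rel#./}
        if [ ! -e "$mount/$rel" ]; then
            printf 'MISSING %s\n' "$rel"
        elif ! cmp -s "$brick/$rel" "$mount/$rel"; then
            printf 'DIFFERS %s\n' "$rel"
        fi
    done
}
```

Each output line is either MISSING or DIFFERS followed by the path relative to the brick root. What to do with a DIFFERS entry (which copy wins) is still a manual call.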

Been there and done that.

On Sat, 2023-08-12 at 00:46 -0400, Richard Betel wrote:
I had a small cluster with a disperse 3 volume. 2 nodes had hardware failures and no longer boot, and I don't have replacement hardware for them (it's an old board called a PC-duino). However, I do have their intact root filesystems and the disks the bricks are on. 

So I need to rebuild the cluster on all new host hardware. Does anyone have any suggestions on how to go about doing this? I've built 3 VMs to be a new test cluster, but if I copy over a file from the 3 nodes and try to read it, I can't, and I get errors in /var/log/glusterfs/foo.log:
[2023-08-12 03:50:47.638134 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:2561:client4_0_lookup_cbk] 0-gv-client-0: remote operation failed. [{path=/helmetpart.scad}, {gfid=00000000-0000-0000-0000-000000000000}, {errno=61}, {error=No data available}]
[2023-08-12 03:50:49.834859 +0000] E [MSGID: 122066] [ec-common.c:1301:ec_prepare_update_cbk] 0-gv-disperse-0: Unable to get config xattr. FOP : 'FXATTROP' failed on gfid 076a511d-3721-4231-ba3b-5c4cbdbd7f5d. Parent FOP: READ [No data available]
[2023-08-12 03:50:49.834930 +0000] W [fuse-bridge.c:2994:fuse_readv_cbk] 0-glusterfs-fuse: 39: READ => -1 gfid=076a511d-3721-4231-ba3b-5c4cbdbd7f5d fd=0x7fbc9c001a98 (No data available)

So obviously, I need to copy over more stuff from the original cluster. If I force the 3 nodes and the volume to have the same UUIDs, will that be enough?
________



Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users
