Re: Rebuilding a failed cluster

Much depends on the original volume layout. For replica volumes you'll find multiple copies of the same file on different bricks, and sometimes 0-byte files that are placeholders for renamed files: do not overwrite a good file with its empty version! If the old volume is still online, it's better to copy from its FUSE mount point to the new one. But since it's a temporary "backup", there's no need to use another Gluster volume as the destination: just use a USB drive directly connected to the old nodes (one at a time) or to a machine that can still FUSE mount the old volume. Once you have a backup, write-protect it and experiment freely :)
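
A rough sketch of that backup step, assuming the old volume is called "gv", a surviving node is "oldnode1" and the USB drive is mounted at /mnt/usb (all placeholder names, adjust to your setup):

    # FUSE-mount the old volume read-only
    mkdir -p /mnt/oldgv
    mount -t glusterfs -o ro oldnode1:/gv /mnt/oldgv

    # Copy everything to the USB drive, preserving hard links, ACLs and xattrs
    rsync -aHAX /mnt/oldgv/ /mnt/usb/gv-backup/

    # Write-protect the backup before experimenting
    mount -o remount,ro /mnt/usb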

Diego

On 29/11/2023 19:17, Richard Betel wrote:
Ok, it's been a while, but I'm getting back to this "project".
I was unable to get gluster for the platform: the machines are ARM-based, and there are no ARM binaries on the gluster package repo. I tried building it instead, but the version of gluster I was running was quite old, and I couldn't get all the right package versions to do a successful build. As a result, it sounds like my best option is to follow your alternate suggestion: "The other option is to setup a new cluster and volume and then mount the volume via FUSE and copy the data from one of the bricks."

I want to be sure I understand what you're saying, though. Here's my plan:
1. Create 3 VMs on amd64 processors (*)
2. Give each a 100G brick
3. Set up the 3 bricks as disperse
4. Mount the new gluster volume on my workstation
5. Copy directories from one of the old bricks to the mounted new GFS volume
6. Copy fully restored data from new GFS volume to workstation or whatever permanent setup I go with

Is that right? Or do I want the GFS system to be offline while I copy the contents of the old brick to the new brick?

(*) I'm not planning to keep my GFS on VMs on cloud, I just want something temporary to work with so I don't blow up anything else.
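
A rough command-level sketch of the plan above, with placeholder names (vm1/vm2/vm3 for the new VMs, /data/brick for the brick path, gv for the volume):

    # On vm1, with glusterfs-server installed and running on all three VMs
    gluster peer probe vm2
    gluster peer probe vm3
    gluster volume create gv disperse 3 vm1:/data/brick vm2:/data/brick vm3:/data/brick
    gluster volume start gv

    # On the workstation: FUSE-mount the new volume...
    mount -t glusterfs vm1:/gv /mnt/newgv

    # ...then copy the data in from one of the old bricks (skipping internal brick metadata)
    rsync -aHAX --exclude='.glusterfs' /path/to/old-brick/ /mnt/newgv/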




On Sat, 12 Aug 2023 at 09:20, Strahil Nikolov <hunter86_bg@xxxxxxxxx> wrote:

    If you preserved the gluster structure in /etc/ and /var/lib, you
    should be able to run the cluster again.
    First install the same gluster version on all nodes and then overwrite
    the structure in /etc and in /var/lib.
    Once you mount the bricks, start glusterd and check the situation.
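
    A rough sketch of that restore, assuming the old configuration was
    saved under /root/backup and the new nodes keep the old hostnames
    (both are placeholder assumptions):

        # Install the same glusterfs-server version on every node, then:
        systemctl stop glusterd
        rsync -a /root/backup/etc/glusterfs/ /etc/glusterfs/
        rsync -a /root/backup/var/lib/glusterd/ /var/lib/glusterd/

        # Mount the brick disks at their original paths, then:
        systemctl start glusterd
        gluster peer status
        gluster volume status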

    The other option is to setup a new cluster and volume and then mount
    the volume via FUSE and copy the data from one of the bricks.

    Best Regards,
    Strahil Nikolov

    On Saturday, August 12, 2023, 7:46 AM, Richard Betel
    <emteeoh@xxxxxxxxx> wrote:

        I had a small cluster with a disperse 3 volume. 2 nodes had
        hardware failures and no longer boot, and I don't have
        replacement hardware for them (it's an old board called a
        PC-duino). However, I do have their intact root filesystems and
        the disks the bricks are on.

        So I need to rebuild the cluster on all new host hardware. Does
        anyone have any suggestions on how to go about doing this? I've
        built 3 VMs to be a new test cluster, but if I copy over a file
        from the 3 nodes and try to read it, I can't and get errors in
        /var/log/glusterfs/foo.log:
        [2023-08-12 03:50:47.638134 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:2561:client4_0_lookup_cbk] 0-gv-client-0: remote operation failed. [{path=/helmetpart.scad}, {gfid=00000000-0000-0000-0000-000000000000}, {errno=61}, {error=No data available}]
        [2023-08-12 03:50:49.834859 +0000] E [MSGID: 122066] [ec-common.c:1301:ec_prepare_update_cbk] 0-gv-disperse-0: Unable to get config xattr. FOP : 'FXATTROP' failed on gfid 076a511d-3721-4231-ba3b-5c4cbdbd7f5d. Parent FOP: READ [No data available]
        [2023-08-12 03:50:49.834930 +0000] W [fuse-bridge.c:2994:fuse_readv_cbk] 0-glusterfs-fuse: 39: READ => -1 gfid=076a511d-3721-4231-ba3b-5c4cbdbd7f5d fd=0x7fbc9c001a98 (No data available)

        So obviously, I need to copy over more stuff from the original
        cluster. If I force the 3 nodes and the volume to have the same
        UUIDs, will that be enough?


--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
________



Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users



