Re: [EXT] Replace broken host, keeping the existing bricks

Hi,

The method depends a bit on whether you use a distributed-only setup (like me) or a replicated one.
I'm using a distributed-only setting (many bricks on different servers, but no replication). All my servers boot via the network, i.e., every boot is like bringing up a new host.

To rescue the old bricks, just set up a new server with the same OS, the same IP and the same hostname (very important!). The simplest thing would be if you could retrieve the files in /var/lib/glusterd.
If you install a completely new server (but with the same IP and the same hostname) and _then_ restore the files in /var/lib/glusterd, you can just use it as before. It will be recognised as the previous peer, without any additional commands.
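
A minimal sketch of that restore, assuming a systemd-based distribution and that you still have a copy of the old /var/lib/glusterd (the tarball name here is hypothetical):

    # on the rebuilt host (same IP, same hostname), with glusterd stopped:
    systemctl stop glusterd
    tar xzf /root/glusterd-backup.tar.gz -C /var/lib   # recreates /var/lib/glusterd
    systemctl start glusterd

    # the host should rejoin the trusted pool automatically; verify with:
    gluster peer status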

In fact, I think that /var/lib/glusterd/* should be identical on all servers, except for
/var/lib/glusterd/glusterd.info
which holds the UUID of the server. However, you should be able to retrieve that UUID on any surviving peer with:
gluster pool list
This is your scenario 2a).
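
For illustration (hostnames and UUIDs below are made up), the failed host's UUID shows up in the pool list on a surviving peer:

    $ gluster pool list
    UUID                                  Hostname  State
    5b2c7e90-0000-4f6a-9c3e-000000000000  server2   Connected
    9d41f3a2-0000-4b0d-8e7f-000000000000  server1   Disconnected

glusterd.info on the rebuilt host would then carry that UUID, roughly like this (the operating-version value depends on your cluster):

    # /var/lib/glusterd/glusterd.info
    UUID=9d41f3a2-0000-4b0d-8e7f-000000000000
    operating-version=100000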

Note that if it's __not__ a distributed-only system, other steps might be necessary.

Your 2b) scenario should also work, but slightly differently. (Again, distributed-only.)
I use it occasionally for failover, but I haven't tested it extensively:

    # take the failed brick out of service so it can be replaced
    gluster v reset-brick NameOfVolume FailedServer:/path/to/brick start

    # add the new server's brick; force is needed because the brick
    # already contains data
    gluster v add-brick NameOfVolume NewServer:/path/to/brick force

    # Order is important!
    # If the failed brick is removed before the new brick is added,
    # you will end up with duplicate files.

    gluster v remove-brick NameOfVolume FailedServer:/path/to/brick force

    # recompute the directory layout to match the new set of bricks
    gluster v rebalance NameOfVolume fix-layout start
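
The fix-layout run can be monitored with the usual status subcommand:

    gluster v rebalance NameOfVolume status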

If it's also replicated or striped or using sharding, then other steps might be necessary.

best wishes,
Stefan Solbrig

-- 
Dr. Stefan Solbrig
Universität Regensburg
Fakultät für Informatik und Data Science
93040 Regensburg

On 09.06.2024 at 15:00, kup@xxxxxxxxx wrote:

Hi all,

I know there are many tutorials on how to replace a gluster host that has become unusable. But they all seem to assume that the bricks of the respective host are gone, too.

My problem is different and (I hope) more easily solved: the disk with the host’s root file system died and cannot be recovered. However, all of its bricks are on separate disks and completely undamaged.

I'm seeking your advice on what is best practice for replacing such a host.

My notion is that it should be possible to set up a new root system, configure it, and have it use the existing bricks.

My questions are:

1) Is this a good idea at all, or am I missing anything? Would it be better to format the existing bricks and start over with a completely clean new host, like most of the tutorials do?

2) If it is feasible to use the existing bricks, two scenarios come to my mind:

 a) Set up a new root file system for a gluster host and copy/adjust the gluster configuration from one of the existing hosts, so that the newly set-up host actually thinks it is the old host that died. I.e., copying over the gluster UUID, volume configurations, hostname, IP, etc. (What else would it need?)
    The pool would then recognize the new host as identical to the old one that died and accept it just as if the old host had come online again.

 b) Set up a new root file system for a gluster host and probe it into the trusted pool, with a new name and a new gluster UUID. Transfer the bricks of the old host that died to the new one using "replace-brick". There would be no need for lengthy syncing, as most of the data already exists and is up to date on the new host (which has the bricks of the old host); only self-heal would take place.


Do these scenarios sound sane to you, and which one would be best practice in this situation? This is a production system, so safety is relevant.


Thanks for any helpful comments and opinions!

Best, R. Kupper

________



Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users
