Re: [EXT] Replace broken host, keeping the existing bricks

Thank you Stefan for your comments.


On 11 June 2024 at 13:51, "Stefan Solbrig" <stefan.solbrig@xxxxx> wrote:


> 
> Hi,
> 
>  
> 
>  The method depends a bit on whether you use a distributed-only system (like me) or a replicated setting.
> 
>  I'm using a distributed-only setting (many bricks on different servers, but no replication). All my servers boot via network, i.e., on every start it's like a new host.
> 

We have a distributed-replicated setup with 6 hosts: distribute 2, replicate 3. Each host has 4 bricks.

Specifically:

- Data is distributed across hosts “gluster1” and “gluster2” (using two volumes; each volume consists of 2 bricks, residing on separate disks, so every host has 4 bricks on 4 separate disks. These are all undamaged and can be reused.)
- This setup is replicated on hosts “gluster3/gluster4” and “gluster5/gluster6”.
- gluster1 is the machine with the broken root disk (all its bricks are undamaged); a rough sketch of the layout follows below.
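
In gluster terms, I picture each of our two volumes roughly like the excerpt below (the volume name and brick paths are invented, and I have simplified to one brick per host per volume):

    # Hypothetical excerpt of "gluster volume info vol1"
    Volume Name: vol1
    Type: Distributed-Replicate
    Number of Bricks: 2 x 3 = 6
    Brick1: gluster1:/bricks/disk1/vol1
    Brick2: gluster3:/bricks/disk1/vol1
    Brick3: gluster5:/bricks/disk1/vol1
    Brick4: gluster2:/bricks/disk2/vol1
    Brick5: gluster4:/bricks/disk2/vol1
    Brick6: gluster6:/bricks/disk2/vol1

Bricks 1-3 would be the replica set that holds gluster1's (intact) data.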


> To rescue the old bricks, just set up a new server with the same OS, the same IP, and the same hostname (very important!). The simplest thing would be if you could retrieve the files in /var/lib/glusterd.

No data can be retrieved from the original root disk of gluster1; we tried all the magic we could summon.

 
>  If you install a completely new server (but with the same IP and the same hostname), _then_ restore the files in /var/lib/glusterd, you can just use it as before. It will be recognised as the previous peer, without any additional commands.

This sounds like the way to go, then.


>  In fact, I think that /var/lib/glusterd/*...   should be identical on all servers, except
>  /var/lib/glusterd/glusterd.info
>  which holds the UUID of the server. However, you should be able to retrieve the UUID from the command:
>  gluster pool list

I believe so. Having 5 working nodes, we should be fine retrieving the configuration data.
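
For the record, here is roughly how I picture the procedure on a freshly installed gluster1 (same hostname, same IP). This is only my sketch of it, with placeholder UUIDs and paths, and I would double-check every step before touching production:

    # 1) On any healthy node: look up the UUID the pool still has for gluster1
    gluster pool list                 # or: gluster peer status

    # 2) On the newly installed gluster1, with glusterd stopped, put that old
    #    UUID into /var/lib/glusterd/glusterd.info (keep the operating-version
    #    line that the fresh install generated)
    systemctl stop glusterd
    vi /var/lib/glusterd/glusterd.info    # UUID=<uuid-of-old-gluster1>

    # 3) Copy the peer definitions from a healthy node, remove the entry that
    #    describes gluster1 itself, and fetch the entry for the node we copied
    #    from via a third node (a node has no peers/ file for itself)
    scp -r gluster2:/var/lib/glusterd/peers/ /var/lib/glusterd/
    rm /var/lib/glusterd/peers/<uuid-of-old-gluster1>
    scp gluster3:/var/lib/glusterd/peers/<uuid-of-gluster2> /var/lib/glusterd/peers/

    # 4) Mount the four intact brick disks at their old mount points, start
    #    glusterd, and check that the pool and the volumes look complete again
    systemctl start glusterd
    gluster peer status
    gluster volume status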


>  This is your scenario 2a)
> 
>  Note that if it's __not__ a distributed-only system, other steps might be necessary.

This worries me a bit. What other steps do you mean?
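
My own guess (and it is only a guess) is that for a replicated volume it mainly comes down to letting self-heal catch up once the node is back, and verifying that it finishes, e.g.:

    # After gluster1 has rejoined the pool (volume name is ours, of course):
    gluster volume heal vol1 info     # list entries that still need healing
    gluster volume heal vol1          # trigger a heal, in case it is needed

Is that what you mean, or is there more to it?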



>  Your 2b) scenario should also work, but slightly differently. (Again, only for distributed-only setups.)
> 
>  I use it occasionally for failover mode, but I haven't tested it extensively:
> 
>  
> 
>      gluster v reset-brick NameOfVolume FailedServer:/path/to/brick start
>      gluster v add-brick NameOfVolume NewServer:/path/to/brick force
> 
>      # Order is important!
>      # If the brick is removed before the other brick is added,
>      # it will lead to duplicate files.
> 
>      gluster v remove-brick NameOfVolume FailedServer:/path/to/brick force
>      gluster v rebalance NameOfVolume fix-layout start
> 
>  
> 
>  If it's also replicated or striped or using sharding, then other steps might be necessary.

See above: which other steps do you have in mind?
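
For our replicated volumes I would naively expect a single replace-brick to take the place of the reset/add/remove sequence above, followed by a heal; this is only an assumption on my side (names and paths are placeholders):

    # Assumed replica-3 equivalent of the sequence above (untested here):
    gluster volume replace-brick vol1 FailedServer:/path/to/brick NewServer:/path/to/brick commit force
    # ...then let self-heal bring the new brick in sync and watch it finish
    gluster volume heal vol1
    gluster volume heal vol1 info

Please correct me if that is off the mark.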


>  best wishes,
> 
>  Stefan Solbrig


Best regards to Regensburg,
R. Kupper


> 
>  -- 
> 
>  Dr. Stefan Solbrig
>  Universität Regensburg
>  Fakultät für Informatik und Data Science
>  93040 Regensburg
> 
> 
>  On 09.06.2024 at 15:00, kup@xxxxxxxxx wrote:
> 
>  
> 
>  Hi all,
> 
>  
> 
>  I know there are many tutorials on how to replace a gluster host that has become unusable. But they all seem to assume that the bricks of the respective host are gone, too.
> 
>  
> 
>  My problem is different and (I hope) more easily solved: the disk with the host’s root file system died and cannot be recovered. However, all of its bricks are on separate disks and completely undamaged.
> 
>  
> 
>  I'm seeking your advice on what is best practice for replacing such a host.
> 
>  
> 
>  My notion is that it should be possible to set up a new root system, configure it, and have it use the existing bricks.
> 
>  
> 
>  My questions are:
> 
>  
> 
>  1) Is this a good idea at all, or am I missing anything? Would it be better to format the existing bricks and start over with a completely clean new host, like most of the tutorials do?
> 
>  
> 
>  2) If it is feasible to use the existing bricks, two scenarios come to my mind:
> 
>  
> 
>   a) Set up a new root file system for a gluster host and copy/adapt the gluster configuration from one of the existing hosts. Adjust it so that the newly set up host actually thinks it is the old host (that died), i.e., copy over the gluster UUID, volume configurations, hostname, IP, etc. (What else would it need?)
> 
>  
> 
>      The pool would then recognize the new host as identical to the old one that died and accept it just as if the old host had come online again.
> 
>  
> 
>   b) Set up a new root file system for a gluster host and probe it into the trusted pool, with a new name and a new gluster UUID. Transfer the bricks of the old host that died to the new one using "replace-brick". There would be no need for lengthy syncing, as most of the data already exists and is up to date on the new host (which has the bricks of the old host); only self-heal would take place.
> 
>  
> 
>  Do these scenarios sound sane to you, and which one would be best practice in this situation? This is a production system, so safety is relevant.
> 
>  
> 
>  Thanks for any helpful comments and opinions!
> 
>  
> 
>  Best, R. Kupper
> 
>  
> 

________



Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users



