Re: Recovering lost node in dispersed volume

Here are the steps for replacing a failed node:


1- On one of the other servers, run "grep thalia
/var/lib/glusterd/peers/* | cut -d: -f1 | cut -d/ -f6" and note the
UUID.
2- Stop glusterd on the failed server, set "UUID=uuid_from_previous_step"
in /var/lib/glusterd/glusterd.info, and start glusterd again.
3- From the failed server, run "gluster peer probe calliope".
4- Restart glusterd.
5- Now "gluster peer status" should show all the peers. If not, probe
them manually as above.
6- For each brick on that server, run "setfattr -n
trusted.glusterfs.volume-id -v 0x$(grep volume-id
/var/lib/glusterd/vols/vol_name/info | cut -d= -f2 | sed 's/-//g')
brick_name".
7- Restart glusterd and everything should be fine. (A worked example
for the volume in this thread follows below.)
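
To make that more concrete, here is roughly what steps 1, 2, 3 and 6
would look like for the volume in this thread (volume rvol, brick
/brick/p1, failed node thalia). This is only a sketch: the service
commands assume systemd, and the UUID is whatever step 1 actually
prints on your systems.

  # step 1, on a healthy node (e.g. lemans): find thalia's old UUID
  grep thalia /var/lib/glusterd/peers/* | cut -d: -f1 | cut -d/ -f6

  # step 2, on thalia: stop glusterd and reuse that UUID
  systemctl stop glusterd
  # edit /var/lib/glusterd/glusterd.info so the UUID line reads
  # UUID=<uuid printed by step 1>
  systemctl start glusterd

  # step 3, on thalia: probe one of the healthy peers
  gluster peer probe calliope

  # step 6, on thalia, once the volume config has synced over:
  setfattr -n trusted.glusterfs.volume-id \
    -v 0x$(grep volume-id /var/lib/glusterd/vols/rvol/info | cut -d= -f2 | sed 's/-//g') \
    /brick/p1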

I think I read the steps from this link:
https://support.rackspace.com/how-to/recover-from-a-failed-server-in-a-glusterfs-array/
See the "keep the IP address" part.
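
After the final restart, something like the following should confirm
the node is back in business (run from any of the nodes; rvol is the
volume from the original message):

  gluster peer status            # thalia should be "Peer in Cluster (Connected)"
  gluster volume status rvol     # the thalia:/brick/p1 brick should be online
  gluster volume heal rvol info  # check whether the brick still needs healing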


On Thu, Sep 22, 2016 at 5:16 PM, Tony Schreiner
<anthony.schreiner@xxxxxx> wrote:
> I set up a dispersed volume with 1 x (3 + 1) nodes (I do know that 3+1 is
> not optimal).
> Originally created in version 3.7 but recently upgraded without issue to
> 3.8.
>
> # gluster vol info
> Volume Name: rvol
> Type: Disperse
> Volume ID: e8f15248-d9de-458e-9896-f1a5782dcf74
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x (3 + 1) = 4
> Transport-type: tcp
> Bricks:
> Brick1: calliope:/brick/p1
> Brick2: euterpe:/brick/p1
> Brick3: lemans:/brick/p1
> Brick4: thalia:/brick/p1
> Options Reconfigured:
> performance.readdir-ahead: on
> nfs.disable: off
>
> I inadvertently allowed one of the nodes (thalia) to be reinstalled, which
> overwrote the system but not the brick, and I need guidance in getting it
> back into the volume.
>
> (on lemans)
> gluster peer status
> Number of Peers: 3
>
> Hostname: calliope
> Uuid: 72373eb1-8047-405a-a094-891e559755da
> State: Peer in Cluster (Connected)
>
> Hostname: euterpe
> Uuid: 9fafa5c4-1541-4aa0-9ea2-923a756cadbb
> State: Peer in Cluster (Connected)
>
> Hostname: thalia
> Uuid: 843169fa-3937-42de-8fda-9819efc75fe8
> State: Peer Rejected (Connected)
>
> The thalia peer is rejected. If I try to peer probe thalia, I am told it is
> already part of the pool. If, from thalia, I try to peer probe one of the
> others, I am told that they are already part of another pool.
>
> I have tried removing the thalia brick with
> gluster vol remove-brick rvol thalia:/brick/p1 start
> but get the error
> volume remove-brick start: failed: Remove brick incorrect brick count of 1
> for disperse 4
>
> I am not finding much guidance for this particular situation. I could use a
> suggestion on how to recover. It's a lab situation so no biggie if I lose
> it.
> Cheers
>
> Tony Schreiner
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users@xxxxxxxxxxx
> http://www.gluster.org/mailman/listinfo/gluster-users
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users


