Hi, can somebody help me with fixing our 8 node gluster please? The setup is as follows:

root@glucfshead2:~# gluster volume info

Volume Name: archive
Type: Distributed-Replicate
Volume ID: d888b302-2a35-4559-9bb0-4e182f49f9c6
Status: Started
Number of Bricks: 4 x 2 = 8
Transport-type: tcp
Bricks:
Brick1: glucfshead1:/data/glusterfs/archive/brick1
Brick2: glucfshead5:/data/glusterfs/archive/brick1
Brick3: glucfshead2:/data/glusterfs/archive/brick1
Brick4: glucfshead6:/data/glusterfs/archive/brick1
Brick5: glucfshead3:/data/glusterfs/archive/brick1
Brick6: glucfshead7:/data/glusterfs/archive/brick1
Brick7: glucfshead4:/data/glusterfs/archive/brick1
Brick8: glucfshead8:/data/glusterfs/archive/brick1
Options Reconfigured:
cluster.data-self-heal: off
cluster.entry-self-heal: off
cluster.metadata-self-heal: off
features.lock-heal: on
cluster.readdir-optimize: on
performance.flush-behind: off
performance.io-thread-count: 16
features.quota: off
performance.quick-read: on
performance.stat-prefetch: off
performance.io-cache: on
performance.cache-refresh-timeout: 1
nfs.disable: on
performance.cache-max-file-size: 200kb
performance.cache-size: 2GB
performance.write-behind-window-size: 4MB
performance.read-ahead: off
storage.linux-aio: off
diagnostics.brick-sys-log-level: WARNING
cluster.self-heal-daemon: off

Volume Name: archive2
Type: Distributed-Replicate
Volume ID: 0fe86e42-e67f-46d8-8ed0-d0e34f539d69
Status: Started
Number of Bricks: 4 x 2 = 8
Transport-type: tcp
Bricks:
Brick1: glucfshead1:/data/glusterfs/archive2/brick1
Brick2: glucfshead5:/data/glusterfs/archive2/brick1
Brick3: glucfshead2:/data/glusterfs/archive2/brick1
Brick4: glucfshead6:/data/glusterfs/archive2/brick1
Brick5: glucfshead3:/data/glusterfs/archive2/brick1
Brick6: glucfshead7:/data/glusterfs/archive2/brick1
Brick7: glucfshead4:/data/glusterfs/archive2/brick1
Brick8: glucfshead8:/data/glusterfs/archive2/brick1
Options Reconfigured:
cluster.metadata-self-heal: off
cluster.entry-self-heal: off
cluster.data-self-heal: off
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
features.lock-heal: on
diagnostics.brick-sys-log-level: WARNING
storage.linux-aio: off
performance.read-ahead: off
performance.write-behind-window-size: 4MB
performance.cache-size: 2GB
performance.cache-max-file-size: 200kb
nfs.disable: on
performance.cache-refresh-timeout: 1
performance.io-cache: on
performance.stat-prefetch: off
performance.quick-read: on
features.quota: off
performance.io-thread-count: 16
performance.flush-behind: off
auth.allow: 172.16.15.*
cluster.readdir-optimize: on
cluster.self-heal-daemon: off

Some time ago node glucfshead1 broke down. After some fiddling it was decided not to deal with it immediately, because the gluster was in production and a rebuild on 3.4 would basically have rendered it unusable. Recently we decided the situation had to be addressed and hired some experts to help with the problem. So we reinstalled the broken node, gave it a new name/IP, and upgraded all systems to 3.6.4. The plan was to probe the "new" node into the gluster and then do a replace-brick onto it. However, that did not go as expected.
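For clarity, the sequence we intended to run was roughly the following (illustrative only: glucfshead9 is the replacement node, the brick paths match the volume info above, and this is the replace-brick syntax we understood to be current on 3.6.4):

    # from an existing, healthy peer
    gluster peer probe glucfshead9

    # then, per volume, replace the bricks that used to live on glucfshead1
    gluster volume replace-brick archive glucfshead1:/data/glusterfs/archive/brick1 glucfshead9:/data/glusterfs/archive/brick1 commit force
    gluster volume replace-brick archive2 glucfshead1:/data/glusterfs/archive2/brick1 glucfshead9:/data/glusterfs/archive2/brick1 commit force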
The node that we removed is now listed as "Peer Rejected":

root@glucfshead2:~# gluster peer status
Number of Peers: 7

Hostname: glucfshead1
Uuid: 09ed9a29-c923-4dc5-957a-e0d3e8032daf
State: Peer Rejected (Disconnected)

Hostname: glucfshead3
Uuid: a17ae95d-4598-4cd7-9ae7-808af10fedb5
State: Peer in Cluster (Connected)

Hostname: glucfshead4
Uuid: 8547dadd-96bf-45fe-b49d-bab8f995c928
State: Peer in Cluster (Connected)

Hostname: glucfshead5
Uuid: 249da8ea-fda6-47ff-98e0-dbff99dcb3f2
State: Peer in Cluster (Connected)

Hostname: glucfshead6
Uuid: a0229511-978c-4904-87ae-7e1b32ac2c72
State: Peer in Cluster (Connected)

Hostname: glucfshead7
Uuid: 548ec75a-0131-4c92-aaa9-7c6ee7b47a63
State: Peer in Cluster (Connected)

Hostname: glucfshead8
Uuid: 5e54cbc1-482c-460b-ac38-00c4b71c50b9
State: Peer in Cluster (Connected)

If I probe the replacement node (glucfshead9), it only ever shows up on one of my running nodes, and there it is in state "Peer Rejected (Connected)". How can we fix this - preferably without losing data?

TIA,
Thomas