Hi,

Atin Mukherjee wrote:
> This could very well be related to op-version. Could you look at the
> faulty node's glusterd log and see the error log entries, that would
> give us the exact reason of failure.

op-version is 1 across all the nodes.

I've made some progress: by persistently wiping /var/lib/glusterd except for glusterd.info and restarting glusterd on the new node, I've reached a state where all nodes agree that my replacement node is part of the cluster:

root@glucfshead2:~# for i in `seq 2 9`; do echo "glucfshead$i:"; ssh glucfshead$i "gluster peer status" | grep -A2 glucfshead9 ; done
glucfshead2:
Hostname: glucfshead9
Uuid: 040e61dd-fd02-4957-8833-cf5708b837f0
State: Peer in Cluster (Connected)
glucfshead3:
Hostname: glucfshead9
Uuid: 040e61dd-fd02-4957-8833-cf5708b837f0
State: Peer in Cluster (Connected)
glucfshead4:
Hostname: glucfshead9
Uuid: 040e61dd-fd02-4957-8833-cf5708b837f0
State: Peer in Cluster (Connected)
glucfshead5:
Hostname: glucfshead9
Uuid: 040e61dd-fd02-4957-8833-cf5708b837f0
State: Peer in Cluster (Connected)
glucfshead6:
Hostname: glucfshead9
Uuid: 040e61dd-fd02-4957-8833-cf5708b837f0
State: Peer in Cluster (Connected)
glucfshead7:
Hostname: glucfshead9
Uuid: 040e61dd-fd02-4957-8833-cf5708b837f0
State: Peer in Cluster (Connected)
glucfshead8:
Hostname: glucfshead9
Uuid: 040e61dd-fd02-4957-8833-cf5708b837f0
State: Peer in Cluster (Connected)

The new node sees all of the other nodes:

root@glucfshead9:~# gluster peer status
Number of Peers: 7

Hostname: glucfshead4.bo.rz.pixum.net
Uuid: 8547dadd-96bf-45fe-b49d-bab8f995c928
State: Peer in Cluster (Connected)

Hostname: glucfshead2
Uuid: 73596f88-13ae-47d7-ba05-da7c347f6141
State: Peer in Cluster (Connected)

Hostname: glucfshead3
Uuid: a17ae95d-4598-4cd7-9ae7-808af10fedb5
State: Peer in Cluster (Connected)

Hostname: glucfshead5.bo.rz.pixum.net
Uuid: 249da8ea-fda6-47ff-98e0-dbff99dcb3f2
State: Peer in Cluster (Connected)

Hostname: glucfshead6
Uuid: a0229511-978c-4904-87ae-7e1b32ac2c72
State: Peer in Cluster (Connected)

Hostname: glucfshead7
Uuid: 548ec75a-0131-4c92-aaa9-7c6ee7b47a63
State: Peer in Cluster (Connected)

Hostname: glucfshead8
Uuid: 5e54cbc1-482c-460b-ac38-00c4b71c50b9
State: Peer in Cluster (Connected)

The old nodes all show the to-be-replaced node, glucfshead1, as rejected and disconnected:

root@glucfshead2:~# for i in `seq 2 9`; do echo "glucfshead$i:"; ssh glucfshead$i "gluster peer status" | grep -B2 Rej ; done
glucfshead2:
Hostname: glucfshead1
Uuid: 09ed9a29-c923-4dc5-957a-e0d3e8032daf
State: Peer Rejected (Disconnected)
glucfshead3:
Hostname: glucfshead1
Uuid: 09ed9a29-c923-4dc5-957a-e0d3e8032daf
State: Peer Rejected (Disconnected)
glucfshead4:
Hostname: glucfshead1
Uuid: 09ed9a29-c923-4dc5-957a-e0d3e8032daf
State: Peer Rejected (Disconnected)
glucfshead5:
Hostname: glucfshead1
Uuid: 09ed9a29-c923-4dc5-957a-e0d3e8032daf
State: Peer Rejected (Disconnected)
glucfshead6:
Hostname: glucfshead1
Uuid: 09ed9a29-c923-4dc5-957a-e0d3e8032daf
State: Peer Rejected (Disconnected)
glucfshead7:
Hostname: glucfshead1
Uuid: 09ed9a29-c923-4dc5-957a-e0d3e8032daf
State: Peer Rejected (Disconnected)
glucfshead8:
Hostname: glucfshead1
Uuid: 09ed9a29-c923-4dc5-957a-e0d3e8032daf
State: Peer Rejected (Disconnected)
glucfshead9:
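For reference, the per-volume configuration checksum that glusterd compares when it decides whether to accept or reject a peer can be read straight from disk. The loop below is only a sketch along the lines of the ones above; it assumes the default /var/lib/glusterd layout and the volume name archive:

# sketch: compare the volume config checksum on every node
for i in `seq 2 9`; do
  echo -n "glucfshead$i: "
  ssh glucfshead$i "cat /var/lib/glusterd/vols/archive/cksum"
done
# a node whose value differs from the rest has an out-of-sync copy of
# /var/lib/glusterd/vols/archive, which is what typically puts a peer
# into the Rejected state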
If I try to replace the downed brick with my new brick, it says it's successful:

root@glucfshead2:~# gluster volume replace-brick archive glucfshead1:/data/glusterfs/archive/brick1 glucfshead9:/data/glusterfs/archive/brick1/brick commit force
volume replace-brick: success: replace-brick commit successful

However, on checking, the broken brick is still listed in the volume:

root@glucfshead2:~# gluster volume info

Volume Name: archive
Type: Distributed-Replicate
Volume ID: d888b302-2a35-4559-9bb0-4e182f49f9c6
Status: Started
Number of Bricks: 4 x 2 = 8
Transport-type: tcp
Bricks:
Brick1: glucfshead1:/data/glusterfs/archive/brick1
Brick2: glucfshead5:/data/glusterfs/archive/brick1
Brick3: glucfshead2:/data/glusterfs/archive/brick1
Brick4: glucfshead6:/data/glusterfs/archive/brick1
Brick5: glucfshead3:/data/glusterfs/archive/brick1
Brick6: glucfshead7:/data/glusterfs/archive/brick1
Brick7: glucfshead4:/data/glusterfs/archive/brick1
Brick8: glucfshead8:/data/glusterfs/archive/brick1
Options Reconfigured:
cluster.data-self-heal: off
cluster.entry-self-heal: off
cluster.metadata-self-heal: off
features.lock-heal: on
cluster.readdir-optimize: on
auth.allow: 172.16.15.*
performance.flush-behind: off
performance.io-thread-count: 16
features.quota: off
performance.quick-read: on
performance.stat-prefetch: off
performance.io-cache: on
performance.cache-refresh-timeout: 1
nfs.disable: on
performance.cache-max-file-size: 200kb
performance.cache-size: 2GB
performance.write-behind-window-size: 4MB
performance.read-ahead: off
storage.linux-aio: off
diagnostics.brick-sys-log-level: INFO
server.statedump-path: /var/tmp
cluster.self-heal-daemon: off

All of the old nodes complain loudly that they can't connect to glucfshead1:

[2015-11-03 13:54:59.422135] I [MSGID: 106004] [glusterd-handler.c:4398:__glusterd_peer_rpc_notify] 0-management: Peer 09ed9a29-c923-4dc5-957a-e0d3e8032daf, in Peer Rejected state, has disconnected from glusterd.
[2015-11-03 13:56:24.996215] I [glusterd-replace-brick.c:99:__glusterd_handle_replace_brick] 0-management: Received replace brick req
[2015-11-03 13:56:24.996283] I [glusterd-replace-brick.c:154:__glusterd_handle_replace_brick] 0-management: Received replace brick commit-force request
[2015-11-03 13:56:25.016345] E [glusterd-rpc-ops.c:1087:__glusterd_stage_op_cbk] 0-management: Received stage RJT from uuid: 040e61dd-fd02-4957-8833-cf5708b837f0

The new server only logs "Stage failed":

[2015-11-03 13:56:25.015942] E [glusterd-op-sm.c:4585:glusterd_op_ac_stage_op] 0-management: Stage failed on operation 'Volume Replace brick', Status : -1

I tried to detach glucfshead1 since it's no longer online, but I only get a message that I can't do it because that server is still part of a volume.

Any further ideas that I could try?

TIA,
Thomas
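PS: two checks that might narrow down why the stage is rejected. This is only a sketch; it assumes the default glusterd log location (/var/log/glusterfs/etc-glusterfs-glusterd.vol.log) and the new brick path used in the replace-brick command above:

# sketch: on glucfshead9, look for context around the "Stage failed" entry
grep -B5 -A5 'Stage failed' /var/log/glusterfs/etc-glusterfs-glusterd.vol.log | tail -40

# sketch: check whether the new brick directory already carries gluster
# xattrs (e.g. trusted.glusterfs.volume-id); a leftover volume-id is a
# common reason for a replace-brick/add-brick stage check to refuse a brick
getfattr -m . -d -e hex /data/glusterfs/archive/brick1/brick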