I am afraid the 'replace-brick' procedure does not work well if the node is dead. Here is the (long-ish) step-wise procedure for the dead end that I ran into...

[node-1 $] service glusterd start
[node-1 $] gluster volume create my-vol replica 2 node-1:/srv-node-1-first node-1:/srv-node-1-second
[node-1 $] gluster volume start my-vol

# this started my gluster service on the first node with two bricks replicated, but both sourced from the same node
# next I add a new node and replace one of the bricks with a new brick location on the second node
# the purpose is to achieve failover redundancy

[node-2 $] service glusterd start
[node-1 $] gluster peer probe node-2
[node-2 $] gluster peer probe node-1
[node-2 $] gluster volume replace-brick my-vol node-1:/srv-node-1-second node-2:/srv-node-2-third start

# this starts the replace operation, and after a while I can do 'volume info' from either node

[node-2 $] gluster volume info

Volume Name: my-vol
Type: Replicate
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: node-1:/srv-node-1-first
Brick2: node-2:/srv-node-2-third

# all good so far... now node-1 dies (no EBS, no disk, no data... just not reachable; it's a private cloud and the machine running the VM had a hardware failure)
# good: gluster keeps serving nicely from node-2 to all the clients
# now I want to replace the node-1 brick with another brick on node-2, so that I can pass it on to new nodes later
# so, according to the suggestion, I ran the replace-brick command

[node-2 $] gluster volume replace-brick my-vol node-1:/srv-node-1-first node-2:/srv-node-2-fourth start

# the command succeeds without errors, so I check status...

[node-2 $] gluster volume replace-brick my-vol node-1:/srv-node-1-first node-2:/srv-node-2-fourth status

# this command is supposed to return the status, but it returns nothing
# I check with 'gluster volume info' on node-2

[node-2 $] gluster volume info
No volumes present

# what?? where did my volume go?
# Note that all this while..
my mounted client is working fine, so there is no downtime.

Since 'gluster volume info' returned 'No volumes present', I assume that the procedure does not work. Is there something wrong in my procedure, or was it not supposed to work anyway? I am using v3.1.1.

Again, I really appreciate the help, but I seem to be stuck. The email suggested the procedure in the following link:
[ http://gluster.com/community/documentation/index.php/Gluster_3.2:_Brick_Restoration_-_Replace_Crashed_Server ]

It seems like a better way of replacing dead nodes, but then it would seem that I can't replace the brick from the dead node with a newly created path on an existing node, because I should have the hostname matched. That is fine too, if it's a requirement, but can I assume that it will work with 3.1.1, or do I have to upgrade to 3.2 for it?

Thanks again for the assistance.
Rajat

----- Original Message -----
From: "Mohit Anchlia" <mohitanchlia at gmail.com>
To: "Rajat Chopra" <rchopra at redhat.com>
Cc: "Harshavardhana" <harsha at gluster.com>, gluster-users at gluster.org
Sent: Friday, August 12, 2011 3:07:59 PM
Subject: Re: Replace brick of a dead node

On Fri, Aug 12, 2011 at 2:35 PM, Rajat Chopra <rchopra at redhat.com> wrote:
>
> Thank you Harsha for the quick response.
>
> Unfortunately, the infrastructure is in the cloud, so I can't get the dead node's disk.
> Since I have replication 'ON', there is no downtime as the brick on the second node serves well, but I want the redundancy/replication to be restored with the introduction of a new node (#3) in the cluster.

One way is http://gluster.com/community/documentation/index.php/Gluster_3.2:_Brick_Restoration_-_Replace_Crashed_Server

The other way is to use replace-brick. You should be able to use it even if the node is dead.

> I would hope there is a gluster command to just forget about the dead node's brick, and pick up the new brick and start replicating/serving from the new location (in conjunction with the one existing brick on the #2 node).
Is that the self heal feature? I am using v3.1.1 as of now.

> Rajat
>
> ----- Original Message -----
> From: "Harshavardhana" <harsha at gluster.com>
> To: "Rajat Chopra" <rchopra at redhat.com>
> Cc: gluster-users at gluster.org
> Sent: Friday, August 12, 2011 2:06:14 PM
> Subject: Re: Replace brick of a dead node
>
>> I have a two node cluster, with two bricks replicated, one on each node.
>> Let's say one of the nodes dies and is unreachable.
>
> If you have the disk from the dead node, then all you have to do is plug
> it into a new system and run the following commands:
>
> gluster volume replace-brick <volname> <old-brick> <new-brick> start
> gluster volume replace-brick <volname> <old-brick> <new-brick> commit
>
> You don't have to migrate the data; this works as expected.
>
> Since you have a replicate, you wouldn't see downtime, but mind you,
> self-heal will kick in; as of 3.2 it is blocking, wait for 3.3 and you
> will have non-blocking self-healing capabilities.
>
>> I want to be able to spin a new node and replace the dead node's brick to a location on the new node.
>
> This is out of Gluster's hands. If you already have mechanisms to
> decommission a brick and reattach it on a new node, then the above steps are
> fairly simple.
>
> Go ahead and try it, it should work.
>
> -Harsha
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>
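P.S. For reference, here is a rough sketch of what the Brick Restoration procedure in that link amounts to. This assumes the 3.1/3.2-era glusterd state directory (/etc/glusterd) and a replacement server brought up with the SAME hostname as the dead node; the UUID value below is made up for illustration, and the linked wiki page should be treated as authoritative:

```shell
# On a surviving node, find the UUID that the cluster remembers for the dead peer
[node-2 $] gluster peer status        # note the UUID shown for node-1

# On the replacement server (installed with the same hostname as the dead node):
[node-1 $] service glusterd stop
# make glusterd identify itself with the dead node's UUID (value is illustrative)
[node-1 $] echo "UUID=a1b2c3d4-0000-0000-0000-000000000000" > /etc/glusterd/glusterd.info
[node-1 $] service glusterd start
[node-1 $] gluster peer probe node-2  # rejoin the cluster
[node-1 $] service glusterd restart   # pick up the volume configuration from the peer

# From a client mount, walk the tree to trigger self-heal onto the empty brick
[client $] find /mnt/my-vol -noleaf -print0 | xargs --null stat > /dev/null
```

Whether this exact sequence works on 3.1.1 is the open question above; the state-file location and self-heal behaviour sketched here match the 3.2 documentation, not necessarily 3.1.1.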