On Fri, Aug 12, 2011 at 4:49 PM, Rajat Chopra <rchopra at redhat.com> wrote:
>
> I am afraid the 'replace-brick' procedure does not work well if the node is dead. Here is the (long-ish) step-wise procedure for the dead end that I run into...
>
> [node-1 $] service glusterd start
> [node-1 $] gluster volume create my-vol replica 2 node-1:/srv-node-1-first node-1:/srv-node-1-second
> [node-1 $] gluster volume start my-vol
> # this brought up my gluster service on the first node with two replicated bricks, both sourced from the same node
> # next I add a new node and replace one of the bricks with a new brick location on the second node
> # the purpose is to achieve failover redundancy
> [node-2 $] service glusterd start
> [node-1 $] gluster peer probe node-2
> [node-2 $] gluster peer probe node-1
> [node-2 $] gluster volume replace-brick my-vol node-1:/srv-node-1-second node-2:/srv-node-2-third start
> # this starts the replace operation, and after a while I can do 'volume info' from either node
> [node-2 $] gluster volume info
> Volume Name: my-vol
> Type: Replicate
> Status: Started
> Number of Bricks: 2
> Transport-type: tcp
> Bricks:
> Brick1: node-1:/srv-node-1-first
> Brick2: node-2:/srv-node-2-third
>
> # all good so far... now node-1 dies (no EBS, no disk, no data... just not reachable; it's a private cloud and the machine running the VM had a hardware failure)
> # gluster keeps serving nicely from node-2 to all the clients
> # now I want to replace the node-1 brick with another brick on node-2, so that I can pass it on to new nodes later
>
> # so, following the suggestion, I ran the replace-brick command
> [node-2 $] gluster volume replace-brick my-vol node-1:/srv-node-1-first node-2:/srv-node-2-fourth start

Did you run

  $ gluster volume replace-brick my-vol node-1:/srv-node-1-first node-2:/srv-node-2-fourth commit

? If not, try running that additional 'commit' command; it makes the
necessary changes to the config. Since the dead node is not around, don't
check the status: run 'commit' right after the 'start'.

> # the command succeeds without errors, so I check status...
> [node-2 $] gluster volume replace-brick my-vol node-1:/srv-node-1-first node-2:/srv-node-2-fourth status
> # this command is supposed to return the status, but it returns nothing
> # I check with 'gluster volume info' on node-2
> [node-2 $] gluster volume info
> No volumes present
> # wha?? where did my volume go?
> # note that all this while my mounted client is working fine, so no downtime
>
> Since 'gluster volume info' returned 'No volumes present', I assume that the procedure does not work. Is there something wrong in my procedure, or was it not supposed to work anyway?
>
> I am using v3.1.1.
>
> Again, I really appreciate the help, but I seem to be stuck. The earlier email suggested the procedure in the following link:
>     [ http://gluster.com/community/documentation/index.php/Gluster_3.2:_Brick_Restoration_-_Replace_Crashed_Server ]
> It seems like a better way of replacing dead nodes, but then it would seem that I can't replace the brick from the dead node with a newly created path on an existing node, because the hostnames have to match. That is fine too, if it's a requirement, but can I assume that it will work with 3.1.1, or do I have to upgrade to 3.2 for it?
>
> Thanks again for the assistance.
> Rajat
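
One more thought on that link: the hostname has to match because glusterd
identifies peers by UUID, and the restoration procedure works by handing the
dead server's identity to the replacement box. From memory (I have not
re-tested this on 3.1.1, and the paths are my assumption: 3.1.x and 3.2 keep
the config under /etc/glusterd, while newer releases moved it to
/var/lib/glusterd), the rough sequence is:

  # find the dead server's UUID from any surviving peer
  [node-2 $] gluster peer status

  # on the replacement box, brought up with the dead node's hostname
  [node-1 $] service glusterd stop
  [node-1 $] echo "UUID=<uuid-of-dead-node-1>" > /etc/glusterd/glusterd.info
  [node-1 $] service glusterd start
  [node-1 $] gluster peer probe node-2
  [node-1 $] service glusterd restart

Treat that as a sketch of the doc's procedure rather than a verified recipe;
the doc is written against 3.2, but I would expect the same idea to work on
3.1.1.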
>
> ----- Original Message -----
> From: "Mohit Anchlia" <mohitanchlia at gmail.com>
> To: "Rajat Chopra" <rchopra at redhat.com>
> Cc: "Harshavardhana" <harsha at gluster.com>, gluster-users at gluster.org
> Sent: Friday, August 12, 2011 3:07:59 PM
> Subject: Re: Replace brick of a dead node
>
> On Fri, Aug 12, 2011 at 2:35 PM, Rajat Chopra <rchopra at redhat.com> wrote:
>>
>> Thank you Harsha for the quick response.
>>
>> Unfortunately, the infrastructure is in the cloud, so I can't get the dead node's disk.
>> Since I have replication on, there is no downtime, as the brick on the second node serves well; but I want the redundancy/replication to be restored with the introduction of a new node (#3) in the cluster.
>
> One way is http://gluster.com/community/documentation/index.php/Gluster_3.2:_Brick_Restoration_-_Replace_Crashed_Server
>
> The other way is to use replace-brick. You should be able to use it even
> if the node is dead.
>
>>
>> I would hope there is a gluster command to just forget about the dead node's brick, pick up the new brick, and start replicating/serving from the new location (in conjunction with the one existing brick on node #2). Is that the self-heal feature? I am using v3.1.1 as of now.
>>
>> Rajat
>>
>> ----- Original Message -----
>> From: "Harshavardhana" <harsha at gluster.com>
>> To: "Rajat Chopra" <rchopra at redhat.com>
>> Cc: gluster-users at gluster.org
>> Sent: Friday, August 12, 2011 2:06:14 PM
>> Subject: Re: Replace brick of a dead node
>>
>>> I have a two-node cluster, with two bricks replicated, one on each node.
>>> Let's say one of the nodes dies and is unreachable.
>>
>> If you have the disk from the dead node, then all you have to do is plug
>> it into a new system and run the following commands:
>>
>> gluster volume replace-brick <volname> <old-brick> <new-brick> start
>> gluster volume replace-brick <volname> <old-brick> <new-brick> commit
>>
>> You don't have to migrate the data; this works as expected.
>>
>> Since you have a replicate volume you wouldn't see any downtime, but mind
>> you, self-heal will kick in; as of 3.2 it is blocking, so wait for 3.3 if
>> you want non-blocking self-healing capabilities.
>>
>>> I want to be able to spin up a new node and replace the dead node's brick with a location on the new node.
>>
>> This is out of Gluster's hands. If you already have mechanisms to
>> decommission a brick and reattach it on a new node, then the above steps
>> are fairly simple.
>>
>> Go ahead and try it; it should work.
>>
>> -Harsha
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>>
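
P.S. On Harsha's point about self-heal: 3.1.x has no explicit heal command
(that only arrives in later releases), so once the replacement brick is in
place you trigger the heal from a client by stat-ing every file on the mount,
e.g. (assuming your client mount point is /mnt/my-vol):

  [client $] find /mnt/my-vol -noleaf -print0 | xargs --null stat >/dev/null

That walk is what repopulates the new brick from the surviving replica.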