Strange behaviour with add-brick followed by remove-brick

lmohanty at redhat.com (Lalatendu Mohanty) · Wed, 30 Oct 2013 20:51:20 +0530



On 10/30/2013 08:40 PM, Lalatendu Mohanty wrote:
> On 10/30/2013 03:43 PM, B.K.Raghuram wrote:
>> I have gluster 3.4.1 on 4 boxes with hostnames n9, n10, n11, n12. I
>> did the following sequence of steps and ended up losing data, so
>> what did I do wrong?!
>>
>> - Create a distributed volume with bricks on n9 and n10
>> - Started the volume
>> - NFS mounted the volume and created 100 files on it. Found that n9
>> had 45, n10 had 55
>> - Added a brick n11 to this volume
>> - Removed the n10 brick from the volume with gluster volume
>> remove-brick <vol> <n10 brick name> start
>> - n9 now has 45 files, n10 has 55 files and n11 has 45 files(all the
>> same as on n9)
>> - Checked status: it shows no rebalanced files, but that n10 had
>> scanned 100 files and completed. 0 scanned for all the others
>> - I then did a rebalance start force on the vol and found that n9 had
>> 0 files, n10 had 55 files and n11 had 45 files - weird - looked like
>> n9 had been removed but double checked again and found that n10 had
>> indeed been removed.
>> - did a remove-brick commit. Same file distribution after that.
>> volume info now shows the volume to have n9 and n11 as bricks.
>> - did a rebalance start again on the volume. The rebalance-status now
>> shows n11 had 45 rebalanced files, all the brick nodes had 45 files
>> scanned and all show complete. The file layout after this is n9 has 45
>> files and n10 has 55 files. n11 has 0 files!
>> - An ls on the nfs mount now shows only 45 files, so the other 55 are
>> not visible because they are on n10, which is not part of the volume!
>>
>> What have I done wrong in this sequence?
>
> I think running rebalance (force) between "remove-brick start" and
> "remove-brick commit" is the issue. Can you please paste your commands
> in the order you ran them? That would make it clearer.
>
> Below are the steps I follow to replace a brick, and they work for me.
>
>  1. gluster volume add-brick VOLNAME NEW-BRICK
>  2. gluster volume remove-brick VOLNAME BRICK start
>  3. gluster volume remove-brick VOLNAME BRICK status
>  4. gluster volume remove-brick VOLNAME BRICK commit
>
I would also suggest using distributed-replicate volumes, so that you
always have a replica copy, which reduces the probability of losing data.
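
For what it's worth, the add-brick / remove-brick sequence quoted above can be sketched as a small shell function. This is only a dry-run sketch: it prints the four commands in order instead of running them, and the function name and its three arguments are hypothetical placeholders of mine, not part of gluster.

```shell
#!/bin/sh
# Dry-run sketch: print (do not execute) the add-brick / remove-brick
# sequence for swapping one brick for another.
# print_replace_brick_steps and its arguments are hypothetical names.
print_replace_brick_steps() {
    vol="$1"        # volume name
    old_brick="$2"  # brick to drain and remove, e.g. HOST:/path
    new_brick="$3"  # brick to add before the old one is removed

    echo "gluster volume add-brick $vol $new_brick"
    echo "gluster volume remove-brick $vol $old_brick start"
    # Re-check status until the migration is reported as completed;
    # no rebalance is started between "start" and "commit".
    echo "gluster volume remove-brick $vol $old_brick status"
    echo "gluster volume remove-brick $vol $old_brick commit"
}
```

Calling print_replace_brick_steps with VOLNAME, BRICK and NEW-BRICK filled in prints the commands one per line, so they can be reviewed and then run by hand, waiting on the status step before the commit.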

-Lala
