remove-brick on distribute does not work for me:
https://bugzilla.redhat.com/show_bug.cgi?id=1024369

On Wed, Oct 30, 2013 at 4:40 PM, Brian Cipriano <bcipriano at zerovfx.com> wrote:

> I had the exact same experience recently with a 3.4 distributed cluster I
> set up. I spent some time on IRC but couldn't track it down. It seems
> remove-brick is broken in 3.3 and 3.4. I guess folks don't remove bricks
> very often :)
>
> - brian
>
> On Oct 30, 2013, at 11:21 AM, Lalatendu Mohanty <lmohanty at redhat.com> wrote:
>
> On 10/30/2013 08:40 PM, Lalatendu Mohanty wrote:
>
> On 10/30/2013 03:43 PM, B.K.Raghuram wrote:
>
> I have gluster 3.4.1 on 4 boxes with hostnames n9, n10, n11, n12. I
> did the following sequence of steps and ended up losing data, so
> what did I do wrong?!
>
> - Created a distributed volume with bricks on n9 and n10
> - Started the volume
> - NFS-mounted the volume and created 100 files on it. Found that n9
>   had 45 and n10 had 55
> - Added a brick on n11 to this volume
> - Removed the n10 brick from the volume with "gluster volume
>   remove-brick <vol> <n10 brick name> start"
> - n9 now has 45 files, n10 has 55 files, and n11 has 45 files (all
>   the same as on n9)
> - Checked status; it showed no rebalanced files, but that n10 had
>   scanned 100 files and completed. 0 scanned for all the others
> - I then did a "rebalance start force" on the volume and found that
>   n9 had 0 files, n10 had 55 files, and n11 had 45 files - weird - it
>   looked like n9 had been removed, but I double-checked and found
>   that n10 had indeed been removed
> - Did a remove-brick commit. Same file distribution after that.
>   volume info now shows the volume to have n9 and n11 as bricks
> - Did a "rebalance start" again on the volume. The rebalance status
>   now shows n11 with 45 rebalanced files, all the brick nodes with 45
>   files scanned, and all complete. The file layout after this: n9 has
>   45 files, n10 has 55 files, and n11 has 0 files!
> - An ls on the NFS mount now shows only 45 files; the other 55 are
>   not visible because they are on n10, which is no longer part of
>   the volume!
>
> What have I done wrong in this sequence?
>
> I think running rebalance (force) in between "remove-brick start" and
> "remove-brick commit" is the issue. Can you please paste your commands
> in the order you ran them? That would make it clearer.
>
> Below are the steps I use to replace a brick, and they work for me:
>
> 1. gluster volume add-brick VOLNAME NEW-BRICK
> 2. gluster volume remove-brick VOLNAME BRICK start
> 3. gluster volume remove-brick VOLNAME BRICK status
> 4. gluster volume remove-brick VOLNAME BRICK commit
>
> I would also suggest using a distribute-replicate volume, so that you
> always have a replica copy; that reduces the probability of losing
> data.
>
> -Lala
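
A minimal end-to-end sketch of that replace-a-brick sequence, assuming a
distribute volume named testvol and hypothetical brick paths (substitute
your own hostnames and export directories):

    # Add the replacement brick first (hypothetical names throughout)
    gluster volume add-brick testvol n11:/export/brick1

    # Start draining the old brick; its files are migrated to the
    # remaining bricks in the background
    gluster volume remove-brick testvol n10:/export/brick1 start

    # Poll until the status for n10 shows "completed". Do not run a
    # separate "rebalance start force" while this migration is in
    # flight - per the thread, that is what scrambled the layout
    gluster volume remove-brick testvol n10:/export/brick1 status

    # Only once migration has completed, detach the brick for good
    gluster volume remove-brick testvol n10:/export/brick1 commit

The key point is that remove-brick start/status/commit is a self-contained
workflow: the start step already performs the data migration, so no
explicit rebalance should be issued until after the commit.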