[Gluster-devel] Phasing out replace-brick for data migration in favor of remove-brick.

coolbsd at hotmail.com (Cool) · Fri, 27 Sep 2013 11:33:12 -0700

How does the new command set achieve this?

old layout (2x2):
rep=2: h1:/b1 h2:/b1 h1:/b2 h2:/b2

new layout (3x2):
rep=2: h1:/b1 h2:/b1 h1:/b2 h3:/b1 h2:/b2 h3:/b2

The purpose of the new layout is to make sure there is no SPOF (single point of 
failure), as I cannot simply add h3:/b1 and h3:/b2 as a pair.
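With replica 2, Gluster groups bricks into replica sets in the order they are listed, two at a time. The layout question therefore comes down to whether any consecutive pair lands on a single host. The following is a minimal Python sketch that flags such replica sets. The helper names `replica_sets` and `spof_sets` are hypothetical and are not part of any Gluster tool. The sketch also assumes brick names of the form `host:/path`.

```python
from typing import List


def replica_sets(bricks: List[str], replica: int) -> List[List[str]]:
    """Group bricks into replica sets of size `replica`, in the order listed."""
    if len(bricks) % replica != 0:
        raise ValueError("brick count must be a multiple of the replica count")
    return [bricks[i:i + replica] for i in range(0, len(bricks), replica)]


def spof_sets(bricks: List[str], replica: int) -> List[List[str]]:
    """Return the replica sets in which two or more bricks share a host."""
    bad = []
    for group in replica_sets(bricks, replica):
        hosts = [brick.split(":", 1)[0] for brick in group]  # "h1:/b1" -> "h1"
        if len(set(hosts)) < len(hosts):
            bad.append(group)
    return bad


old_layout = "h1:/b1 h2:/b1 h1:/b2 h2:/b2".split()
new_layout = "h1:/b1 h2:/b1 h1:/b2 h3:/b1 h2:/b2 h3:/b2".split()
naive_layout = old_layout + ["h3:/b1", "h3:/b2"]  # just appending the h3 pair

for name, layout in [("old", old_layout), ("new", new_layout), ("naive", naive_layout)]:
    print(name, spof_sets(layout, 2))
```

Under this grouping, both the old and the new layouts come out clean. Simply appending h3:/b1 h3:/b2 would instead put both copies of that replica set on h3.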

With replace-brick it is pretty straightforward, but without it... should I 
remove-brick h2:/b2 and then add-brick h3:/b1? That means I would have only 
one copy of some data for a certain period of time, which makes me nervous. 
Or should I add-brick h3:/b1 first? That doesn't seem reasonable either.

Or am I the only one hitting this kind of upgrade?

-C.B.

On 9/27/2013 10:15 AM, Amar Tumballi wrote:
>
>     Hello all,
>     DHT's remove-brick + rebalance has been enhanced in the last
>     couple of releases to be quite sophisticated. It can handle
>     graceful decommissioning of bricks, including open file
>     descriptors and hard links.
>
>
> The last set of patches for this should be reviewed and accepted before we 
> make that claim :-) [ http://review.gluster.org/5891 ]
>
>     In a way, this overlaps with replace-brick's data migration
>     functionality. Replace-brick's data migration is currently also
>     used for planned decommissioning of a brick.
>
>     Reasons to remove replace-brick (or why remove-brick is better):
>
>     - There are two methods of moving data. It is confusing for the
>     users and hard for developers to maintain.
>
>     - If the server being replaced is a member of a replica set, neither
>     remove-brick nor replace-brick data migration is necessary,
>     because self-healing itself will recreate the data (replace-brick
>     actually uses self-heal internally).
>
>     - In a non-replicated config, if a server is being replaced by a
>     new one, add-brick <new> + remove-brick <old> "start" achieves the
>     same goal as replace-brick <old> <new> "start".
>
>
> Should we also phase out the CLI form of 'remove-brick' without any 
> option? Even if users run it by mistake, they would lose data. We 
> should enforce the 'start' and then 'commit' usage of remove-brick. 
> Anyone who still needs the old behavior has the 'force' option anyway.
>
>     - In a non-replicated config, <replace-brick> is NOT glitch-free
>     (applications see ENOTCONN if they are accessing data), whereas
>     add-brick <new> + remove-brick <old> is completely transparent.
>
>
> +10 (that's the number of bugs open on these things :-)
>
>     - Replace-brick strictly requires a server with enough free space
>     to hold the data of the old brick, whereas remove-brick will
>     evenly spread out the data of the brick being removed amongst the
>     remaining servers.
>
>     - Replace-brick code is complex and messy (the real reason :p).
>
>
> I wanted to see this reason as the first point, but it's OK as long as 
> we mention it. I too agree that it's _hard_ to maintain that piece 
> of code.
>
>     - There is no clear reason why replace-brick's data migration is
>     better in any way than remove-brick's data migration.
>
>
> One reason I heard when I sent a mail on gluster-devel earlier 
> (http://lists.nongnu.org/archive/html/gluster-devel/2012-10/msg00050.html) 
> was that the remove-brick way was a bit slower than replace-brick. The 
> technical reason is that remove-brick does DHT's readdir, whereas 
> replace-brick does a brick-level readdir.
>
>     I plan to send out patches to remove all traces of replace-brick
>     data migration code by 3.5 branch time.
>
> Thanks for the initiative, let me know if you need help.
>
>     NOTE that the replace-brick command itself will still exist, and you
>     can replace one server with another in case a server dies. It is
>     only the data migration functionality being phased out.
>
>
> Yes, we need to be careful about this. We would need 'replace-brick' 
> to phase out a dead brick. The other day, there was some discussion on 
> having 'gluster peer replace <old-peer> <new-peer>', which would rewrite 
> all the vol files properly. But that's mostly for the 3.6 time frame, IMO.
>
>     Please do ask any questions / raise concerns at this stage :)
>
>
> What is the window before you start sending out patches? I see 
> http://review.gluster.org/6010, which I guess is not complete 
> without phasing out the pump xlator :-)
>
> I am personally all in for this change, as it helps me finish a few 
> more enhancements I am working on, like the 'discover()' changes.
>
> Regards,
> Amar
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-users
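The quoted proposal describes replacing a server in a non-replicated volume with add-brick <new> + remove-brick <old> "start" instead of replace-brick. In practice, that is a short command sequence. The following is a minimal Python sketch that only builds the command strings and does not run anything. It assumes the 3.4-era `gluster volume add-brick` and `gluster volume remove-brick ... start|status|commit` syntax. The volume name `myvol`, the brick names, and the helper `migration_commands` are illustrative.

```python
import shlex
from typing import List


def migration_commands(volume: str, old_brick: str, new_brick: str) -> List[str]:
    """Build the CLI steps for moving data off `old_brick` in a plain
    distributed (non-replicated) volume via add-brick + remove-brick,
    instead of replace-brick data migration."""
    vol, old, new = (shlex.quote(s) for s in (volume, old_brick, new_brick))
    return [
        # 1. Grow the volume so the new brick can receive data.
        f"gluster volume add-brick {vol} {new}",
        # 2. Start migrating data off the old brick (DHT rebalance).
        f"gluster volume remove-brick {vol} {old} start",
        # 3. Repeat until the migration reports completion.
        f"gluster volume remove-brick {vol} {old} status",
        # 4. Only then detach the old brick from the volume.
        f"gluster volume remove-brick {vol} {old} commit",
    ]


for cmd in migration_commands("myvol", "h2:/b2", "h3:/b1"):
    print(cmd)
```

Keeping "commit" as a separate step after "status" matches the suggestion above to enforce 'start' and then 'commit', so that data is not dropped by a one-shot remove-brick.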
