Nice, thanks for the clarification.

-C.B.

On 9/30/2013 2:46 AM, Amar Tumballi wrote:
> On 09/28/2013 12:03 AM, Cool wrote:
>> How does the new command set achieve this?
>>
>> old layout (2x2):
>> rep=2: h1:/b1 h2:/b1 h1:/b2 h2:/b2
>>
>> new layout (3x2):
>> rep=2: h1:/b1 h2:/b1 h1:/b2 h3:/b1 h2:/b2 h3:/b2
>>
>> The purpose of the new layout is to make sure there is no SPOF, as I
>> cannot simply add h3:/b1 and h3:/b2 as a pair.
>>
>> With replace-brick it is pretty straightforward, but without that ...
>> should I remove-brick h2:/b2 and then add-brick h3:/b1? That means I
>> am going to have only one copy of some data for a certain period of
>> time, which makes me nervous. Or should I add-brick h3:/b1 first?
>> That doesn't seem reasonable either.
>>
>> Or am I the only one hitting this kind of upgrade?
>>
> No, you are not the only one. This is the exact reason we recommend
> adding nodes in multiples of 2.
>
> Another recommendation is to export directories as bricks, rather
> than the mountpoint itself.
>
> In your case, following the above best practice, it would be:
>
> # gluster volume info test-vol:
> rep=2: h1:/b1/d1 h2:/b1/d1 h1:/b2/d1 h2:/b2/d1
>
> # gluster volume add-brick test-vol h1:/b2/d2 h3:/b1/d1 h2:/b2/d2 h3:/b2/d1
> # gluster volume remove-brick test-vol h1:/b2/d1 h2:/b2/d1 start
>
> # gluster volume remove-brick test-vol h1:/b2/d1 h2:/b2/d1 commit
>
> # gluster volume info test-vol:
> rep=2: h1:/b1/d1 h2:/b1/d1 h1:/b2/d2 h3:/b1/d1 h2:/b2/d2 h3:/b2/d1
>
> Hope this works.
>
> Regards,
> Amar
>> -C.B.
>>
>> On 9/27/2013 10:15 AM, Amar Tumballi wrote:
>>>
>>> Hello all,
>>> DHT's remove-brick + rebalance has been enhanced in the last couple
>>> of releases to be quite sophisticated. It can handle graceful
>>> decommissioning of bricks, including open file descriptors and hard
>>> links.
>>>
>>> The last set of patches for this should be reviewed and accepted
>>> before we make that claim :-) [ http://review.gluster.org/5891 ]
>>>
>>> This is, in a way, a feature overlap with replace-brick's data
>>> migration functionality. Replace-brick's data migration is
>>> currently also used for planned decommissioning of a brick.
>>>
>>> Reasons to remove replace-brick (or why remove-brick is better):
>>>
>>> - There are two methods of moving data. It is confusing for users
>>> and hard for developers to maintain.
>>>
>>> - If the server being replaced is a member of a replica set, neither
>>> remove-brick nor replace-brick data migration is necessary, because
>>> self-healing itself will recreate the data (replace-brick actually
>>> uses self-heal internally).
>>>
>>> - In a non-replicated config, if a server is getting replaced by a
>>> new one, add-brick <new> + remove-brick <old> "start" achieves the
>>> same goal as replace-brick <old> <new> "start".
>>>
>>> Should we phase out the CLI for doing a 'remove-brick' without any
>>> option too? Because even if users do it by mistake, they would lose
>>> data. We should enforce the 'start' and then 'commit' usage of
>>> remove-brick. Also, if the old behaviour is required, anyone can
>>> still use the 'force' option.
>>>
>>> - In a non-replicated config, replace-brick is NOT glitch free
>>> (applications witness ENOTCONN if they are accessing data), whereas
>>> add-brick <new> + remove-brick <old> is completely transparent.
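
For that non-replicated case, a minimal sketch of the add-brick + remove-brick sequence might look as follows. Host and brick names reuse the thread's examples; the volume name "test-vol" and the explicit status check are assumptions, not quoted from the thread:

# Sketch: migrate data off old brick h2:/b2 onto new brick h3:/b1 on a
# plain distribute (non-replicated) volume, without replace-brick.
gluster volume add-brick test-vol h3:/b1            # new brick joins the volume
gluster volume remove-brick test-vol h2:/b2 start   # start draining the old brick
gluster volume remove-brick test-vol h2:/b2 status  # poll until it reports completed
gluster volume remove-brick test-vol h2:/b2 commit  # then drop it from the volume
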
>>>
>>> +10 (that's the number of bugs open on these things :-)
>>>
>>> - Replace-brick strictly requires a server with enough free space
>>> to hold the data of the old brick, whereas remove-brick will evenly
>>> spread out the data of the brick being removed amongst the
>>> remaining servers.
>>>
>>> - Replace-brick code is complex and messy (the real reason :p).
>>>
>>> I wanted to see this reason as the 1st point, but it's ok as long
>>> as we mention it. I too agree that it's _hard_ to maintain that
>>> piece of code.
>>>
>>> - No clear reason why replace-brick's data migration is better in
>>> any way than remove-brick's data migration.
>>>
>>> One reason I heard when I sent the mail on gluster-devel earlier
>>> (http://lists.nongnu.org/archive/html/gluster-devel/2012-10/msg00050.html)
>>> was that the remove-brick way was a bit slower than replace-brick.
>>> The technical reason is that remove-brick does DHT's readdir,
>>> whereas replace-brick does the brick-level readdir.
>>>
>>> I plan to send out patches to remove all traces of the
>>> replace-brick data migration code by 3.5 branch time.
>>>
>>> Thanks for the initiative, let me know if you need help.
>>>
>>> NOTE that the replace-brick command itself will still exist, and
>>> you can replace one server with another in case a server dies. It
>>> is only the data migration functionality that is being phased out.
>>>
>>> Yes, we need to be careful about this. We would need 'replace-brick'
>>> to phase out a dead brick. The other day, there was some discussion
>>> on having 'gluster peer replace <old-peer> <new-peer>' which would
>>> re-write all the vol files properly. But that's mostly for the 3.6
>>> time frame IMO.
>>>
>>> Please do ask any questions / raise concerns at this stage :)
>>>
>>> What is the window before you start sending out patches? I see
>>> http://review.gluster.org/6010 which I guess is not totally complete
>>> without phasing out the pump xlator :-)
>>>
>>> I personally am all in for this change, as it helps me finish a few
>>> more enhancements I am working on, like the 'discover()' changes
>>> etc.
>>>
>>> Regards,
>>> Amar
>>>
>>> _______________________________________________
>>> Gluster-users mailing list
>>> Gluster-users at gluster.org
>>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>>
>> _______________________________________________
>> Gluster-devel mailing list
>> Gluster-devel at nongnu.org
>> https://lists.nongnu.org/mailman/listinfo/gluster-devel
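
To close the loop on the original 2x2 -> 3x2 question, here is a hedged sketch of Amar's recipe with an explicit status check between 'start' and 'commit'. The commands mirror his example above; the status and final info steps are standard remove-brick usage added for illustration, not quoted from the thread:

# Grow a replica-2 volume from 2x2 to 3x2 without ever dropping below
# two copies of any file. Bricks are directories, per the best
# practice above.
gluster volume add-brick test-vol h1:/b2/d2 h3:/b1/d1 h2:/b2/d2 h3:/b2/d1

# Drain the old h1:/b2/d1 + h2:/b2/d1 replica pair; both copies stay
# online while DHT migrates their data to the remaining bricks.
gluster volume remove-brick test-vol h1:/b2/d1 h2:/b2/d1 start

# Wait until the migration reports "completed" on all nodes ...
gluster volume remove-brick test-vol h1:/b2/d1 h2:/b2/d1 status

# ... and only then remove the pair from the volume definition.
gluster volume remove-brick test-vol h1:/b2/d1 h2:/b2/d1 commit

gluster volume info test-vol   # verify the new 3x2 layout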