Inline response.

On 09/27/2013 02:26 PM, James wrote:
> On Fri, 2013-09-27 at 00:35 -0700, Anand Avati wrote:
>> Hello all,
> Hey,
>
> Interesting timing for this post...
> I've actually started working on automatic brick addition/removal. (I'm
> planning to add this to puppet-gluster of course.) I was hoping you
> could help out with the algorithm. I think it's a bit different if
> there's no replace-brick command as you are proposing.
>
> Here's the problem:
> Given a logically optimal initial volume:
>
> volA: rep=2; h1:/b1 h2:/b1 h3:/b1 h4:/b1 h1:/b2 h2:/b2 h3:/b2 h4:/b2
>
> suppose I know that I want to add/remove bricks such that my new volume
> (if I had created it new) looks like:
>
> volB: rep=2; h1:/b1 h3:/b1 h4:/b1 h5:/b1 h6:/b1 h1:/b2 h3:/b2 h4:/b2
> h5:/b2 h6:/b2
>
> What is the optimal algorithm for determining the correct sequence of
> transforms that are needed to accomplish this task. Obviously there are
> some simpler corner cases, but I'd like to solve the general case.
>
> The transforms are obviously things like running the add-brick {...} and
> remove-brick {...} commands.

This is exactly why our best practice recommends exporting a directory
inside a mountpoint as a brick, in this case h1:/b1/d1 (where d1 is a
directory inside the mountpoint /b1). That way you can also have a brick
h1:/b1/d2, which is technically the same thing you would like to have in
volB.

Also, it is never a good idea to swap/change/move replica pairs to
different sets; it would lead to many issues, such as duplicate files.

>>
>>
>> - Replace brick strictly requires a server with enough free space to hold
>> the data of the old brick, whereas remove-brick will evenly spread out the
>> data of the brick being removed amongst the remaining servers.
> Can you talk more about the replica = N case (where N is 2 or 3?)
> With remove brick, add brick you will need add/remove N (replica count)
> bricks at a time, right?
> With replace brick, you could just swap out
> one, right? Isn't that a missing feature if you remove replace brick?

For that particular case (swapping a brick without data migration),
'replace-brick' will still exist. It replaces an existing brick of a
replica pair with an empty brick, and replicate's self-heal daemon then
populates the data on it.

>> Please do ask any questions / raise concerns at this stage :)
> I heard with 3.4 you can somehow change the replica count when adding
> new bricks... What's the full story here please?
>
>

Yes, CLI support for this has existed since glusterfs-3.3.x
(http://review.gluster.com/158); it just has a few bugs.

Syntax of add-brick:

    gluster volume add-brick <VOLNAME> [<stripe|replica> <COUNT>] <NEW-BRICK> ... [force] - add brick to volume <VOLNAME>

You give 'replica N', where N is the existing replica count -1/+1.

Regards,
Amar
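
P.S. To make the replica-pair point concrete for the volA/volB example,
here is a minimal Python sketch. It assumes that replica sets are formed
from consecutive bricks in the order they are listed, and the helper name
replica_sets is made up for illustration (it is not part of any gluster
tool). It only compares the two layouts; it does not run any gluster
command.

```python
def replica_sets(bricks, replica):
    """Group an ordered brick list into replica sets of size `replica`.

    Assumption: consecutive bricks in the list form one replica set.
    """
    if len(bricks) % replica != 0:
        raise ValueError("brick count must be a multiple of the replica count")
    return [tuple(bricks[i:i + replica]) for i in range(0, len(bricks), replica)]


# Brick lists copied from the volA / volB example in the quoted mail.
vol_a = "h1:/b1 h2:/b1 h3:/b1 h4:/b1 h1:/b2 h2:/b2 h3:/b2 h4:/b2".split()
vol_b = ("h1:/b1 h3:/b1 h4:/b1 h5:/b1 h6:/b1 "
         "h1:/b2 h3:/b2 h4:/b2 h5:/b2 h6:/b2").split()

sets_a = replica_sets(vol_a, 2)
sets_b = replica_sets(vol_b, 2)

# Pairs of volA that appear unchanged in volB, and pairs that do not.
kept = [s for s in sets_a if s in sets_b]
broken = [s for s in sets_a if s not in sets_b]

print("kept pairs:  ", kept)
print("broken pairs:", broken)
```

With volB written in that order, only the pair (h3:/b2, h4:/b2) appears
unchanged; the other three existing pairs would have to be re-paired,
which is the situation I would avoid.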