On 08/08/13 13:09, Krishnan Parthasarathi wrote:
> Hi Toby,
>
> ----- Original Message -----
>> Hi,
>> I'm getting some confusing "Incorrect brick" errors when attempting to
>> remove OR replace a brick.
>>
>> gluster> volume info condor
>>
>> Volume Name: condor
>> Type: Replicate
>> Volume ID: 9fef3f76-525f-4bfe-9755-151e0d8279fd
>> Status: Started
>> Number of Bricks: 1 x 2 = 2
>> Transport-type: tcp
>> Bricks:
>> Brick1: mel-storage01:/srv/brick/condor
>> Brick2: mel-storage02:/srv/brick/condor
>>
>> gluster> volume remove-brick condor replica 1 mel-storage02:/srv/brick/condor start
>> Incorrect brick mel-storage02:/srv/brick/condor for volume condor
>>
>> If that is the incorrect brick, then what have I done wrong?
>
> I agree that the error message displayed is far from helpful. Your attempt
> to remove a brick from a 1x2 replicate volume failed because it is not a
> 'legal' operation.
>
> Here are some rules, and some background, that are implicit in how we
> determine whether a remove-brick operation is allowed. Some may seem
> debatable, but that is how things are today. We could refine them and
> evolve a better set of rules via discussions on the mailing lists.
>
> 1) The remove-brick start variant is applicable *only* when you have a dht
> (or distribute) type volume. In 3.3, you can identify that from the output
> of "gluster volume info <VOLNAME>": the "Type" field would display
> "Distribute-<something>". Additionally, even in a Distribute type volume,
> which includes Distribute-Replicate, Distribute-Stripe and other
> combinations, all the bricks belonging to a subvolume need to be removed
> in one go.
> For example, let's assume a 2x2 volume V1 with bricks b1, b2, b3, b4, such
> that b1,b2 form one replica pair and b3,b4 form the other.
> If you wanted to use the remove-brick start variant, say for scaling down
> the volume, you would do the following:
>
> #gluster volume remove-brick V1 b3 b4 start
> #gluster volume remove-brick V1 b3 b4 status
>
> Once the remove-brick operation has completed:
> #gluster volume remove-brick V1 b3 b4 commit
>
> This would leave volume V1 with bricks b1,b2. In the above workflow, the
> data residing on b3,b4 is migrated to b1,b2.
>
> 2) remove-brick (without the 'start' subcommand) can be used to reduce the
> replica count, down to a minimum of 2, in a Distribute-Replicate type
> volume. As of today, remove-brick doesn't permit reducing the replica
> count in a pure replicate volume, i.e. 1xN, where N >= 2.
> Note: There is some activity around evolving the 'right' rule. See
> http://review.gluster.com/#/c/5364/
>
> The above rules evolved from the principle that no legal command should
> allow the user to shoot herself in the foot without a 'repair' path. Put
> differently, we disallow commands that might lead to data loss without
> the user being fully aware of it.
>
> Hope that helps,
> krish

Well, it's a bit of a moot point now, since we had to rebuild the cluster
anyway.

Note that we attempted to raise the replica level to 3 and THEN remove the
old brick, and that failed to work.

We also tried using replace-brick to swap the old brick out for the new one.
That also failed with "Incorrect brick". (The replace-brick method was
actually the first way we tried.)

As such, it seems there is no way to replace a failed server with a new one
if you're using the Replicated setup?

Toby
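
---

As a concrete sketch of Krishnan's rule 2 above: suppose a 2x3
Distribute-Replicate volume V1 with bricks b1 through b6, where (b1,b2,b3)
and (b4,b5,b6) are the replica sets. The volume and brick names here are
hypothetical, and exact syntax may differ between glusterfs releases, but
reducing the replica count from 3 to 2 would mean removing one brick from
each replica set in a single command:

#gluster volume remove-brick V1 replica 2 b3 b6

If accepted, this would leave V1 as a 2x2 volume with bricks b1,b2,b4,b5.
Note that per rule 2 this only works while the volume stays
Distribute-Replicate with replica >= 2.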
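For reference, the two approaches Toby describes would look roughly like the
following, with mel-storage03 standing in as a hypothetical name for the
replacement server. Both failed with "Incorrect brick" in this case, so treat
these as a sketch of the intent rather than a verified recipe:

#gluster volume add-brick condor replica 3 mel-storage03:/srv/brick/condor
#gluster volume remove-brick condor replica 2 mel-storage02:/srv/brick/condor

And the replace-brick attempt was along these lines:

#gluster volume replace-brick condor mel-storage02:/srv/brick/condor mel-storage03:/srv/brick/condor start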