On 08/08/13 13:09, Krishnan Parthasarathi wrote:
> Hi Toby,
>
> ----- Original Message -----
>> Hi,
>> I'm getting some confusing "Incorrect brick" errors when attempting to
>> remove OR replace a brick.
>>
>> gluster> volume info condor
>>
>> Volume Name: condor
>> Type: Replicate
>> Volume ID: 9fef3f76-525f-4bfe-9755-151e0d8279fd
>> Status: Started
>> Number of Bricks: 1 x 2 = 2
>> Transport-type: tcp
>> Bricks:
>> Brick1: mel-storage01:/srv/brick/condor
>> Brick2: mel-storage02:/srv/brick/condor
>>
>> gluster> volume remove-brick condor replica 1 mel-storage02:/srv/brick/condor start
>> Incorrect brick mel-storage02:/srv/brick/condor for volume condor
>>
>> If that is the incorrect brick, then what have I done wrong?
>
> I agree that the error message displayed is far from helpful. Your attempt
> to remove a brick from a 1x2 replicate volume failed because it is not a
> 'legal' operation.
>
> Here are some rules, and some background, that are implicit in how we
> determine whether a remove-brick operation is allowed. Some may seem
> debatable, but that is how things are today. We could refine them and
> evolve a better set of rules via discussions on the mailing lists.
>
> 1) The remove-brick start variant is applicable *only* when you have a dht
> (or distribute) type volume. In 3.3, you can identify that from the output
> of "gluster volume info <VOLNAME>": the "Type" field would display
> "Distribute-<something>". Additionally, even in a Distribute type volume,
> which includes Distribute-Replicate, Distribute-Stripe and other
> combinations, all the bricks belonging to a subvolume need to be removed
> in one go.
> For example, let's assume a 2x2 volume V1 with bricks b1, b2, b3, b4, such
> that b1,b2 form one replica pair and b3,b4 form the other.
> If you wanted to use the remove-brick start variant, say for scaling down
> the volume, you would do the following:
>
> #gluster volume remove-brick V1 b3 b4 start
> #gluster volume remove-brick V1 b3 b4 status
>
> Once the remove-brick operation has completed:
> #gluster volume remove-brick V1 b3 b4 commit
>
> This would leave volume V1 with bricks b1,b2. In the above workflow, the
> data residing on b3,b4 is migrated to b1,b2.
>
> 2) remove-brick (without the 'start' subcommand) can be used to reduce the
> replica count, down to a minimum of 2, in a Distribute-Replicate type
> volume. As of today, remove-brick doesn't permit reducing the replica
> count in a pure replicate volume, i.e. 1xN, where N >= 2.
> Note: There is some activity around evolving the 'right' rule. See
> http://review.gluster.com/#/c/5364/
>
> The above rules evolved from the principle that no legal command should
> allow the user to shoot herself in the foot without a 'repair' path. Put
> differently, we disallow commands that might lead to data loss without
> the user being fully aware of it.
>
> Hope that helps,
> krish

Well, it's a bit of a moot point now, since we had to rebuild the cluster
anyway.

Note that we attempted to raise the replica level to 3 and THEN remove the
old brick, and that failed to work.

We also tried using replace-brick to swap the old brick out for the new one.
That also failed with "Incorrect brick". (The replace-brick method was
actually the first way we tried.)

As such, it seems there is no way to replace a failed server with a new one
if you're using the Replicated setup?

Toby
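
---

As a concrete sketch of Krishnan's rule 2 above: suppose a 2x3
Distribute-Replicate volume V1 with bricks b1 through b6, where (b1,b2,b3)
and (b4,b5,b6) are the replica sets. The volume and brick names here are
hypothetical, and exact syntax may differ between glusterfs releases, but
reducing the replica count from 3 to 2 would mean removing one brick from
each replica set in a single command:

#gluster volume remove-brick V1 replica 2 b3 b6

If accepted, this would leave V1 as a 2x2 volume with bricks b1,b2,b4,b5.
Note that per rule 2 this only works while the volume stays
Distribute-Replicate with replica >= 2.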
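For reference, the two approaches Toby describes would look roughly like the
following, with mel-storage03 standing in as a hypothetical name for the
replacement server. Both failed with "Incorrect brick" in this case, so treat
these as a sketch of the intent rather than a verified recipe:

#gluster volume add-brick condor replica 3 mel-storage03:/srv/brick/condor
#gluster volume remove-brick condor replica 2 mel-storage02:/srv/brick/condor

And the replace-brick attempt was along these lines:

#gluster volume replace-brick condor mel-storage02:/srv/brick/condor mel-storage03:/srv/brick/condor start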