Replacing a failed brick

david.c.gibbons at gmail.com (David Gibbons) · Fri, 16 Aug 2013 11:03:15 -0400

Ravi,

Thanks for the tips. When I run a volume status:
gluster> volume status test-a
Status of volume: test-a
Gluster process                                         Port    Online  Pid
------------------------------------------------------------------------------
Brick 10.250.4.63:/localmnt/g1lv2                       49152   Y       8072
Brick 10.250.4.65:/localmnt/g2lv2                       49152   Y       3403
Brick 10.250.4.63:/localmnt/g1lv3                       49153   Y       8081
Brick 10.250.4.65:/localmnt/g2lv3                       49153   Y       3410
Brick 10.250.4.63:/localmnt/g1lv4                       49154   Y       8090
Brick 10.250.4.65:/localmnt/g2lv4                       49154   Y       3417
Brick 10.250.4.63:/localmnt/g1lv5                       49155   Y       8099
Brick 10.250.4.65:/localmnt/g2lv5                       N/A     N       N/A
Brick 10.250.4.63:/localmnt/g1lv1                       49156   Y       8576
Brick 10.250.4.65:/localmnt/g2lv1                       49156   Y       3431
NFS Server on localhost                                 2049    Y       3440
Self-heal Daemon on localhost                           N/A     Y       3445
NFS Server on 10.250.4.63                               2049    Y       8586
Self-heal Daemon on 10.250.4.63                         N/A     Y       8593

There are no active volume tasks
--
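
With this many bricks, the offline one is easy to miss by eye. A minimal sketch for pulling it out of the status output, assuming the column layout shown above (the brick name in the second field, the Online flag in the second-to-last field); `offline_bricks` is a made-up helper name, not a gluster command:

```shell
#!/bin/sh
# Hypothetical helper: print every brick whose Online column is "N".
# Assumes the status layout pasted above; other gluster versions may differ.
offline_bricks() {
    awk '$1 == "Brick" && $(NF-1) == "N" { print $2 }'
}

# Sample input: three rows copied from the status output above.
offline_bricks <<'EOF'
Brick 10.250.4.63:/localmnt/g1lv5                       49155   Y       8099
Brick 10.250.4.65:/localmnt/g2lv5                       N/A     N       N/A
NFS Server on localhost                                 2049    Y       3440
EOF
# prints: 10.250.4.65:/localmnt/g2lv5
```

On a live node the same filter could be fed directly, e.g. `gluster volume status test-a | offline_bricks`.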

Attempting to start the volume results in:
gluster> volume start test-a force
volume start: test-a: failed: Failed to get extended attribute
trusted.glusterfs.volume-id for brick dir /localmnt/g2lv5. Reason : No data
available
--
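
One possible reading of this error, not confirmed anywhere in this thread: formatting the brick wiped the `trusted.glusterfs.volume-id` extended attribute from the brick directory, so glusterd refuses to start the brick process. A commonly described workaround is to put the attribute back with `setfattr`, using the volume's UUID (the "Volume ID" line of `volume info`) as a hex value. The sketch below only builds and prints that command; `volume_id_hex` is a made-up helper name, and the ID and path are the ones from this test setup:

```shell
#!/bin/sh
# Hypothetical helper: turn a volume UUID into the hex form setfattr expects
# (dashes stripped, 0x prefix added).
volume_id_hex() {
    printf '0x%s\n' "$(printf '%s' "$1" | tr -d '-')"
}

VOLUME_ID="e8957773-dd36-44ae-b80a-01e22c78a8b4"  # "Volume ID" from `volume info test-a`
BRICK_DIR="/localmnt/g2lv5"                       # the reformatted brick on 10.250.4.65

# Dry run: print the command rather than executing it, so the ID and path
# can be double-checked before running it as root on the brick's node.
echo "setfattr -n trusted.glusterfs.volume-id -v $(volume_id_hex "$VOLUME_ID") $BRICK_DIR"
# prints: setfattr -n trusted.glusterfs.volume-id -v 0xe8957773dd3644aeb80a01e22c78a8b4 /localmnt/g2lv5
```

The value can be cross-checked against the healthy replica with `getfattr -n trusted.glusterfs.volume-id -e hex /localmnt/g1lv5` on 10.250.4.63. If the attribute really is the problem, `volume start test-a force` followed by `volume heal test-a full` would be the next things to try.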

It doesn't like it when I try to fire off a heal either:
gluster> volume heal test-a
Launching Heal operation on volume test-a has been unsuccessful
--

Although that did lead me to this:
gluster> volume heal test-a info
Gathering Heal info on volume test-a has been successful

Brick 10.250.4.63:/localmnt/g1lv2
Number of entries: 0

Brick 10.250.4.65:/localmnt/g2lv2
Number of entries: 0

Brick 10.250.4.63:/localmnt/g1lv3
Number of entries: 0

Brick 10.250.4.65:/localmnt/g2lv3
Number of entries: 0

Brick 10.250.4.63:/localmnt/g1lv4
Number of entries: 0

Brick 10.250.4.65:/localmnt/g2lv4
Number of entries: 0

Brick 10.250.4.63:/localmnt/g1lv5
Number of entries: 0

Brick 10.250.4.65:/localmnt/g2lv5
Status: Brick is Not connected
Number of entries: 0

Brick 10.250.4.63:/localmnt/g1lv1
Number of entries: 0

Brick 10.250.4.65:/localmnt/g2lv1
Number of entries: 0
--

So perhaps I need to re-connect the brick?

Cheers,
Dave

On Fri, Aug 16, 2013 at 12:43 AM, Ravishankar N <ravishankar at redhat.com> wrote:

>  On 08/15/2013 10:05 PM, David Gibbons wrote:
>
> Hi There,
>
>  I'm currently testing Gluster for possible production use. I haven't
> been able to find the answer to this question in the forum archives or in the
> public docs. It's possible that I don't know which keywords to search for.
>
>  Here's the question (more details below): let's say that one of my
> bricks "fails" -- *not* a whole node failure but a single brick failure
> within the node. How do I replace a single brick on a node and force a sync
> from one of the replicas?
>
>  I have two nodes with 5 bricks each:
>  gluster> volume info test-a
>
>  Volume Name: test-a
> Type: Distributed-Replicate
> Volume ID: e8957773-dd36-44ae-b80a-01e22c78a8b4
> Status: Started
> Number of Bricks: 5 x 2 = 10
> Transport-type: tcp
> Bricks:
> Brick1: 10.250.4.63:/localmnt/g1lv2
> Brick2: 10.250.4.65:/localmnt/g2lv2
> Brick3: 10.250.4.63:/localmnt/g1lv3
> Brick4: 10.250.4.65:/localmnt/g2lv3
> Brick5: 10.250.4.63:/localmnt/g1lv4
> Brick6: 10.250.4.65:/localmnt/g2lv4
> Brick7: 10.250.4.63:/localmnt/g1lv5
> Brick8: 10.250.4.65:/localmnt/g2lv5
> Brick9: 10.250.4.63:/localmnt/g1lv1
> Brick10: 10.250.4.65:/localmnt/g2lv1
>
>  I formatted 10.250.4.65:/localmnt/g2lv5 (to simulate a "failure"). What
> is the next step? I have tried various combinations of removing and
> re-adding the brick, replacing the brick, etc. I read in a previous message
> to this list that replace-brick was for planned changes which makes sense,
> so that's probably not my next step.
>
> You must first check whether the 'formatted' brick 10.250.4.65:/localmnt/g2lv5
> is online using the `gluster volume status` command. If it is not, start the
> volume using `gluster volume start <VOLNAME> force`. You can then use the
> `gluster volume heal` command, which will copy the data from the other
> replica brick into your formatted brick.
> Hope this helps.
> -Ravi
>
>
>  Cheers,
> Dave
>
>