Re: How to replace a dead brick? (3.6.5)

sreejith kb <sree15081947@xxxxxxxxx> · Wed, 7 Oct 2015 16:58:55 +0530

Hi,
     When removing a failed brick from your existing cluster volume, make sure you pass the correct replica count: when removing one brick from a volume with 'n' bricks, use 'replica n-1'.

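So here you are trying to remove one brick from a volume that contains 2 bricks in total, so run it like this. Note that the command takes only one trailing keyword (start, stop, status, commit or force). That is why 'commit force' was rejected with the 'wrong brick type: commit' error: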

gluster volume remove-brick datastore1 replica 1 vnb.proxmox.softlog:/glusterdata/datastore1c force
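The 'replica n-1' rule and the single trailing keyword can be wrapped in a small helper. This is only a sketch under stated assumptions. The names `run`, `remove_dead_brick` and `DRY_RUN` are hypothetical. By default the script prints the gluster command instead of running it.

```shell
#!/bin/sh
# Hypothetical helper: remove one dead brick from a replica volume,
# lowering the replica count from n to n-1.
# Dry run by default (commands are printed); set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# remove_dead_brick VOLUME CURRENT_REPLICA BRICK
remove_dead_brick() {
    vol=$1
    replica=$2
    brick=$3
    new_replica=$((replica - 1))
    if [ "$new_replica" -lt 1 ]; then
        echo "replica count would drop below 1" >&2
        return 1
    fi
    # Exactly one trailing keyword: 'force' on its own, not 'commit force'.
    run gluster volume remove-brick "$vol" replica "$new_replica" "$brick" force
}

# Example invocation with the volume and brick from this thread.
remove_dead_brick datastore1 2 vnb.proxmox.softlog:/glusterdata/datastore1c
```

In dry-run mode this prints the same command as the one above. Running it with `DRY_RUN=0` hands the command to the real gluster CLI, which may ask for confirmation before removing the brick.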

Follow the same strategy when adding a brick to an existing cluster volume: provide the replica count as 'n+1'.
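Here is a sketch of the 'n+1' direction. After removing the dead brick, the volume is at replica 1, so it goes back to replica 2 when the new brick is added. A full self-heal then copies the data across. The new brick path `/glusterdata/datastore1-2` is only an example, and the helper names are hypothetical. The script is dry-run by default.

```shell
#!/bin/sh
# Hypothetical helper: add a replacement brick, raising the replica count
# from n to n+1, then trigger a full self-heal to populate it.
# Dry run by default (commands are printed); set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# add_replica_brick VOLUME CURRENT_REPLICA NEW_BRICK
add_replica_brick() {
    vol=$1
    replica=$2
    brick=$3
    run gluster volume add-brick "$vol" replica "$((replica + 1))" "$brick" || return 1
    # Copy existing data from the surviving replica onto the new brick.
    # The volume has to be started for the heal to run.
    run gluster volume heal "$vol" full
}

# Example invocation: volume at replica 1, hypothetical new brick path.
add_replica_brick datastore1 1 vnb.proxmox.softlog:/glusterdata/datastore1-2
```

Gluster may insist on a trailing `force` in some cases, for example when the brick directory is on the root filesystem.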

If you are using a cloned VM that already has the gluster packages installed and carries gluster volume/peer/brick information, reset those values (including the extended attributes) before adding that new node/brick to your existing cluster.
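The following is a rough sketch of what "resetting those values" can involve on a cloned node. It makes several assumptions:

- The host is Debian-based (as Proxmox is), so the daemon's init script is `glusterfs-server`; elsewhere it may be `glusterd`.
- glusterd keeps its state under `/var/lib/glusterd`.
- The brick path is an example.
- The helper names are hypothetical.

It is dry-run by default, so check each line before running it for real.

```shell
#!/bin/sh
# Hypothetical helper: scrub gluster state inherited from a cloned VM before
# the node joins the cluster. Assumes a Debian-style service name; override
# SERVICE (e.g. SERVICE=glusterd) on other distributions.
# Dry run by default (commands are printed); set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"
SERVICE="${SERVICE:-glusterfs-server}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# reset_cloned_node BRICK_DIR
reset_cloned_node() {
    brick=$1
    run service "$SERVICE" stop
    # glusterd.info holds the node UUID; a clone shares it with its source,
    # so delete it and let glusterd generate a fresh one on start.
    run rm -f /var/lib/glusterd/glusterd.info
    # Forget the peers and volumes copied over from the source VM.
    run find /var/lib/glusterd/peers /var/lib/glusterd/vols -mindepth 1 -delete
    # Strip the extended attributes and metadata that mark the directory
    # as a brick of an existing volume.
    run setfattr -x trusted.glusterfs.volume-id "$brick"
    run setfattr -x trusted.gfid "$brick"
    run rm -rf "$brick/.glusterfs"
    run service "$SERVICE" start
}

# Example invocation with a hypothetical brick path.
reset_cloned_node /glusterdata/datastore1-2
```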

If you are replacing a failed node with a new one that has the same IP, then after probing the peer you have to set the volume attributes on it and restart the glusterfs-server service; then everything will be fine. If you have any more doubts about this, feel free to contact me.
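Here is a sketch of "setting the volume attributes" on the new node. It assumes the following:

- The peer probe has already succeeded.
- The volume ID is the UUID shown on the `Volume ID:` line of `gluster volume info`; with the dashes removed and a `0x` prefix, it is the value of the brick's `trusted.glusterfs.volume-id` attribute.
- The service name follows the Debian convention.

The UUID below is made up, and the helper names are hypothetical. The script is dry-run by default.

```shell
#!/bin/sh
# Hypothetical helper: prepare the brick on a replacement node that reuses
# the failed node's IP, after "gluster peer probe" has succeeded.
# Dry run by default (commands are printed); set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"
SERVICE="${SERVICE:-glusterfs-server}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# volume_id_hex UUID -> 0x-prefixed hex string that setfattr -v accepts.
# The UUID can be read with, e.g.:
#   gluster volume info datastore1 | sed -n 's/^Volume ID: //p'
volume_id_hex() {
    printf '0x%s\n' "$(printf '%s' "$1" | sed 's/-//g')"
}

# restore_brick VOLUME BRICK_DIR VOLUME_UUID
restore_brick() {
    vol=$1
    brick=$2
    hex=$(volume_id_hex "$3")
    run mkdir -p "$brick"
    # Tag the empty directory as a brick of VOLUME so glusterd accepts it.
    run setfattr -n trusted.glusterfs.volume-id -v "$hex" "$brick"
    run service "$SERVICE" restart
    # Repopulate the new brick from the surviving replica.
    run gluster volume heal "$vol" full
}

# Example invocation; the UUID is hypothetical.
restore_brick datastore1 /glusterdata/datastore1 0f7a1c2e-3b4d-4e5f-8a9b-0c1d2e3f4a5b
```

On a running cluster, you can compare the value with `getfattr -n trusted.glusterfs.volume-id -e hex` run against a healthy brick.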

regards,
sreejith K B,
sree15081947@xxxxxxxxx
mob:09895315396

On 7 October 2015 at 12:36, Lindsay Mathieson <lindsay.mathieson@xxxxxxxxx> wrote:
First up: one of the things that concerns me re Gluster is the incoherent state of the documentation. The only docs linked on the main webpage are for 3.2, and there is almost nothing on how to handle failure modes such as dead disks/bricks, which is one of Gluster's primary functions.

My problem - I have a replica 2 volume, 2 nodes, 2 bricks (zfs datasets).

As a test, I destroyed one brick (zfs destroy the dataset).

Can't start datastore1:

  volume start: datastore1: failed: Failed to find brick directory /glusterdata/datastore1 for volume datastore1. Reason : No such file or directory

A bit disturbing; I was hoping it would work off the remaining brick.

Can't replace the brick:

  gluster volume replace-brick datastore1 vnb.proxmox.softlog:/glusterdata/datastore1 vnb.proxmox.softlog:/glusterdata/datastore1-2 commit force

because the store is not running.

After a lot of googling I found list messages referencing the remove-brick command:

  gluster volume remove-brick datastore1 replica 2 vnb.proxmox.softlog:/glusterdata/datastore1c commit force

Fails with the unhelpful error:

wrong brick type: commit, use <HOSTNAME>:<export-dir-abs-path>
Usage: volume remove-brick <VOLNAME> [replica <COUNT>] <BRICK> ... <start|stop|status|commit|force>

In the end I destroyed and recreated the volume so I could resume testing, but I have no idea how I would handle a real failed brick in the future.

-- 
Lindsay

_______________________________________________

Gluster-users mailing list

Gluster-users@xxxxxxxxxxx

http://www.gluster.org/mailman/listinfo/gluster-users
