Issues removing then adding a brick to a replica volume (Gluster 3.7.6)

I've been running through my eternal testing regime and experimenting with removing/adding bricks - to me, a necessary part of volume maintenance for dealing with failed disks. The datastore hosts VMs and all of the following was done live. Sharding is active with a 512MB shard size.
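
For reference, sharding was enabled with the standard shard options, roughly like this (exact commands recreated from memory):

gluster volume set datastore1 features.shard on
gluster volume set datastore1 features.shard-block-size 512MB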

So I started off with a replica 3 volume

// recreated from memory
Volume Name: datastore1
Type: Replicate
Volume ID: bf882533-f1a9-40bf-a13e-d26d934bfa8b
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: vnb.proxmox.softlog:/vmdata/datastore1
Brick2: vng.proxmox.softlog:/vmdata/datastore1
Brick3: vna.proxmox.softlog:/vmdata/datastore1


I remove a brick with:

gluster volume remove-brick datastore1 replica 2  vng.proxmox.softlog:/vmdata/datastore1 force

so we end up with:

Volume Name: datastore1
Type: Replicate
Volume ID: bf882533-f1a9-40bf-a13e-d26d934bfa8b
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: vna.proxmox.softlog:/vmdata/datastore1
Brick2: vnb.proxmox.softlog:/vmdata/datastore1


All well and good. No heal issues, VMs running OK.

Then I clean the brick off the vng host:

rm -rf /vmdata/datastore1
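
(Side note: if I had kept the directory instead of deleting it, my understanding is the old brick signature would need clearing before re-use, something like:

setfattr -x trusted.glusterfs.volume-id /vmdata/datastore1
setfattr -x trusted.gfid /vmdata/datastore1
rm -rf /vmdata/datastore1/.glusterfs

Since the whole directory is gone here, that shouldn't matter.)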


I then add the brick back with:

gluster volume add-brick datastore1 replica 3  vng.proxmox.softlog:/vmdata/datastore1

Volume Name: datastore1
Type: Replicate
Volume ID: bf882533-f1a9-40bf-a13e-d26d934bfa8b
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: vna.proxmox.softlog:/vmdata/datastore1
Brick2: vnb.proxmox.softlog:/vmdata/datastore1
Brick3: vng.proxmox.softlog:/vmdata/datastore1


This recreates the brick directory "datastore1". Unfortunately this is where things start to go wrong :( Heal info:

gluster volume heal datastore1 info
Brick vna.proxmox.softlog:/vmdata/datastore1
/.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.57
/.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.5
Number of entries: 2

Brick vnb.proxmox.softlog:/vmdata/datastore1
/.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.5
/.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.57
Number of entries: 2

Brick vng.proxmox.softlog:/vmdata/datastore1
/.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.1
/.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.6
/.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.15
/.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.18
/.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.5

It's my understanding that there shouldn't be any heal entries on vng, as that is where all the shards should be sent *to*.
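
I haven't dug into the xattrs yet, but presumably something like the following (using one of the shards from the heal output above) would show the trusted.afr.* pending counters and which copy is considered dirty:

getfattr -d -m . -e hex /vmdata/datastore1/.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.5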

Also, running qemu-img check on the hosted VM images results in an I/O error. Eventually the VMs themselves crash - I suspect this is due to individual shards being unreadable.
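
The check itself is nothing special - just qemu-img run against the image on the gluster mount, along the lines of (path is illustrative, not my exact one):

qemu-img check /mnt/pve/datastore1/images/100/vm-100-disk-1.qcow2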

Another odd behaviour: if I run a full heal on vnb, it fails with the following error:

Launching heal operation to perform full self heal on volume datastore1 has been unsuccessful

However, if I run it on vna, it succeeds.
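
The command in both cases was just the standard full heal:

gluster volume heal datastore1 full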


Lastly - if I remove the brick again, everything returns to normal immediately. Heal info shows no issues and qemu-img check returns no errors.
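
That removal is just the same command as before:

gluster volume remove-brick datastore1 replica 2 vng.proxmox.softlog:/vmdata/datastore1 force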




-- 
Lindsay Mathieson
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users
