Re: Issues removing then adding a brick to a replica volume (Gluster 3.7.6)

From: "Lindsay Mathieson" <lindsay.mathieson@xxxxxxxxx>
To: "gluster-users" <gluster-users@xxxxxxxxxxx>
Sent: Monday, January 18, 2016 11:19:22 AM
Subject: Issues removing then adding a brick to a replica volume (Gluster 3.7.6)

Been running through my eternal testing regime ... and experimenting with removing/adding bricks - to me, a necessary part of volume maintenance for dealing with failed disks. The datastore is a VM host and all the following is done live. Sharding is active with a 512MB shard size.
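
For reference, sharding was enabled on the volume with something along these lines (recreated from memory, so the exact option values are assumed):

gluster volume set datastore1 features.shard on
gluster volume set datastore1 features.shard-block-size 512MB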

So I started off with a replica 3 volume

// recreated from memory
Volume Name: datastore1
Type: Replicate
Volume ID: bf882533-f1a9-40bf-a13e-d26d934bfa8b
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: vnb.proxmox.softlog:/vmdata/datastore1
Brick2: vng.proxmox.softlog:/vmdata/datastore1
Brick3: vna.proxmox.softlog:/vmdata/datastore1


I remove a brick with:

gluster volume remove-brick datastore1 replica 2  vng.proxmox.softlog:/vmdata/datastore1 force

so we end up with:

Volume Name: datastore1
Type: Replicate
Volume ID: bf882533-f1a9-40bf-a13e-d26d934bfa8b
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: vna.proxmox.softlog:/vmdata/datastore1
Brick2: vnb.proxmox.softlog:/vmdata/datastore1


All well and good. No heal issues, VMs running OK.

Then I clean the brick off the vng host:

rm -rf /vmdata/datastore1


I then add the brick back with:

gluster volume add-brick datastore1 replica 3  vng.proxmox.softlog:/vmdata/datastore1

Volume Name: datastore1
Type: Replicate
Volume ID: bf882533-f1a9-40bf-a13e-d26d934bfa8b
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: vna.proxmox.softlog:/vmdata/datastore1
Brick2: vnb.proxmox.softlog:/vmdata/datastore1
Brick3: vng.proxmox.softlog:/vmdata/datastore1


This recreates the brick directory "datastore1". Unfortunately this is where things start to go wrong :( Heal info:

gluster volume heal datastore1 info
Brick vna.proxmox.softlog:/vmdata/datastore1
/.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.57
/.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.5
Number of entries: 2

Brick vnb.proxmox.softlog:/vmdata/datastore1
/.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.5
/.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.57
Number of entries: 2

Brick vng.proxmox.softlog:/vmdata/datastore1
/.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.1
/.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.6
/.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.15
/.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.18
/.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.5

It's my understanding that there shouldn't be any heal entries on vng, as that is where all the shards should be sent *to*.

Lindsay,

Heal _is_ necessary when you add a brick that changes the replica count from n to (n+1). The new brick, although part of the existing replica set, is lagging behind the existing bricks
and needs to be brought in sync with them. All files and directories on vna and/or vnb will be healed to vng in your case.
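
You can watch the pending entries on vng drain as the heal progresses, e.g. (run from any node; the heal-count form is optional and just prints totals):

watch -n 30 gluster volume heal datastore1 info
gluster volume heal datastore1 statistics heal-count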


Also, running qemu-img check on the hosted VM images results in an I/O error. Eventually the VMs themselves crash - I suspect this is due to individual shards being unreadable.
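
For reference, the check was along these lines (example image path only, the real paths differ per VM):

qemu-img check /mnt/pve/datastore1/images/100/vm-100-disk-1.qcow2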

Another odd behaviour: if I run a full heal on vnb, I get the following error:

Launching heal operation to perform full self heal on volume datastore1 has been unsuccessful

However, if I run it on vna, it succeeds.
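
The full heal command in question being:

gluster volume heal datastore1 full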

Yes, there is a bug report for this at https://bugzilla.redhat.com/show_bug.cgi?id=1112158.
The workaround, as you figured out yourself, is to run the command on the node with the highest UUID.
Steps (a rough shell sketch of 1-4 follows below):
1) Collect the output of `cat /var/lib/glusterd/glusterd.info | grep UUID` from each of the nodes, perhaps into a file named 'uuid.txt'.
2) Sort the file: `sort uuid.txt`
3) Pick the last (highest) UUID.
4) Find out which node's glusterd.info file has that UUID.
5) Run the 'heal full' command on that node.
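
A rough sketch of steps 1-4 in one go, assuming passwordless ssh to the nodes (hostnames taken from your volume info):

# print the highest glusterd UUID among the three nodes
for h in vna vnb vng; do
    ssh $h.proxmox.softlog grep UUID /var/lib/glusterd/glusterd.info
done | sort | tail -1
# then run `gluster volume heal datastore1 full` on the node whose glusterd.info contains that UUID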

Let me know if this works for you.

-Krutika



Lastly - if I remove the brick, everything returns to normal immediately. Heal info shows no issues and qemu-img check returns no errors.




-- 
Lindsay Mathieson

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users
