File Corruption when adding bricks to live replica volumes

gluster 3.7.6

I seem to be able to reliably reproduce this. I have a replica 2 volume with one test VM image. While the VM is running with heavy disk reads/writes (a disk benchmark) I add a 3rd brick to make it replica 3:

gluster volume add-brick datastore1 replica 3  vng.proxmox.softlog:/vmdata/datastore1
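
For context, the volume is a plain replica 2 with sharding enabled, created roughly along these lines (shard block size and other tuning options omitted):

gluster volume create datastore1 replica 2 vna.proxmox.softlog:/vmdata/datastore1 vnb.proxmox.softlog:/vmdata/datastore1
gluster volume set datastore1 features.shard on
gluster volume start datastore1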

I pretty much immediately get this after the add-brick:

gluster volume heal datastore1 info
Brick vna.proxmox.softlog:/vmdata/datastore1
/.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.20
/.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.22
/.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.55 - Possibly undergoing heal

/images/301/vm-301-disk-1.qcow2 - Possibly undergoing heal

Number of entries: 4

Brick vnb.proxmox.softlog:/vmdata/datastore1
/images/301/vm-301-disk-1.qcow2 - Possibly undergoing heal

/.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.55 - Possibly undergoing heal

/.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.20
/.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.22
Number of entries: 4

Brick vng.proxmox.softlog:/vmdata/datastore1
/.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.16
/.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.28
/.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.1
/.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.22
/.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.77
/.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.9
/.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.5
/.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.2
/.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.26
/.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.15
/.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.13
/.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.3
/.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.18
Number of entries: 13

The brick on vng is the new, empty brick, yet heal info shows 13 of its shards being healed back to vna & vnb. That can't be right, and if I leave the brick in place the VM becomes hopelessly corrupted. Also, the file has 81 shards in total; they should all be queued for healing to vng.
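
The total shard count can be checked directly on one of the original bricks (vna or vnb) with something like this, using the GFID from the heal output above:

ls /vmdata/datastore1/.shard | grep d6aad699-d71d-4b35-b021-d35e5ff297c4 | wc -l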

Additionally, I get read errors when I run qemu-img check on the VM image. If I remove the vng brick the problems are resolved.
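
For reference, the check and the brick removal were done with commands along these lines (the client mount path here is just illustrative):

qemu-img check /mnt/datastore1/images/301/vm-301-disk-1.qcow2
gluster volume remove-brick datastore1 replica 2 vng.proxmox.softlog:/vmdata/datastore1 force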


If I do the same process while the VM is not running - i.e. no files are being accessed - everything proceeds as expected: all shards on vna & vnb are healed to vng.
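
In that case the heal can simply be watched until the entry counts on all bricks drop to zero, e.g. with something like:

watch -n 5 'gluster volume heal datastore1 info | grep "Number of entries"'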

-- 
Lindsay Mathieson
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users
