Re: Ensure volume is healed before node reboot

Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx> · Thu, 22 Jan 2015 00:43:29 +0530



    On 01/08/2015 05:31 PM, Andreas Mather wrote:

    
      Hello!

        
        I'm setting up a qemu/KVM cluster with glusterfs as shared
        storage. While testing the cluster, I constantly hit a
        split-brain situation on VM image files which I cannot explain.
        

        Setup:
        Two bare-metal servers running glusterfs (replica 2), each
          with the volume mounted, and one virtual machine whose image
          is located on the volume.
        

        Steps:
        1.) host1 runs VM, host2 is idle (but fully connected, i.e.
          replicating)
        2.) issue writes in VM (about 10 MB, so nothing big)
        3.) live migrate VM from host1 to host2
        4.) issue writes in VM
        5.) sleep 120
        6.) umount volume, shut down glusterfs, reboot host1
        7.) start glusterfs, wait 30 sec, mount volume on host1
        8.) sleep 120
        9.) migrate VM from host2 to host1
        

        Sometimes this works, but usually, after I redo the whole
          operation a second time or with swapped roles (i.e. reboot
          host2 after the VM was migrated away), I end up with a
          split-brained image file.
        

        According to
        gluster volume heal vol1 statistics

        the split-brain is present after step 6.
        

        My guess is that simply waiting for replication isn't enough:
          when I reboot one node, the writes, few as they are, have
          not yet been fully replicated. At least that's the simplest
          explanation I can think of.
        

        So what I want to ensure is that, after I migrate a VM
          from host1 to host2, all previous writes from host1 have been
          fully replicated to host2 (which I would take as an indicator
          that it is safe to reboot host1). How can I accomplish this?
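One way to act on the idea of "everything replicated before reboot" is to poll `gluster volume heal vol1 info` until no brick reports pending entries. The following is a minimal Python sketch, not an official tool. It assumes that each brick section of the `heal info` output contains a line of the form "Number of entries: N", and it treats any non-numeric count as "not healed". The timeout and polling interval are arbitrary, and so are the host names and brick paths in the test data.

```python
import re
import subprocess
import time
from typing import List, Optional

# Assumption: every brick section of `gluster volume heal <vol> info` carries a
# line "Number of entries: N". A non-numeric value (e.g. for a brick that could
# not be queried) is mapped to None and treated as "not healed".
_COUNT_RE = re.compile(r"^\s*Number of entries:\s*(\S+)", re.MULTILINE)


def pending_counts(heal_info_output: str) -> List[Optional[int]]:
    """Per-brick pending-heal counts; None where the count is not a number."""
    return [int(raw) if raw.isdigit() else None
            for raw in _COUNT_RE.findall(heal_info_output)]


def is_fully_healed(heal_info_output: str) -> bool:
    """True only if at least one brick was reported and every count is 0."""
    counts = pending_counts(heal_info_output)
    return bool(counts) and all(n == 0 for n in counts)


def wait_until_healed(volume: str, timeout_s: float = 600.0,
                      interval_s: float = 10.0) -> bool:
    """Poll `gluster volume heal <volume> info` until no brick lists pending entries.

    Returns True once the volume looks healed, False if timeout_s elapses first.
    Raises subprocess.CalledProcessError if the gluster command itself fails.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        result = subprocess.run(
            ["gluster", "volume", "heal", volume, "info"],
            capture_output=True, text=True, check=True,
        )
        if is_fully_healed(result.stdout):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

You would call `wait_until_healed("vol1")` after the migration and reboot host1 only if it returns True. Note that the sketch treats an empty `heal info` as "safe to reboot", which is the heuristic proposed in the question. The code itself cannot guarantee that this heuristic is sufficient.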
        

        My first guess was gluster volume heal vol1 info, but I'm
          not sure I understand its output correctly. For example, after
          resolving the split-brain by removing the image from one brick
          and watching it get replicated back over the network, heal
          info still reports both nodes, which makes no sense to me
          since writes only come from one node...
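`heal info` lists its pending entries separately under each brick. Grouping the output by brick therefore shows exactly which brick or bricks report a given image. The following minimal Python sketch assumes a plain-text layout: a "Brick <host>:<path>" header per brick, then one line per pending entry (a path or a gfid reference). It skips "Number of entries:" and "Status:" lines wherever they appear. The brick names and image path in the test data are hypothetical, and the sketch does not try to explain *why* an entry shows up on both bricks.

```python
from typing import Dict, List, Optional

# Assumption: `gluster volume heal <vol> info` prints a "Brick <host>:<path>"
# header per brick followed by one line per pending entry. Count and status
# lines are skipped wherever they appear; any preamble before the first brick
# header is ignored.
_SKIP_PREFIXES = ("Number of entries:", "Status:")


def entries_by_brick(heal_info_output: str) -> Dict[str, List[str]]:
    """Map each brick ("<host>:<path>") to the entry lines listed under it."""
    result: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for raw in heal_info_output.splitlines():
        line = raw.strip()
        if line.startswith("Brick "):
            current = line[len("Brick "):]
            result[current] = []
        elif current is None or not line or line.startswith(_SKIP_PREFIXES):
            # Preamble, blank lines, counts and status lines carry no entries.
            continue
        else:
            result[current].append(line)
    return result


def bricks_listing(heal_info_output: str, entry: str) -> List[str]:
    """Bricks (in output order) whose section lists `entry` as pending."""
    return [brick for brick, entries in entries_by_brick(heal_info_output).items()
            if entry in entries]
```

For example, `bricks_listing(output, "/images/vm1.img")` returns the brick(s) that still list the image. This makes it easy to check whether both nodes really report the same file or only one of them does.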
      
    
    Andreas,

             I am sorry it took me so long to respond to your query. I do
    not see why a split-brain would happen after step 6. Since I am
    responding so late, you may no longer have the logs, but if you
    do, could you send me the logs from the setup?

    
    Pranith

    
        Thanks,
        

        Andreas
        

      _______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users
    
    