Re: Very poor heal behaviour in 3.7.9

Lindsay Mathieson <lindsay.mathieson@xxxxxxxxx> · Mon, 28 Mar 2016 11:08:12 +1000



    On 27/03/2016 12:33 AM, Lindsay Mathieson wrote:

    
    On 26/03/2016 11:58 PM, Pranith Kumar Karampuri wrote:
      

        Is that the same
          issue I posted earlier re "gluster volume heal info" appearing
          to block I/O?
          

        I don't think it is heal info that is blocking I/O. I think it
        is the client triggering a heal and blocking the fop until the
        heal completes that results in this pattern. Disabling
        data-self-heal should get you out of this problem.
      

      I tried it earlier and it didn't seem to help.
      

      Does anything need to be restarted after cluster.data-self-heal is
      set off?
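
For reference, a minimal sketch of how the option change discussed above could be scripted. It assumes the volume is datastore2 and that the standard "gluster volume set VOLUME OPTION VALUE" form is what is meant; the helper names are hypothetical, and it does not answer whether anything must be restarted afterwards.

```python
import subprocess


def set_volume_option_argv(volume, option, value):
    """Build the argv for `gluster volume set VOLUME OPTION VALUE`."""
    return ["gluster", "volume", "set", volume, option, value]


def disable_client_data_heal(volume, run=subprocess.check_call):
    """Turn off client-side data self-heal on `volume`.

    `run` is injectable (a hypothetical convenience) so the command can
    be inspected without a gluster installation.
    """
    argv = set_volume_option_argv(volume, "cluster.data-self-heal", "off")
    run(argv)
    return argv
```

Calling disable_client_data_heal("datastore2") on a node with the gluster CLI would issue the command; passing a different `run` callable only records it.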
      

    Tried again this morning. It 100% replicates the behaviour I noted in
    my earlier post:

    
    After testing the heal process by killing
      glusterfsd on a node I noticed the following.
      

      - I/O continued at normal speed while glusterfsd was down.
      

      - After restarting glusterfsd, I/O still continued as normal
      

      - performing a "gluster volume heal datastore2 info" would show
      some info then hang.
      

      - I/O on the cluster would cease, e.g. in a VM where I was running
      a command line build of a large project, the build just stopped.
      The VM itself was mostly responsive but anything that involved
      accessing the disk hung.
      

      - if I killed the "gluster volume heal datastore2 info" command
      then I/O in the VMs resumed at a normal pace.
      

      - if I then reissued the "gluster volume heal datastore2 info"
      command I/O would continue for a short while (seconds - minutes)
      before hanging again.
      

      - killing the heal info command would resume I/O again.
      

    iowait and cpu are under 4% on all three nodes.
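
Since killing the heal info command is what let I/O resume in the steps above, one way to avoid leaving it hung is to run it under a timeout. A rough sketch, assuming Python 3.5+ is available on the node; the function name and the 30 second limit are my own choices, not anything gluster provides:

```python
import subprocess


def run_with_timeout(argv, timeout_s):
    """Run argv and return its combined output, or None if it timed out."""
    try:
        result = subprocess.run(
            argv,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child and waits for it before
        # re-raising, so a hung command is not left running.
        return None
    return result.stdout
```

For example, run_with_timeout(["gluster", "volume", "heal", "datastore2", "info"], 30) would return None instead of blocking if the command hangs. Whether that is enough to keep VM I/O flowing is only my observation from the steps above, not something I have confirmed.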

    
    Even after I shut down all VMs on datastore2, "gluster volume heal
    datastore2 info" hung indefinitely with no output.

    
    I had to stop/start datastore2 before heal info would work; it then
    returned very quickly with:

    
    Brick vnb.proxmox.softlog:/tank/vmdata/datastore2
      Number of entries: 0

      Brick vng.proxmox.softlog:/tank/vmdata/datastore2
      /.shard - Possibly undergoing heal
      Number of entries: 1

      Brick vna.proxmox.softlog:/tank/vmdata/datastore2
      /.shard - Possibly undergoing heal
      Number of entries: 1
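
To watch whether those entries ever clear, the output above can be reduced to per-brick counts. A small sketch that assumes the layout is exactly as shown (a "Brick ..." line, then entry lines, then "Number of entries: N"); parse_heal_info is a hypothetical helper, not part of gluster:

```python
SAMPLE = """\
Brick vnb.proxmox.softlog:/tank/vmdata/datastore2
Number of entries: 0

Brick vng.proxmox.softlog:/tank/vmdata/datastore2
/.shard - Possibly undergoing heal
Number of entries: 1

Brick vna.proxmox.softlog:/tank/vmdata/datastore2
/.shard - Possibly undergoing heal
Number of entries: 1
"""


def parse_heal_info(text):
    """Map each brick to its listed entries and reported entry count."""
    bricks = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("Brick "):
            current = line[len("Brick "):]
            bricks[current] = {"entries": [], "count": None}
        elif current is None:
            continue  # ignore anything before the first brick header
        elif line.startswith("Number of entries:"):
            value = line.split(":", 1)[1].strip()
            # Leave the count as None if it is not a plain number.
            bricks[current]["count"] = int(value) if value.isdigit() else None
        else:
            bricks[current]["entries"].append(line)
    return bricks


pending = {b: info["count"] for b, info in parse_heal_info(SAMPLE).items()}
```

Here `pending` maps each brick to its count, so polling it every minute or so would show whether the two /.shard entries are actually progressing.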
    

    Unfortunately it's stayed that way for 10 minutes now.

    
    I'd like to recheck this behaviour under 3.7.7 - can I just revert
    to that (debian packages) without recreating the datastore?

    
    thanks,

    
    -- 
Lindsay Mathieson
  
