Re: [External] Re: Self Heal Confusion

Brett Holcomb <biholcomb@xxxxxxxxxx> · Tue, 1 Jan 2019 11:58:31 -0500



    Healing time is set to 120 seconds for now.
    Just to make sure I understand: I need to take the output of
      gluster volume heal projects info and put it in a file, then try
      to find each GFID listed in that file in the .glusterfs directory
      of each brick listed in the output as having unhealed files, and
      delete that entry if it exists. If it doesn't exist, don't worry
      about it.

    
    So these bricks have unhealed entries listed:

    /srv/gfs01/Projects/.glusterfs - 85 files
    /srv/gfs05/Projects/.glusterfs - 58854 files
    /srv/gfs06/Projects/.glusterfs - 58854 files

    Script time!
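
The procedure described above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a tested cleanup tool: it assumes the heal info output was saved to a file, that unhealed entries appear as `<gfid:...>` lines under a `Brick host:/path` header, and that a GFID entry sits at `.glusterfs/<first two hex chars>/<next two>/<gfid>` on the brick. The function names are made up for illustration, and bricks are matched on path only (the host part is ignored). It only reports candidates and deletes nothing.

```shell
#!/bin/sh
# Print the GFIDs listed under one brick in saved heal info output.
# $1 = file holding the saved output, $2 = brick path
gfids_for_brick() {
    awk -v brick="$2" '
        # A "Brick host:/path" header starts a new section; keep only the path.
        /^Brick / { sub(/^Brick [^:]*:/, ""); current = $0; next }
        # Inside the wanted section, strip the <gfid:...> wrapper.
        current == brick && /^<gfid:.*>$/ {
            gsub(/^<gfid:|>$/, "")
            print
        }
    ' "$1"
}

# Map a GFID to where its entry would sit under the brick's .glusterfs tree.
# $1 = brick path, $2 = GFID
gfid_path() {
    d1=$(printf '%s' "$2" | cut -c1-2)
    d2=$(printf '%s' "$2" | cut -c3-4)
    printf '%s/.glusterfs/%s/%s/%s\n' "$1" "$d1" "$d2" "$2"
}

# Dry run: report which listed GFID entries exist on this brick.
# $1 = file holding the saved output, $2 = brick path
report_brick() {
    gfids_for_brick "$1" "$2" | while read -r gfid; do
        entry=$(gfid_path "$2" "$gfid")
        # -L also catches dangling symlinks, which -e alone would miss.
        if [ -e "$entry" ] || [ -L "$entry" ]; then
            echo "present $entry"
        else
            echo "missing $entry"
        fi
    done
}
```

Assuming the output was saved as heal-info.txt (a hypothetical file name), a run on the node holding the brick might look like `report_brick heal-info.txt /srv/gfs05/Projects`. Reviewing the "present" lines before removing anything by hand seems the safer order.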

    
    On 12/31/18 4:39 AM, Davide Obbi wrote:

    
      cluster.quorum-type          auto
      cluster.quorum-count         (null)
      cluster.server-quorum-type   off
      cluster.server-quorum-ratio  0
      cluster.quorum-reads         no

        
        Where exactly do I remove the gfid entries from - the .glusterfs
        directory?
        --> Yes. I can't remember exactly where, but try doing a find
          in the brick paths with the gfid; it should return something.
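
The find suggested here might look something like the sketch below. The function name and its arguments are illustrative only; it assumes the entries of interest live under each brick's .glusterfs directory and are named after the GFID itself.

```shell
#!/bin/sh
# Search one or more bricks for entries named after a GFID.
# $1 = GFID, remaining arguments = brick paths
find_gfid() {
    gfid="$1"
    shift
    for brick in "$@"; do
        # Bricks that are absent on this node are skipped silently.
        find "$brick/.glusterfs" -name "$gfid" 2>/dev/null
    done
}
```

For example, `find_gfid <gfid> /srv/gfs01/Projects /srv/gfs05/Projects` would print one path per brick where the entry exists, and nothing where it does not.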

        
        Where do I put the cluster.heal-timeout option - which file?
        --> gluster volume set volumename option value 
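
As a concrete illustration of that command form, here is a small sketch. The helper names and the GLUSTER variable are made up for this example; setting GLUSTER=echo prints the command instead of running it, which may be handy for a dry run.

```shell
#!/bin/sh
# GLUSTER defaults to the real CLI; point it at "echo" to preview commands.
GLUSTER="${GLUSTER:-gluster}"

# Set the self-heal timeout on a volume.
# $1 = volume name, $2 = timeout in seconds
set_heal_timeout() {
    "$GLUSTER" volume set "$1" cluster.heal-timeout "$2"
}

# Read the current value back.
# $1 = volume name
show_heal_timeout() {
    "$GLUSTER" volume get "$1" cluster.heal-timeout
}
```

With the volume name and value mentioned elsewhere in this thread, that would be `set_heal_timeout projects 120` followed by `show_heal_timeout projects` to confirm.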

      
        On Mon, Dec 31, 2018 at 10:34 AM Brett Holcomb
          <biholcomb@xxxxxxxxxx> wrote:

        
          That is probably the case, as a lot of files were deleted some
          time ago.

          I'm on version 5.2 but was on 3.12 until about a week ago.

          Here is the quorum info. I'm running a distributed-replicated
          volume in 2 x 3 = 6.

          
          cluster.quorum-type          auto
          cluster.quorum-count         (null)
          cluster.server-quorum-type   off
          cluster.server-quorum-ratio  0
          cluster.quorum-reads         no

          
          Where exactly do I remove the gfid entries from - the
          .glusterfs directory? Do I just delete all the directories and
          files under this directory?

          
          Where do I put the cluster.heal-timeout option - which file?

          
          I think you've hit on the cause of the issue. Thinking back,
          we've had some extended power outages, and due to a
          misconfiguration in the swap file device name a couple of the
          nodes did not come up. I didn't catch it for a while, so maybe
          the deletes occurred then.

          
          Thank you.

          
          On 12/31/18 2:58 AM, Davide Obbi wrote:

          > If the long GFID does not correspond to any file, it could
          > mean the file has been deleted by the client mounting the
          > volume. I think this happens when the delete was issued while
          > the number of active bricks did not reach quorum majority, or
          > when a second brick was taken down while another was down or
          > had not finished the self-heal, the latter being more likely.
          >
          > It would be interesting to see:
          > - what version of glusterfs you are running; it happened to
          >   me with 3.12
          > - volume quorum rules:
          >   "gluster volume get vol all | grep quorum"
          >
          > To clean it up, if I remember correctly, it should be
          > possible to delete the gfid entries from the brick mounts on
          > the glusterfs server nodes reporting the files to heal.
          >
          > As a side note, you might want to consider changing the
          > self-heal timeout to a more aggressive schedule with the
          > cluster.heal-timeout option.

          _______________________________________________

          Gluster-users mailing list

          Gluster-users@xxxxxxxxxxx

          https://lists.gluster.org/mailman/listinfo/gluster-users
      
      
      -- 

      
      Davide Obbi
      System Administrator
      Booking.com B.V.
      Vijzelstraat 66-80 Amsterdam 1017HL Netherlands
      Direct +31207031558

            
            
          