Re: Self Heal Confusion

Resending, as I did not reply to the list earlier; Thunderbird replied only to the poster and not to the list.

On 12/27/18 11:46 AM, Brett Holcomb wrote:

Thank you, I appreciate the help. Here is the information; let me know if you need anything else. I'm fairly new to Gluster.

Gluster version is 5.2

1. gluster v info

Volume Name: projects
Type: Distributed-Replicate
Volume ID: 5aac71aa-feaa-44e9-a4f9-cb4dd6e0fdc3
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 3 = 6
Transport-type: tcp
Bricks:
Brick1: gfssrv1:/srv/gfs01/Projects
Brick2: gfssrv2:/srv/gfs01/Projects
Brick3: gfssrv3:/srv/gfs01/Projects
Brick4: gfssrv4:/srv/gfs01/Projects
Brick5: gfssrv5:/srv/gfs01/Projects
Brick6: gfssrv6:/srv/gfs01/Projects
Options Reconfigured:
cluster.self-heal-daemon: enable
performance.quick-read: off
performance.parallel-readdir: off
performance.readdir-ahead: off
performance.write-behind: off
performance.read-ahead: off
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
server.allow-insecure: on
storage.build-pgfid: on
changelog.changelog: on
changelog.capture-del-path: on

2.  gluster v status

Status of volume: projects
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick gfssrv1:/srv/gfs01/Projects           49154     0          Y       7213
Brick gfssrv2:/srv/gfs01/Projects           49154     0          Y       6932
Brick gfssrv3:/srv/gfs01/Projects           49154     0          Y       6920
Brick gfssrv4:/srv/gfs01/Projects           49154     0          Y       6732
Brick gfssrv5:/srv/gfs01/Projects           49154     0          Y       6950
Brick gfssrv6:/srv/gfs01/Projects           49154     0          Y       6879
Self-heal Daemon on localhost               N/A       N/A        Y       11484
Self-heal Daemon on gfssrv2                 N/A       N/A        Y       10366
Self-heal Daemon on gfssrv4                 N/A       N/A        Y       9872
Self-heal Daemon on srv-1-gfs3.corp.l1049h.net   N/A       N/A        Y       9892
Self-heal Daemon on gfssrv6                 N/A       N/A        Y       10372
Self-heal Daemon on gfssrv5                 N/A       N/A        Y       10761
 
Task Status of Volume projects
------------------------------------------------------------------------------
There are no active volume tasks

3. I've given the summary, since the actual list for two volumes is around 5900 entries.

Brick gfssrv1:/srv/gfs01/Projects
Status: Connected
Total Number of entries: 85
Number of entries in heal pending: 85
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick gfssrv2:/srv/gfs01/Projects
Status: Connected
Total Number of entries: 0
Number of entries in heal pending: 0
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick gfssrv3:/srv/gfs01/Projects
Status: Connected
Total Number of entries: 0
Number of entries in heal pending: 0
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick gfssrv4:/srv/gfs01/Projects
Status: Connected
Total Number of entries: 0
Number of entries in heal pending: 0
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick gfssrv5:/srv/gfs01/Projects
Status: Connected
Total Number of entries: 58854
Number of entries in heal pending: 58854
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick gfssrv6:/srv/gfs01/Projects
Status: Connected
Total Number of entries: 58854
Number of entries in heal pending: 58854
Number of entries in split-brain: 0
Number of entries possibly healing: 0
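
For reference, the per-brick summary above should come from something along the lines of:

$ gluster volume heal projects info summary

while the long per-entry list mentioned in point 3 would come from:

$ gluster volume heal projects info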

On 12/27/18 3:09 AM, Ashish Pandey wrote:
Hi Brett,

Could you please tell us more about the setup?

1 - gluster v info
2 - gluster v status
3 - gluster v heal <volname> info

This is the basic information needed to start debugging or to suggest any workaround.
It should always be included when asking such questions on the mailing list so that people can reply sooner.
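
Written out in full (gluster v ... is just shorthand for gluster volume ...), these amount to something like:

$ gluster volume info <volname>
$ gluster volume status <volname>
$ gluster volume heal <volname> info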


Note: Please hide IP addresses, hostnames, or any other information you don't want the world to see.

---
Ashish


From: "Brett Holcomb" <biholcomb@xxxxxxxxxx>
To: gluster-users@xxxxxxxxxxx
Sent: Thursday, December 27, 2018 12:19:15 AM
Subject: Re: Self Heal Confusion

Still no change in the pending heals. I found this reference, https://archive.fosdem.org/2017/schedule/event/glusterselinux/attachments/slides/1876/export/events/attachments/glusterselinux/slides/1876/fosdem.pdf, which mentions the default SELinux context for a brick and says that internal operations such as self-heal and rebalance should be ignored, but it does not elaborate on what "ignored" means: is it just not doing self-heal, or something else?

I did set SELinux to permissive and nothing changed. I'll try setting the bricks to the context mentioned in this PDF and see what happens.
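
In case it helps anyone following along, relabelling a brick path is normally done with something like the following (assuming the glusterd_brick_t type from those slides is the applicable one here):

$ semanage fcontext -a -t glusterd_brick_t "/srv/gfs01/Projects(/.*)?"
$ restorecon -Rv /srv/gfs01/Projects

and SELinux can be flipped to permissive temporarily with setenforce 0.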


On 12/20/18 8:26 PM, John Strunk wrote:
Assuming your bricks are up... yes, the heal count should be decreasing.

There is/was a bug wherein self-heal would stop healing even though the daemon was still running. I don't know whether your version is affected, but the remedy is simply to restart the self-heal daemon.
Force-start one of the volumes that has heals pending. The bricks are already running, but it will cause shd to restart and, assuming this is the problem, healing should begin...

$ gluster vol start my-pending-heal-vol force
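
To confirm that shd actually came back and that the counters start moving, something like this should work (same placeholder volume name as above):

$ gluster volume status my-pending-heal-vol shd
$ gluster volume heal my-pending-heal-vol statistics heal-count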

Others could better comment on the status of the bug.

-John


On Thu, Dec 20, 2018 at 5:45 PM Brett Holcomb <biholcomb@xxxxxxxxxx> wrote:
I have one volume with 85 entries pending heal and two more volumes
with 58,854 entries pending heal. These numbers come from the volume
heal info summary command, and they have stayed constant for two days
now. I've read the Gluster docs and many others; the Gluster docs just
give some commands, and the non-Gluster docs basically repeat them.
Given that it appears no self-healing is going on for these volumes,
I am confused as to why.

1.  If a self-heal daemon is listed on a host (all of mine show one with
a volume status command), can I assume it's enabled and running?

2.  I assume the volume that has all the self-heals pending has some
serious issues, even though I can access the files and directories on
it.  If self-heal is running, shouldn't the numbers be decreasing?

It appears to me that self-heal is not working properly, so how do I get
it to start working, or should I delete the volume and start over?
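
For reference, whether the daemon is enabled and its process alive can
presumably be checked directly with something like this (using the
volume name projects from elsewhere in the thread):

$ gluster volume get projects cluster.self-heal-daemon
$ ps -ef | grep glustershd

and a heal of the pending entries can presumably be requested manually with:

$ gluster volume heal projects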

I'm running Gluster 5.2 on CentOS 7, latest and fully updated.

Thank you.



_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users
