Re: Self Heal Confusion

Brett Holcomb <biholcomb@xxxxxxxxxx> · Fri, 28 Dec 2018 20:29:58 -0500



    I've been trying to find the file name from the GFID using references
      such as
      https://docs.gluster.org/en/latest/Troubleshooting/gfid-to-path/,
      the script I referenced, and other approaches, but no luck.  The GFID
      in the command below does not exist in the directory
      /srv/gfs01/Projects/.glusterfs/63/5a, although other files with a
      GFID for a name do exist there.
    It appears the files do not exist.  In addition, the file that is
      in the 63/5a directory is a link to a file that does not
      exist.
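
For reference, a minimal sketch of where a GFID is normally expected on a brick, assuming the usual backend layout of .glusterfs/<first two hex characters>/<next two>/<full GFID>. The function names are made up for illustration and are not part of gluster. Under that layout, the GFID 6e5ab8ae-65f4-4594-9313-3483bf031adc quoted below would be looked up under 6e/5a, so that directory may be worth checking as well. The second function distinguishes a missing entry from a link whose target no longer exists.

```shell
#!/bin/sh
# Sketch only. Assumes the usual brick layout:
#   <brick>/.glusterfs/<gfid chars 1-2>/<gfid chars 3-4>/<gfid>
# Function names are hypothetical, not part of gluster.

# Print the expected backend path for a GFID on a given brick.
gfid_backend_path() {
    brick=$1
    gfid=$2
    d1=$(printf '%s' "$gfid" | cut -c1-2)
    d2=$(printf '%s' "$gfid" | cut -c3-4)
    printf '%s/.glusterfs/%s/%s/%s\n' "$brick" "$d1" "$d2" "$gfid"
}

# Report what kind of entry sits at the backend path.
gfid_entry_kind() {
    p=$(gfid_backend_path "$1" "$2")
    if [ -L "$p" ] && [ ! -e "$p" ]; then
        echo "dangling-symlink"   # link exists, target does not
    elif [ -L "$p" ]; then
        echo "symlink"            # directories are normally stored as symlinks
    elif [ -f "$p" ]; then
        echo "regular-file"       # files are normally hard links to the data
    else
        echo "missing"
    fi
}
```

Example use on a brick node: gfid_entry_kind /srv/gfs01/Projects 6e5ab8ae-65f4-4594-9313-3483bf031adc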

    
    On 12/28/18 6:23 PM, Brett Holcomb wrote:

    
      I've done step 1 with no results yet, so I'm trying step 2, but I
        can't find the file via the GFID name.  The gluster volume heal
        projects info output is in a text file, so I grabbed the first
        entry from the file for Brick gfssrv1:/srv/gfs01/Projects, which
        is listed as
      <gfid:the long gfid>
      I then tried to use the method described here, https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.2/html/administration_guide/ch,
        to find the file.  However, when I do the mount there is no
        .gfid directory anywhere.
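
On the missing .gfid directory: going by the gfid-to-path document, .gfid appears to be a virtual directory that only resolves on a client mount made with the aux-gfid-mount option, so it is not expected to show up in a directory listing. The sketch below only prints the two commands described there so they can be reviewed before running. The mount point /mnt/projects-gfid and the helper name are assumptions.

```shell
#!/bin/sh
# Dry run: prints the commands rather than executing them.
# Server, mount point and helper name are placeholders/assumptions.
print_gfid_lookup_cmds() {
    server=$1
    volume=$2
    mnt=$3
    gfid=$4
    # Client mount with the auxiliary GFID access path enabled.
    printf 'mount -t glusterfs -o aux-gfid-mount %s:%s %s\n' \
        "$server" "$volume" "$mnt"
    # Ask for the backend path of the GFID through the virtual directory.
    printf 'getfattr -n trusted.glusterfs.pathinfo -e text %s/.gfid/%s\n' \
        "$mnt" "$gfid"
}

print_gfid_lookup_cmds gfssrv1 projects /mnt/projects-gfid \
    6e5ab8ae-65f4-4594-9313-3483bf031adc
```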
      I then used the Gluster GFID resolver from here, https://gist.github.com/semiosis/4392640,
        and that gives me this output which has no file linked to it.
      [root@srv-1-gfs1 ~]# ./gfid-resolver.sh /srv/gfs01/Projects 6e5ab8ae-65f4-4594-9313-3483bf031adc

        6e5ab8ae-65f4-4594-9313-3483bf031adc    ==      File:

        Done.

      
      So at this point either I'm doing something wrong (most likely)
        or the files do not exist. I've tried this on several files.
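
Another way to check by hand, assuming the GFID belongs to a regular file: its .glusterfs entry is normally a hard link to the real file on the same brick, so find -samefile can locate the other name. A minimal sketch (the function name is hypothetical). No output means no other hard link was found on that brick.

```shell
#!/bin/sh
# Sketch: resolve a GFID to its path on one brick via the hard link.
# Assumes a regular file (directories are stored as symlinks instead).
gfid_to_brick_path() {
    brick=$1
    gfid=$2
    d1=$(printf '%s' "$gfid" | cut -c1-2)
    d2=$(printf '%s' "$gfid" | cut -c3-4)
    entry="$brick/.glusterfs/$d1/$d2/$gfid"
    [ -f "$entry" ] || { echo "no regular file at $entry" >&2; return 1; }
    # Print every other name for the same inode, skipping .glusterfs itself.
    find "$brick" -path "$brick/.glusterfs" -prune -o \
        -samefile "$entry" -print
}
```

Example use on a brick node: gfid_to_brick_path /srv/gfs01/Projects 6e5ab8ae-65f4-4594-9313-3483bf031adc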
      

      On 12/28/18 1:00 AM, Ashish Pandey wrote:

      
          Hi Brett,

          
          First, the answers to all your questions:

          
          1.  If a self-heal daemon is listed on a host (all of
            mine show one with a volume status command) can I assume
            it's enabled and running?

          
          For your volume, projects, the self-heal daemon is up and
            running.

          
            2.  I assume the volume that has all the self-heals pending
            has some serious issues even though I can access the files
            and directories on it.  If self-heal is running shouldn't
            the numbers be decreasing?
          
            
            It should heal the entries, and the number of entries
              reported by the "gluster v heal volname info" command
              should be decreasing.

            
            It appears to me self-heal is not working properly so how do
            I get it to start working or should I delete the volume and
            start over?
          

          As you can access all the files from the mount point, I think
            the volume and the files are in a good state as of now.

          
          I don't think you should think of deleting your volume
            before trying to fix it.

          
          If there is no fix or the fix is taking time you can go
            ahead with that option.

          
          -----------------------

          
          Why are all these options off?

          
          performance.quick-read: off

            performance.parallel-readdir: off

            performance.readdir-ahead: off

            performance.write-behind: off

            performance.read-ahead: off
          

          Although this should not matter for your issue, I think
            you should enable all of the above unless you have a reason
            not to do so.
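
If there is no particular reason for them to be off, they can be switched back on with gluster volume set. The sketch below only prints the commands for review rather than running them. The volume name projects comes from the volume info quoted further down, and the helper name is made up.

```shell
#!/bin/sh
# Dry run: prints one "gluster volume set" command per option.
# readdir-ahead is listed before parallel-readdir on the assumption
# that the latter depends on it.
print_enable_cmds() {
    volume=$1
    for opt in performance.quick-read performance.readdir-ahead \
               performance.parallel-readdir performance.write-behind \
               performance.read-ahead; do
        printf 'gluster volume set %s %s on\n' "$volume" "$opt"
    done
}

print_enable_cmds projects
```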

          
          --------------------

          
          I would like you to perform the following steps and provide
            some more information:

          
          1 - Try to restart self-heal and see if that works.

          "gluster v start <volname> force" will kill and restart the
            self-heal processes.

          
          2 - If step 1 is not fruitful, get the list of entries that
            need to be healed and pick one of them. I mean we should
            focus on one entry to find out why it is not getting healed,
            instead of all the 5900 entries. Let's call it entry1.

          
          3 - Now access entry1 from the mount point, read and write
            to it, and see whether this entry has been healed. Check
            heal info. Accessing a file from the mount point triggers
            client-side heal, which could also heal the file.

          
          4 - Check the logs in /var/log/glusterfs; the mount logs
            and the glustershd logs should be checked and provided.

          
          5 - Get the extended attributes of entry1 from all the
            bricks.

          If the path of entry1 on the mount point is /a/b/c/entry1,
            then you have to run the following command on all the nodes:

          getfattr -m. -d -e hex <path of the brick on the node>/a/b/c/entry1

          Please provide the output of the above command too.
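
When reading the getfattr output from step 5, the keys of interest are usually the trusted.afr.<volname>-client-N ones. A minimal sketch for decoding one such value, assuming the usual AFR changelog layout of 12 bytes read as three 32-bit counters (pending data, metadata and entry operations); all zeros would mean nothing is pending against that brick. The function name and the sample value are illustrative only.

```shell
#!/bin/sh
# Decode a trusted.afr.* value as printed by "getfattr -e hex".
# Assumption: 12 bytes = data, metadata, entry pending counters.
decode_afr() {
    val=$1
    # Expect "0x" followed by exactly 24 hex digits (26 characters).
    case $val in
        0x*) ;;
        *) echo "unexpected value: $val" >&2; return 1 ;;
    esac
    [ "${#val}" -eq 26 ] || { echo "unexpected length: $val" >&2; return 1; }
    data=$(printf '%s' "$val" | cut -c3-10)
    meta=$(printf '%s' "$val" | cut -c11-18)
    entry=$(printf '%s' "$val" | cut -c19-26)
    printf 'data=%d metadata=%d entry=%d\n' "0x$data" "0x$meta" "0x$entry"
}

# Sample value, not taken from the thread.
decode_afr 0x000000020000000100000000   # prints: data=2 metadata=1 entry=0
```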

          
          ---

          
          Ashish

          
          From: "Brett Holcomb" <biholcomb@xxxxxxxxxx>

            To: gluster-users@xxxxxxxxxxx

            Sent: Friday, December 28, 2018 3:49:50 AM

            Subject: Re:  Self Heal Confusion

            
            Resend as I did not reply to the list earlier.  TBird
              responded to the poster and not the list.

            
            On 12/27/18 11:46 AM, Brett Holcomb wrote:

            
              Thank you. I appreciate the help.  Here is the
                information.  Let me know if you need anything else.
                I'm fairly new to gluster.

              
              Gluster version is 5.2
              1. gluster v info

              
              Volume Name: projects

                Type: Distributed-Replicate

                Volume ID: 5aac71aa-feaa-44e9-a4f9-cb4dd6e0fdc3

                Status: Started

                Snapshot Count: 0

                Number of Bricks: 2 x 3 = 6

                Transport-type: tcp

                Bricks:

                Brick1: gfssrv1:/srv/gfs01/Projects

                Brick2: gfssrv2:/srv/gfs01/Projects

                Brick3: gfssrv3:/srv/gfs01/Projects

                Brick4: gfssrv4:/srv/gfs01/Projects

                Brick5: gfssrv5:/srv/gfs01/Projects

                Brick6: gfssrv6:/srv/gfs01/Projects

                Options Reconfigured:

                cluster.self-heal-daemon: enable

                performance.quick-read: off

                performance.parallel-readdir: off

                performance.readdir-ahead: off

                performance.write-behind: off

                performance.read-ahead: off

                performance.client-io-threads: off

                nfs.disable: on

                transport.address-family: inet

                server.allow-insecure: on

                storage.build-pgfid: on

                changelog.changelog: on

                changelog.capture-del-path: on

                
              2.  gluster v status

              
              Status of volume: projects
                Gluster process                             TCP Port  RDMA Port  Online  Pid
                ------------------------------------------------------------------------------
                Brick gfssrv1:/srv/gfs01/Projects           49154     0          Y       7213
                Brick gfssrv2:/srv/gfs01/Projects           49154     0          Y       6932
                Brick gfssrv3:/srv/gfs01/Projects           49154     0          Y       6920
                Brick gfssrv4:/srv/gfs01/Projects           49154     0          Y       6732
                Brick gfssrv5:/srv/gfs01/Projects           49154     0          Y       6950
                Brick gfssrv6:/srv/gfs01/Projects           49154     0          Y       6879
                Self-heal Daemon on localhost               N/A       N/A        Y       11484
                Self-heal Daemon on gfssrv2                 N/A       N/A        Y       10366
                Self-heal Daemon on gfssrv4                 N/A       N/A        Y       9872
                Self-heal Daemon on srv-1-gfs3.corp.l1049h.
                net                                         N/A       N/A        Y       9892
                Self-heal Daemon on gfssrv6                 N/A       N/A        Y       10372
                Self-heal Daemon on gfssrv5                 N/A       N/A        Y       10761

                Task Status of Volume projects
                ------------------------------------------------------------------------------
                There are no active volume tasks
              3. I've given the summary since the actual list for two
                volumes is around 5900 entries.
              Brick gfssrv1:/srv/gfs01/Projects

                Status: Connected

                Total Number of entries: 85

                Number of entries in heal pending: 85

                Number of entries in split-brain: 0

                Number of entries possibly healing: 0

                
                Brick gfssrv2:/srv/gfs01/Projects

                Status: Connected

                Total Number of entries: 0

                Number of entries in heal pending: 0

                Number of entries in split-brain: 0

                Number of entries possibly healing: 0

                
                Brick gfssrv3:/srv/gfs01/Projects

                Status: Connected

                Total Number of entries: 0

                Number of entries in heal pending: 0

                Number of entries in split-brain: 0

                Number of entries possibly healing: 0

                
                Brick gfssrv4:/srv/gfs01/Projects

                Status: Connected

                Total Number of entries: 0

                Number of entries in heal pending: 0

                Number of entries in split-brain: 0

                Number of entries possibly healing: 0

              
              Brick gfssrv5:/srv/gfs01/Projects

                Status: Connected

                Total Number of entries: 58854

                Number of entries in heal pending: 58854

                Number of entries in split-brain: 0

                Number of entries possibly healing: 0

                
                Brick gfssrv6:/srv/gfs01/Projects

                Status: Connected

                Total Number of entries: 58854

                Number of entries in heal pending: 58854

                Number of entries in split-brain: 0

                Number of entries possibly healing: 0
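
To track whether these numbers move between checks, the summary can be reduced to a single total. A minimal sketch, assuming the output format shown above; the function name is made up, and it reads the summary text on standard input.

```shell
#!/bin/sh
# Sum the "heal pending" counts across all bricks in a summary.
sum_heal_pending() {
    awk -F': *' '/Number of entries in heal pending/ { total += $2 }
                 END { print total + 0 }'
}

# Typical use (needs a running cluster, so left commented out):
#   gluster volume heal projects info summary | sum_heal_pending
```

Running it a few minutes apart and comparing the totals gives a quick indication of whether healing is progressing.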

              
              On 12/27/18 3:09 AM, Ashish Pandey wrote:

              
                  Hi Brett,

                  
                  Could you please tell us more about the setup?

                  
                  1 - gluster v info

                  
                  2 - gluster v status

                  
                  3 - gluster v heal <volname> info

                  
                  This is the very basic information needed to start
                    debugging or suggesting any workaround.
                  It should always be included when asking such
                    questions on the mailing list so that people can
                    reply sooner.

                  
                  Note: Please hide IP addresses/hostnames or any
                    other information you don't want the world to see.

                  
                  ---

                  
                  Ashish

                  
                  From: "Brett Holcomb" <biholcomb@xxxxxxxxxx>

                    To: gluster-users@xxxxxxxxxxx

                    Sent: Thursday, December 27, 2018 12:19:15 AM

                    Subject: Re:  Self Heal Confusion

                    
                    Still no change in the heals pending.  I found
                      this reference, https://archive.fosdem.org/2017/schedule/event/glusterselinux/attachments/slides/1876/export/events/attachments/glusterselinux/slides/1876/fosdem.pdf,
                      which mentions the default SELinux context for a
                      brick and says that internal operations such as
                      self-heal and rebalance should be ignored, but it
                      does not elaborate on what "ignored" means: is it
                      just not doing self-heal, or something else?
                    I did set SELinux to permissive and nothing
                      changed.  I'll try setting the bricks to the
                      context mentioned in this PDF and see what
                      happens.
                    

                    On 12/20/18 8:26 PM, John Strunk wrote:

                    
                      Assuming your bricks are up... yes,
                        the heal count should be decreasing.
                        

                        There is/was a bug wherein self-heal would
                          stop healing but would still be running. I
                          don't know whether your version is affected,
                          but the remedy is to just restart the
                          self-heal daemon.
                        Force start one of the volumes that has
                          heals pending. The bricks are already running,
                          but it will cause shd to restart and, assuming
                          this is the problem, healing should begin...
                        

                        $ gluster vol start my-pending-heal-vol force
                        

                        Others could better comment on the status
                          of the bug.
                        

                        -John
                        

                        On Thu, Dec 20, 2018 at 5:45 PM Brett Holcomb <biholcomb@xxxxxxxxxx> wrote:

                        
                        I have one volume that has 85 pending entries in
                          healing and two more volumes with 58,854 entries
                          in healing pending.  These numbers are from the
                          volume heal info summary command.  They have
                          stayed constant for two days now.  I've read the
                          gluster docs and many more.  The Gluster docs just
                          give some commands and non-gluster docs basically
                          repeat that.  Given that it appears no
                          self-healing is going on for my volume, I am
                          confused as to why.

                          1.  If a self-heal daemon is listed on a host
                          (all of mine show one with a volume status
                          command) can I assume it's enabled and running?

                          2.  I assume the volume that has all the
                          self-heals pending has some serious issues even
                          though I can access the files and directories on
                          it.  If self-heal is running shouldn't the
                          numbers be decreasing?

                          It appears to me self-heal is not working
                          properly so how do I get it to start working or
                          should I delete the volume and start over?

                          I'm running gluster 5.2 on CentOS 7 latest and
                          updated.

                          
                          Thank you.

                          
_______________________________________________

                          Gluster-users mailing list

                          Gluster-users@xxxxxxxxxxx

                          https://lists.gluster.org/mailman/listinfo/gluster-users

                        