Re: Self healing does not see files to heal

Дмитрий Глушенок <glush@xxxxxxxxxx> · Wed, 17 Aug 2016 13:18:51 +0300

Unfortunately not:

Remount the FS, then access the test file from the second client:

[root@srv02 ~]# umount /mnt
[root@srv02 ~]# mount -t glusterfs srv01:/test01 /mnt
[root@srv02 ~]# ls -l /mnt/passwd 
-rw-r--r--. 1 root root 1505 авг 16 19:59 /mnt/passwd
[root@srv02 ~]# ls -l /R1/test01/
итого 4
-rw-r--r--. 2 root root 1505 авг 16 19:59 passwd
[root@srv02 ~]# 

Then remount the FS on the first node and check whether accessing the file from the second node triggered self-heal on the first node:

[root@srv01 ~]# umount /mnt
[root@srv01 ~]# mount -t glusterfs srv01:/test01 /mnt
[root@srv01 ~]# ls -l /mnt
итого 0
[root@srv01 ~]# ls -l /R1/test01/
итого 0
[root@srv01 ~]#

Nothing appeared.

[root@srv01 ~]# gluster volume info test01

Volume Name: test01
Type: Replicate
Volume ID: 2c227085-0b06-4804-805c-ea9c1bb11d8b
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: srv01:/R1/test01
Brick2: srv02:/R1/test01
Options Reconfigured:
features.scrub-freq: hourly
features.scrub: Active
features.bitrot: on
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
[root@srv01 ~]# 

[root@srv01 ~]# gluster volume get test01 all | grep heal
cluster.background-self-heal-count      8                                       
cluster.metadata-self-heal              on                                      
cluster.data-self-heal                  on                                      
cluster.entry-self-heal                 on                                      
cluster.self-heal-daemon                on                                      
cluster.heal-timeout                    600                                     
cluster.self-heal-window-size           1                                       
cluster.data-self-heal-algorithm        (null)                                  
cluster.self-heal-readdir-size          1KB                                     
cluster.heal-wait-queue-length          128                                     
features.lock-heal                      off                                     
features.lock-heal                      off                                     
storage.health-check-interval           30                                      
features.ctr_lookupheal_link_timeout    300                                     
features.ctr_lookupheal_inode_timeout   300                                     
cluster.disperse-self-heal-daemon       enable                                  
disperse.background-heals               8                                       
disperse.heal-wait-qlength              128                                     
cluster.heal-timeout                    600                                     
cluster.granular-entry-heal             no                                      
[root@srv01 ~]#

--
Dmitry Glushenok
Jet Infosystems

On 17 Aug 2016, at 11:30, Ravishankar N <ravishankar@xxxxxxxxxx> wrote:

    On 08/17/2016 01:48 PM, Дмитрий Глушенок wrote:

      Hello Ravi,

      Thank you for the reply. I found the bug number (for those who
        will google this email): https://bugzilla.redhat.com/show_bug.cgi?id=1112158

      Accessing the removed file from the mount point does not
        always work, because we have to find a special client whose
        DHT will point to the brick with the removed file. Otherwise the
        file is served from the good brick and self-healing does not
        happen (just verified). Or by accessing did you mean something
        like touch?

    Sorry, I should have been more explicit. I meant triggering a lookup
    on that file with `stat filename`. I don't think you need a special
    client. DHT sends the lookup to AFR, which in turn sends it to all
    its children. When one of them returns ENOENT (because you removed
    the file from the brick), AFR will automatically trigger a heal. I'm
    guessing it is not always working in your case due to caching at
    various levels, so the lookup never reaches AFR. If you do it from a
    fresh mount, it should always work.

    -Ravi
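
The lookup from a fresh mount described above can be sketched as a small
shell function. This is only a sketch, assuming a POSIX shell on a host with
the glusterfs FUSE client installed; the name trigger_lookup and the RUN
variable are hypothetical helpers, not part of Gluster.

```shell
# Hypothetical helper: mount the volume on a fresh temporary mount point,
# stat one file so that a lookup is sent for it, then unmount again.
# Usage: trigger_lookup srv01:/test01 passwd
# Set RUN=echo to print the commands instead of executing them.
trigger_lookup() {
    volume=$1      # server:/volname, as passed to mount -t glusterfs
    relpath=$2     # path of the file relative to the volume root
    run=${RUN:-}
    tmpmnt=$(mktemp -d) || return 1
    if ! $run mount -t glusterfs "$volume" "$tmpmnt"; then
        rmdir "$tmpmnt"
        return 1
    fi
    $run stat "$tmpmnt/$relpath"
    rc=$?
    $run umount "$tmpmnt"
    rmdir "$tmpmnt"
    return $rc
}
```

Using a new temporary mount point each time is meant to avoid the client-side
caching mentioned above; whether the heal then happens still has to be checked
on the brick itself.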

                              Dmitry Glushenok
                              Jet Infosystems

          On 17 Aug 2016, at 4:24, Ravishankar N <ravishankar@xxxxxxxxxx> wrote:

          On 08/16/2016 10:44 PM, Дмитрий Глушенок wrote:

            Hello,

              While testing healing after a bitrot error, we found that
              self-heal cannot heal files that were manually deleted
              from a brick. Gluster 3.8.1:

              - Create a volume, mount it locally and copy a test file to it

              [root@srv01 ~]# gluster volume create test01 replica 2 srv01:/R1/test01 srv02:/R1/test01

              volume create: test01: success: please start the volume to
              access data

              [root@srv01 ~]# gluster volume start test01

              volume start: test01: success

              [root@srv01 ~]# mount -t glusterfs srv01:/test01 /mnt

              [root@srv01 ~]# cp /etc/passwd /mnt

              [root@srv01 ~]# ls -l /mnt

              итого 2

              -rw-r--r--. 1 root root 1505 авг 16 19:59 passwd

              - Then remove the test file from the first brick, as we
              would have to do in case of a bitrot error in the file

            You also need to remove all hard-links to the corrupted file
            from the brick, including the one in the .glusterfs folder.

            There is a bug in heal-full that prevents it from crawling all
            bricks of the replica. The right way to heal the corrupted
            files as of now is to access them from the mount-point like
            you did after removing the hard-links. The list of files that
            are corrupted can be obtained with the scrub status command.

            Hope this helps,
            Ravi
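
To locate the hard link in the .glusterfs folder mentioned above, a minimal
sketch follows. It assumes the usual brick layout, where the link lives at
.glusterfs/<first two hex digits of the GFID>/<next two>/<GFID as a UUID>.
The function name gfid_backend_path and the GFID used in the usage line are
hypothetical; the real GFID of a file has to be read from the brick.

```shell
# Hypothetical helper: print the path of the .glusterfs hard link for a
# file, given the brick root and the file's GFID in hex.
# The GFID can be read on the brick with, for example:
#   getfattr -n trusted.gfid -e hex /R1/test01/passwd
gfid_backend_path() {
    brick=$1
    gfid=${2#0x}    # drop the 0x prefix that getfattr -e hex prints
    # Normalise: remove dashes, lower-case the hex digits.
    gfid=$(printf '%s' "$gfid" | sed 's/-//g' | tr 'A-F' 'a-f')
    [ "${#gfid}" -eq 32 ] || { echo "bad gfid: $2" >&2; return 1; }
    # Re-insert dashes in UUID form: 8-4-4-4-12
    uuid=$(printf '%s' "$gfid" |
        sed 's/^\(.\{8\}\)\(.\{4\}\)\(.\{4\}\)\(.\{4\}\)\(.\{12\}\)$/\1-\2-\3-\4-\5/')
    d1=$(printf '%s' "$gfid" | cut -c1-2)
    d2=$(printf '%s' "$gfid" | cut -c3-4)
    printf '%s/.glusterfs/%s/%s/%s\n' "$brick" "$d1" "$d2" "$uuid"
}
# Usage (made-up GFID):
#   gfid_backend_path /R1/test01 0x6f3a1c2d4e5b46789abcdef012345678
```

The list of corrupted files itself would come from
`gluster volume bitrot test01 scrub status`, as noted above; this helper only
maps a GFID to a path and does not delete anything.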

            [root@srv01 ~]# rm /R1/test01/passwd

              [root@srv01 ~]# ls -l /mnt

              итого 0

              [root@srv01 ~]#

              - Issue full self heal

              [root@srv01 ~]# gluster volume heal test01 full

              Launching heal operation to perform full self heal on
              volume test01 has been successful

              Use heal info commands to check status

              [root@srv01 ~]# tail -2 /var/log/glusterfs/glustershd.log

              [2016-08-16 16:59:56.483767] I [MSGID: 108026] [afr-self-heald.c:611:afr_shd_full_healer] 0-test01-replicate-0: starting full sweep on subvol test01-client-0

              [2016-08-16 16:59:56.486560] I [MSGID: 108026] [afr-self-heald.c:621:afr_shd_full_healer] 0-test01-replicate-0: finished full sweep on subvol test01-client-0
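
The "heal info" check that the heal command suggests can be summarised with a
short filter. This is a sketch that assumes `gluster volume heal test01 info`
prints one "Number of entries: N" line per brick; the function name
pending_heal_entries is hypothetical.

```shell
# Hypothetical helper: sum the "Number of entries:" lines that
# `gluster volume heal <volname> info` prints for each brick.
# Reads the command output on stdin and prints the total.
pending_heal_entries() {
    awk -F': *' '/^Number of entries:/ { total += $2 } END { print total + 0 }'
}
# Usage: gluster volume heal test01 info | pending_heal_entries
```

A total of zero would not by itself show that the bricks are identical in this
scenario, since the file was removed directly from the brick rather than
through a mount.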

              - We still see no files in the mount point (it became
              empty right after the file was removed from the brick)

              [root@srv01 ~]# ls -l /mnt

              итого 0

              [root@srv01 ~]#

              - Then try to access the file by its full name
              (lookup-optimize and readdir-optimize are turned off by
              default). Now glusterfs shows the file!

              [root@srv01 ~]# ls -l /mnt/passwd

              -rw-r--r--. 1 root root 1505 авг 16 19:59 /mnt/passwd

              - And it reappeared in the brick

              [root@srv01 ~]# ls -l /R1/test01/

              итого 4

              -rw-r--r--. 2 root root 1505 авг 16 19:59 passwd

              [root@srv01 ~]#

              Is this a bug, or can we tell self-heal to scan all files
              on all bricks in the volume?

              --

              Dmitry Glushenok

              Jet Infosystems

              _______________________________________________

              Gluster-users mailing list

              Gluster-users@xxxxxxxxxxx

              http://www.gluster.org/mailman/listinfo/gluster-users
