Re: Self healing does not see files to heal

Hello Ravi,

Thank you for the reply. I found the bug number (for anyone who finds this email via Google): https://bugzilla.redhat.com/show_bug.cgi?id=1112158

Accessing the removed file from the mount point does not always work, because we would have to find a particular client that DHT points to the brick with the removed file. Otherwise the file is accessed from the good brick and self-healing does not happen (I just verified this). Or by "accessing" did you mean something like touch?
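Ravi's advice quoted below (remove every hard link to the corrupted file from the brick, including the one under .glusterfs, then access the file from the mount point) can be sketched as two small shell functions. This is a minimal sketch, assuming GNU find (for -samefile and -delete) and that it runs as root on the server hosting the bad brick; the function names are hypothetical and not part of GlusterFS.

```shell
#!/bin/sh
# Sketch: delete a corrupted file from ONE brick together with every other
# hard link to it, including the GFID link GlusterFS keeps under
# <brick>/.glusterfs/. Hypothetical helpers, not a GlusterFS tool.

# list_brick_links BRICK REL
# Print every name on BRICK that shares an inode with BRICK/REL.
list_brick_links() {
    # -samefile matches all hard links of the reference file (GNU find);
    # -xdev keeps the walk on the brick's own filesystem.
    find "$1" -xdev -samefile "$1/$2"
}

# remove_from_brick BRICK REL
# Delete every hard link to BRICK/REL and print each removed path.
remove_from_brick() {
    # Fail early if the file is not present on this brick.
    [ -e "$1/$2" ] || { echo "no such file: $1/$2" >&2; return 1; }
    find "$1" -xdev -samefile "$1/$2" -print -delete
}
```

For the test in this thread that would be `remove_from_brick /R1/test01 passwd` on srv01, followed by `ls -l /mnt/passwd` on a client. As noted above, that lookup may not trigger the heal from every client.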

--
Dmitry Glushenok
Jet Infosystems

On 17 Aug 2016, at 4:24, Ravishankar N <ravishankar@xxxxxxxxxx> wrote:

On 08/16/2016 10:44 PM, Дмитрий Глушенок wrote:
Hello,

While testing healing after a bitrot error, I found that self-healing cannot heal files that were manually deleted from a brick. Gluster 3.8.1:

- Create a volume, mount it locally, and copy a test file to it
[root@srv01 ~]# gluster volume create test01 replica 2  srv01:/R1/test01 srv02:/R1/test01
volume create: test01: success: please start the volume to access data
[root@srv01 ~]# gluster volume start test01
volume start: test01: success
[root@srv01 ~]# mount -t glusterfs srv01:/test01 /mnt
[root@srv01 ~]# cp /etc/passwd /mnt
[root@srv01 ~]# ls -l /mnt
total 2
-rw-r--r--. 1 root root 1505 Aug 16 19:59 passwd

- Then remove the test file from the first brick, as we would have to do in case of a bitrot error in the file

You also need to remove all hard links to the corrupted file from the brick, including the one in the .glusterfs folder.
There is a bug in heal-full that prevents it from crawling all bricks of the replica. As of now, the right way to heal the corrupted files is to access them from the mount point, as you did, after removing the hard links. The list of corrupted files can be obtained with the scrub status command.

Hope this helps,
Ravi

[root@srv01 ~]# rm /R1/test01/passwd
[root@srv01 ~]# ls -l /mnt
total 0
[root@srv01 ~]#

- Issue a full self-heal
[root@srv01 ~]# gluster volume heal test01 full
Launching heal operation to perform full self heal on volume test01 has been successful
Use heal info commands to check status
[root@srv01 ~]# tail -2 /var/log/glusterfs/glustershd.log
[2016-08-16 16:59:56.483767] I [MSGID: 108026] [afr-self-heald.c:611:afr_shd_full_healer] 0-test01-replicate-0: starting full sweep on subvol test01-client-0
[2016-08-16 16:59:56.486560] I [MSGID: 108026] [afr-self-heald.c:621:afr_shd_full_healer] 0-test01-replicate-0: finished full sweep on subvol test01-client-0

- We still see no files in the mount point (it became empty right after the file was removed from the brick)
[root@srv01 ~]# ls -l /mnt
total 0
[root@srv01 ~]#

- Then try to access the file by its full name (lookup-optimize and readdir-optimize are turned off by default). Now GlusterFS shows the file!
[root@srv01 ~]# ls -l /mnt/passwd
-rw-r--r--. 1 root root 1505 Aug 16 19:59 /mnt/passwd

- And it has reappeared on the brick
[root@srv01 ~]# ls -l /R1/test01/
total 4
-rw-r--r--. 2 root root 1505 Aug 16 19:59 passwd
[root@srv01 ~]#
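To see which bricks a full heal actually crawled, one can pull the subvolume names out of the glustershd.log "full sweep" lines shown in the heal step above. A minimal sketch, assuming the message text stays exactly as in these 3.8.1 log lines; swept_subvols is a hypothetical helper name. Only srv01's log is shown in this thread, so whether srv02's glustershd.log reports a sweep of another subvolume is not known here, and the same check is worth running on each server.

```shell
#!/bin/sh
# swept_subvols [LOGFILE...]   (hypothetical helper)
# Print, once each, the client subvolumes named in
# "finished full sweep on subvol ..." lines of glustershd.log.
# Reads stdin when no file is given.
swept_subvols() {
    # Keep only the word after "finished full sweep on subvol".
    sed -n 's/.*finished full sweep on subvol \([^ ]*\).*/\1/p' "$@" | sort -u
}
```

For the two log lines quoted above, `swept_subvols /var/log/glusterfs/glustershd.log` prints only test01-client-0.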

Is this a bug, or can we tell self-heal to scan all files on all bricks in the volume?
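Until the heal-full bug is fixed, the manual step that worked above (`ls -l /mnt/passwd`) can be repeated for every name that exists on the good brick. A rough sketch, assuming the good brick's tree (srv02:/R1/test01 in this test) is readable on a host that also has the volume mounted; lookup_from_brick is a hypothetical helper, and the mount location on that host is an assumption. Per the follow-up at the top of this thread, a lookup from a given client does not always trigger the heal, so check the bad brick afterwards.

```shell
#!/bin/sh
# lookup_from_brick GOOD_BRICK MOUNT   (hypothetical helper)
# Walk GOOD_BRICK, skipping GlusterFS's internal .glusterfs tree, and run
# "ls -ld" on the same relative path under MOUNT. Each ls forces a named
# lookup through the client, like "ls -l /mnt/passwd" in the test above.
# Prints the relative paths whose lookup through MOUNT failed.
lookup_from_brick() (
    brick=$1
    mnt=$2
    cd "$brick" || exit 1
    # Pre-order walk: parents are looked up before their children.
    find . -path ./.glusterfs -prune -o -print |
    while IFS= read -r rel; do
        rel=${rel#./}
        ls -ld -- "$mnt/$rel" >/dev/null 2>&1 || printf '%s\n' "$rel"
    done
)
```

For example, `lookup_from_brick /R1/test01 /mnt` on srv02 (assuming the volume is mounted there at /mnt); empty output means every name resolved through the mount, after which `ls -l /R1/test01/` on srv01 shows whether the files came back.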

--
Dmitry Glushenok
Jet Infosystems

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users
