As suggested I have now opened a bug on bugzilla:
-------- Original Message --------
Subject: Re: [Gluster-users] self-heal not working
Local Time: August 28, 2017 4:29 PM
UTC Time: August 28, 2017 2:29 PM
From: ravishankar@xxxxxxxxxx
To: mabi <mabi@xxxxxxxxxxxxx>
Cc: Ben Turner <bturner@xxxxxxxxxx>, Gluster Users <gluster-users@xxxxxxxxxxx>

Great, can you raise a bug for the issue so that it is easier to keep track of it (plus you'll be notified when the patch is posted)? The general guidelines are at https://gluster.readthedocs.io/en/latest/Contributors-Guide/Bug-Reporting-Guidelines but you just need to provide in the bug whatever you described in this email thread, i.e. the volume info, heal info, and the getfattr and stat output of the file in question.

Thanks!
Ravi

On 08/28/2017 07:49 PM, mabi wrote:
> Thank you for the command. I ran it on all my nodes and now the self-heal daemon finally does not report any files to be healed. Hopefully this scenario can be handled properly in newer versions of GlusterFS.

-------- Original Message --------
Subject: Re: [Gluster-users] self-heal not working
Local Time: August 28, 2017 10:41 AM
UTC Time: August 28, 2017 8:41 AM
From: ravishankar@xxxxxxxxxx
To: mabi <mabi@xxxxxxxxxxxxx>

On 08/28/2017 01:29 PM, mabi wrote:
> Excuse me for my naive questions, but how do I reset the afr.dirty xattr on the file to be healed? And do I need to do that through a FUSE mount, or simply on every brick directly?

Directly on the bricks: `setfattr -n trusted.afr.dirty -v 0x000000000000000000000000 /data/myvolume/brick/data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png`

-Ravi
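For reference, the complete workaround the thread converges on looks roughly like the following sketch. The volume name, brick path, and file path are the ones discussed in this thread; the only assumption is that the reset is run on every brick host, per Ravi's instruction.

    # Run on EACH brick host (sketch; adjust the brick path to each node's layout).
    BRICK=/data/myvolume/brick
    FILE=data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png

    # Clear the dirty flag: 12 zero bytes = data, metadata and entry counters.
    setfattr -n trusted.afr.dirty -v 0x000000000000000000000000 "$BRICK/$FILE"

    # Then, from any one node, launch an index heal (or just run heal info)
    # so the entry is re-examined and dropped from the pending-heal list.
    gluster volume heal myvolume
    gluster volume heal myvolume info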
-------- Original Message --------
Subject: Re: [Gluster-users] self-heal not working
Local Time: August 28, 2017 5:58 AM
UTC Time: August 28, 2017 3:58 AM
From: ravishankar@xxxxxxxxxx
To: Gluster Users <gluster-users@xxxxxxxxxxx>

On 08/28/2017 01:57 AM, Ben Turner wrote:
> ----- Original Message -----
>> From: "mabi" <mabi@xxxxxxxxxxxxx>
>> To: "Ravishankar N" <ravishankar@xxxxxxxxxx>
>> Sent: Sunday, August 27, 2017 3:15:33 PM
>> Subject: Re: [Gluster-users] self-heal not working
>>
>> Thanks Ravi for your analysis. So as far as I understand there is nothing to
>> worry about, but my question now would be: how do I get rid of this file
>> from the heal info?
>
> Correct me if I am wrong, but clearing this is just a matter of resetting the
> afr.dirty xattr? @Ravi - Is this correct?

Yes, resetting the xattr and then launching index heal or running the heal-info command should serve as a workaround.

-Ravi

> -b

-------- Original Message --------
Subject: Re: [Gluster-users] self-heal not working
Local Time: August 27, 2017 3:45 PM
UTC Time: August 27, 2017 1:45 PM
From: ravishankar@xxxxxxxxxx
To: mabi <mabi@xxxxxxxxxxxxx>

Yes, the shds did pick up the file for healing (I saw messages like "got entry: 1985e233-d5ee-4e3e-a51a-cf0b5f9f2aea") but no error afterwards.

Anyway, I reproduced it by manually setting the afr.dirty bit for a zero-byte file on all 3 bricks. Since there are no afr pending xattrs indicating good/bad copies and all files are zero bytes, the data self-heal algorithm just picks the file with the latest ctime as the source. In your case that was the arbiter brick. In the code, there is a check to prevent data heals if the arbiter is the source, so the heal was not happening and the entries were not removed from the heal-info output.

Perhaps we should add a check in the code to just remove the entries from heal-info if the size is zero bytes on all bricks.

-Ravi
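Ravi's reproducer can be sketched as follows on a throwaway single-box test volume, like the one he suggests creating further down in this thread. The volume name, brick paths, and mount point here are hypothetical; the dirty value matches the 0sAAAAAQAAAAAAAAAA seen in the getfattr output later in the thread.

    # Hypothetical reproducer for the stuck-heal state described above,
    # on a single-box replica 3 test volume (testvol, bricks brick1..3).
    touch /mnt/testvol/reprofile        # zero-byte file created via a FUSE mount

    for b in /home/mabi/bricks/brick{1..3}; do
        # Set ONLY the dirty xattr (data counter = 1) and leave the per-client
        # trusted.afr.<vol>-client-* xattrs unset -- the state that confused
        # the self-heal daemon in this thread.
        setfattr -n trusted.afr.dirty -v 0x000000010000000000000000 "$b/reprofile"
    done

    gluster volume heal testvol info    # the file now shows up but never heals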
On 08/25/2017 06:33 PM, mabi wrote:
> Hi Ravi,
>
> Did you get a chance to have a look at the log files I attached in my last mail?
>
> Best,
> Mabi

-------- Original Message --------
Subject: Re: [Gluster-users] self-heal not working
Local Time: August 24, 2017 12:08 PM
UTC Time: August 24, 2017 10:08 AM
From: mabi@xxxxxxxxxxxxx
To: Ravishankar N

Thanks for confirming the command. I have now enabled the DEBUG client-log-level, run a heal, and attached the glustershd log files of all 3 nodes to this mail.

The volume concerned is called myvol-pro; the other 3 volumes have had no problem so far.

Also note that in the meantime it looks like the file has been deleted by the user, and as such the heal info command does not show the file name anymore but just its GFID, which is:

gfid:1985e233-d5ee-4e3e-a51a-cf0b5f9f2aea

Hope that helps for debugging this issue.

-------- Original Message --------
Subject: Re: [Gluster-users] self-heal not working
Local Time: August 24, 2017 5:58 AM
UTC Time: August 24, 2017 3:58 AM
From: ravishankar@xxxxxxxxxx

Unlikely. In your case only the afr.dirty is set, not the afr.volname-client-xx xattr.

`gluster volume set myvolume diagnostics.client-log-level DEBUG` is right.

On 08/23/2017 10:31 PM, mabi wrote:
> I just saw the following bug which was fixed in 3.8.15:
>
> Is it possible that the problem I described in this post is related to that bug?

-------- Original Message --------
Subject: Re: [Gluster-users] self-heal not working
Local Time: August 22, 2017 11:51 AM
UTC Time: August 22, 2017 9:51 AM
From: ravishankar@xxxxxxxxxx

On 08/22/2017 02:30 PM, mabi wrote:
> Thanks for the additional hints, I have the following 2 questions first:
>
> - In order to launch the index heal, is the following command correct:
>   gluster volume heal myvolume

Yes.

> - If I run a "volume start force" will it cause any short disruptions for my
>   clients which mount the volume through FUSE? If yes, how long? This is a
>   production system, that's why I am asking.

No. You can actually create a test volume on your personal Linux box to try these kinds of things without needing multiple machines. This is how we develop and test our patches :)

`gluster volume create testvol replica 3 /home/mabi/bricks/brick{1..3} force` and so on.

HTH,
Ravi

-------- Original Message --------
Subject: Re: [Gluster-users] self-heal not working
Local Time: August 22, 2017 6:26 AM
UTC Time: August 22, 2017 4:26 AM
From: ravishankar@xxxxxxxxxx
To: Gluster Users

Explore the following:

- Launch index heal and look at the glustershd logs of all bricks for possible errors.
- See if the glustershd on each node is connected to all bricks.
- If not, try to restart the shd with `volume start force`.
- Launch index heal again and try.
- Try debugging the shd log by setting client-log-level to DEBUG temporarily.
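In command form, Ravi's checklist above comes down to roughly the following sketch. The volume name is the one from this thread; the glustershd log location is the usual default and an assumption about this setup.

    # Rough command sequence for the checklist above.
    gluster volume heal myvolume                    # launch index heal
    tail -n 100 /var/log/glusterfs/glustershd.log   # inspect shd logs on each node

    gluster volume status myvolume                  # check self-heal daemon / bricks
    gluster volume start myvolume force             # restarts shd, no client downtime

    gluster volume set myvolume diagnostics.client-log-level DEBUG
    gluster volume heal myvolume                    # heal again, with DEBUG logging
    gluster volume set myvolume diagnostics.client-log-level INFO   # revert when done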
I haven"t tried this with arbiter but I>>>>>>>>>>>> assume the process is the same.>>>>>>>>>>>>>>>>>>>>>>>> -b>>>>>>>>>>>>>>>>>>>>>>>> ----- Original Message ----->>>>>>>>>>>>> To: "Ben Turner">>>>>>>>>>>>> Cc: "Gluster Users">>>>>>>>>>>>> Sent: Monday, August 21, 2017 4:55:59 PM>>>>>>>>>>>>> Subject: Re: [Gluster-users] self-heal not working>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Ben,>>>>>>>>>>>>>>>>>>>>>>>>>> So it is really a 0 kBytes file everywhere (all nodes including>>>>>>>>>>>>> the arbiter>>>>>>>>>>>>> and from the client).>>>>>>>>>>>>> Here below you will find the output you requested. Hopefully that>>>>>>>>>>>>> will help>>>>>>>>>>>>> to find out why this specific file is not healing... Let me know>>>>>>>>>>>>> if you need>>>>>>>>>>>>> any more information. Btw node3 is my arbiter node.>>>>>>>>>>>>>>>>>>>>>>>>>> NODE1:>>>>>>>>>>>>>>>>>>>>>>>>>> STAT:>>>>>>>>>>>>> File:>>>>>>>>>>>>> ‘/data/myvolume/brick/data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png’>>>>>>>>>>>>> Size: 0 Blocks: 38 IO Block: 131072 regular empty file>>>>>>>>>>>>> Device: 24h/36d Inode: 10033884 Links: 2>>>>>>>>>>>>> Access: (0644/-rw-r--r--) Uid: ( 33/www-data) Gid: ( 33/www-data)>>>>>>>>>>>>> Access: 2017-08-14 17:04:55.530681000 +0200>>>>>>>>>>>>> Modify: 2017-08-14 17:11:46.407404779 +0200>>>>>>>>>>>>> Change: 2017-08-14 17:11:46.407404779 +0200>>>>>>>>>>>>> Birth: ->>>>>>>>>>>>>>>>>>>>>>>>>> GETFATTR:>>>>>>>>>>>>> trusted.afr.dirty=0sAAAAAQAAAAAAAAAA>>>>>>>>>>>>> trusted.bit-rot.version=0sAgAAAAAAAABZhuknAAlJAg==>>>>>>>>>>>>> trusted.gfid=0sGYXiM9XuTj6lGs8LX58q6g==>>>>>>>>>>>>> trusted.glusterfs.d99af2fa-439b-4a21-bf3a-38f3849f87ec.xtime=0sWZG9sgAGOyo=>>>>>>>>>>>>>>>>>>>>>>>>>> NODE2:>>>>>>>>>>>>>>>>>>>>>>>>>> STAT:>>>>>>>>>>>>> File:>>>>>>>>>>>>> ‘/data/myvolume/brick/data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png’>>>>>>>>>>>>> Size: 0 Blocks: 38 IO Block: 131072 regular empty file>>>>>>>>>>>>> Device: 26h/38d Inode: 10031330 Links: 2>>>>>>>>>>>>> Access: (0644/-rw-r--r--) Uid: ( 33/www-data) Gid: ( 33/www-data)>>>>>>>>>>>>> Access: 2017-08-14 17:04:55.530681000 +0200>>>>>>>>>>>>> Modify: 2017-08-14 17:11:46.403704181 +0200>>>>>>>>>>>>> Change: 2017-08-14 17:11:46.403704181 +0200>>>>>>>>>>>>> Birth: ->>>>>>>>>>>>>>>>>>>>>>>>>> GETFATTR:>>>>>>>>>>>>> trusted.afr.dirty=0sAAAAAQAAAAAAAAAA>>>>>>>>>>>>> trusted.bit-rot.version=0sAgAAAAAAAABZhu6wAA8Hpw==>>>>>>>>>>>>> trusted.gfid=0sGYXiM9XuTj6lGs8LX58q6g==>>>>>>>>>>>>> trusted.glusterfs.d99af2fa-439b-4a21-bf3a-38f3849f87ec.xtime=0sWZG9sgAGOVE=>>>>>>>>>>>>>>>>>>>>>>>>>> NODE3:>>>>>>>>>>>>> STAT:>>>>>>>>>>>>> File:>>>>>>>>>>>>> /srv/glusterfs/myvolume/brick/data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png>>>>>>>>>>>>> Size: 0 Blocks: 0 IO Block: 4096 regular empty file>>>>>>>>>>>>> Device: ca11h/51729d Inode: 405208959 Links: 2>>>>>>>>>>>>> Access: (0644/-rw-r--r--) Uid: ( 33/www-data) Gid: ( 33/www-data)>>>>>>>>>>>>> Access: 2017-08-14 17:04:55.530681000 +0200>>>>>>>>>>>>> Modify: 2017-08-14 17:04:55.530681000 +0200>>>>>>>>>>>>> Change: 2017-08-14 17:11:46.604380051 +0200>>>>>>>>>>>>> Birth: ->>>>>>>>>>>>>>>>>>>>>>>>>> GETFATTR:>>>>>>>>>>>>> trusted.afr.dirty=0sAAAAAQAAAAAAAAAA>>>>>>>>>>>>> trusted.bit-rot.version=0sAgAAAAAAAABZe6ejAAKPAg==>>>>>>>>>>>>> trusted.gfid=0sGYXiM9XuTj6lGs8LX58q6g==>>>>>>>>>>>>> trusted.glusterfs.d99af2fa-439b-4a21-bf3a-38f3849f87ec.xtime=0sWZG9sgAGOc4=>>>>>>>>>>>>>>>>>>>>>>>>>> CLIENT GLUSTER MOUNT:>>>>>>>>>>>>> STAT:>>>>>>>>>>>>> File:>>>>>>>>>>>>> 
"/mnt/myvolume/data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png">>>>>>>>>>>>> Size: 0 Blocks: 0 IO Block: 131072 regular empty file>>>>>>>>>>>>> Device: 1eh/30d Inode: 11897049013408443114 Links: 1>>>>>>>>>>>>> Access: (0644/-rw-r--r--) Uid: ( 33/www-data) Gid: ( 33/www-data)>>>>>>>>>>>>> Access: 2017-08-14 17:04:55.530681000 +0200>>>>>>>>>>>>> Modify: 2017-08-14 17:11:46.407404779 +0200>>>>>>>>>>>>> Change: 2017-08-14 17:11:46.407404779 +0200>>>>>>>>>>>>> Birth: ->>>>>>>>>>>>>>>>>>>>>>>>>>> -------- Original Message -------->>>>>>>>>>>>>> Subject: Re: [Gluster-users] self-heal not working>>>>>>>>>>>>>> Local Time: August 21, 2017 9:34 PM>>>>>>>>>>>>>> UTC Time: August 21, 2017 7:34 PM>>>>>>>>>>>>>> From: bturner@xxxxxxxxxx>>>>>>>>>>>>>> Gluster Users>>>>>>>>>>>>>>>>>>>>>>>>>>>> ----- Original Message ----->>>>>>>>>>>>>>> To: "Gluster Users">>>>>>>>>>>>>>> Sent: Monday, August 21, 2017 9:28:24 AM>>>>>>>>>>>>>>> Subject: [Gluster-users] self-heal not working>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi,>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I have a replicat 2 with arbiter GlusterFS 3.8.11 cluster and>>>>>>>>>>>>>>> there is>>>>>>>>>>>>>>> currently one file listed to be healed as you can see below>>>>>>>>>>>>>>> but never gets>>>>>>>>>>>>>>> healed by the self-heal daemon:>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Brick node1.domain.tld:/data/myvolume/brick>>>>>>>>>>>>>>> /data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png>>>>>>>>>>>>>>> Status: Connected>>>>>>>>>>>>>>> Number of entries: 1>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Brick node2.domain.tld:/data/myvolume/brick>>>>>>>>>>>>>>> /data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png>>>>>>>>>>>>>>> Status: Connected>>>>>>>>>>>>>>> Number of entries: 1>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Brick node3.domain.tld:/srv/glusterfs/myvolume/brick>>>>>>>>>>>>>>> /data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png>>>>>>>>>>>>>>> Status: Connected>>>>>>>>>>>>>>> Number of entries: 1>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> As once recommended on this mailing list I have mounted that>>>>>>>>>>>>>>> glusterfs>>>>>>>>>>>>>>> volume>>>>>>>>>>>>>>> temporarily through fuse/glusterfs and ran a "stat" on that>>>>>>>>>>>>>>> file which is>>>>>>>>>>>>>>> listed above but nothing happened.>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The file itself is available on all 3 nodes/bricks but on the>>>>>>>>>>>>>>> last node it>>>>>>>>>>>>>>> has a different date. By the way this file is 0 kBytes big. Is>>>>>>>>>>>>>>> that maybe>>>>>>>>>>>>>>> the reason why the self-heal does not work?>>>>>>>>>>>>>> Is the file actually 0 bytes or is it just 0 bytes on the>>>>>>>>>>>>>> arbiter(0 bytes>>>>>>>>>>>>>> are expected on the arbiter, it just stores metadata)? 
Can you send us the output from stat on all 3 nodes?

    $ stat <file on back end brick>
    $ getfattr -d -m - <file on back end brick>
    $ stat <file from gluster mount>

Let's see what things look like on the back end; it should tell us why healing is failing.

-b

> And how can I now make this file heal?
>
> Thanks,
> Mabi
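Filled in for this thread's layout, Ben's request can be collected in one pass with something like the sketch below. The hostnames and brick paths are the ones mentioned in this thread; reaching the nodes over ssh is simply an assumption about the setup.

    # Collect stat + xattrs for the affected file from every brick in one pass.
    FILE=data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png

    for node in node1.domain.tld node2.domain.tld; do
        ssh "$node" "stat /data/myvolume/brick/$FILE; getfattr -d -m - /data/myvolume/brick/$FILE"
    done
    # node3 (the arbiter) uses a different brick path:
    ssh node3.domain.tld "stat /srv/glusterfs/myvolume/brick/$FILE; getfattr -d -m - /srv/glusterfs/myvolume/brick/$FILE"

    stat "/mnt/myvolume/$FILE"    # and once through a FUSE client mount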
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users