Re: self-heal not working

As suggested I have now opened a bug on bugzilla:
https://bugzilla.redhat.com/show_bug.cgi?id=1486063



-------- Original Message --------
Subject: Re: [Gluster-users] self-heal not working
Local Time: August 28, 2017 4:29 PM
UTC Time: August 28, 2017 2:29 PM
From: ravishankar@xxxxxxxxxx
To: mabi <mabi@xxxxxxxxxxxxx>
Ben Turner <bturner@xxxxxxxxxx>, Gluster Users <gluster-users@xxxxxxxxxxx>

Great, can you raise a bug for the issue so that it is easier to keep track of it (plus you'll be notified when a patch is posted)? The general guidelines are at https://gluster.readthedocs.io/en/latest/Contributors-Guide/Bug-Reporting-Guidelines but you just need to include in the bug what you have already described in this email thread:

i.e. volume info, heal info, getfattr and stat output of the file in question.

Thanks!
Ravi


On 08/28/2017 07:49 PM, mabi wrote:
Thank you for the command. I ran it on all my nodes and now the self-heal daemon finally does not report any files to be healed. Hopefully this scenario can be handled properly in newer versions of GlusterFS.



-------- Original Message --------
Subject: Re: [Gluster-users] self-heal not working
Local Time: August 28, 2017 10:41 AM
UTC Time: August 28, 2017 8:41 AM
From: ravishankar@xxxxxxxxxx
To: mabi <mabi@xxxxxxxxxxxxx>
Ben Turner <bturner@xxxxxxxxxx>, Gluster Users <gluster-users@xxxxxxxxxxx>




On 08/28/2017 01:29 PM, mabi wrote:
Excuse me for my naive questions, but how do I reset the afr.dirty xattr on the file to be healed? And do I need to do that through a FUSE mount, or simply on every brick directly?


Directly on the bricks: `setfattr -n trusted.afr.dirty -v 0x000000000000000000000000  /data/myvolume/brick/data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png`
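
For example, a rough sketch using the brick paths from this thread (adjust to your own layout). On node1 and node2:

setfattr -n trusted.afr.dirty -v 0x000000000000000000000000 /data/myvolume/brick/data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png

and on node3 (the arbiter):

setfattr -n trusted.afr.dirty -v 0x000000000000000000000000 /srv/glusterfs/myvolume/brick/data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png

You can verify with `getfattr -n trusted.afr.dirty -e hex <file on the brick>` that the value is now all zeroes.
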
-Ravi


-------- Original Message --------
Subject: Re: [Gluster-users] self-heal not working
Local Time: August 28, 2017 5:58 AM
UTC Time: August 28, 2017 3:58 AM



On 08/28/2017 01:57 AM, Ben Turner wrote:
> ----- Original Message -----
>> From: "mabi" <mabi@xxxxxxxxxxxxx>
>> To: "Ravishankar N" <ravishankar@xxxxxxxxxx>
>> Cc: "Ben Turner" <bturner@xxxxxxxxxx>, "Gluster Users" <gluster-users@xxxxxxxxxxx>
>> Sent: Sunday, August 27, 2017 3:15:33 PM
>> Subject: Re: [Gluster-users] self-heal not working
>>
>> Thanks Ravi for your analysis. So as far as I understand nothing to worry
>> about but my question now would be: how do I get rid of this file from the
>> heal info?
> Correct me if I am wrong but clearing this is just a matter of resetting the afr.dirty xattr? @Ravi - Is this correct?

Yes, resetting the xattr and then launching index heal or running the heal-info
command should serve as a workaround.
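
For example (a sketch, using the volume name from earlier in this thread), after clearing trusted.afr.dirty on all three bricks:

gluster volume heal myvolume          # launch index heal
gluster volume heal myvolume info     # the entry should now drop out of the output
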
-Ravi

>
> -b
>
>>> -------- Original Message --------
>>> Subject: Re: [Gluster-users] self-heal not working
>>> Local Time: August 27, 2017 3:45 PM
>>> UTC Time: August 27, 2017 1:45 PM
>>>
>>> Yes, the shds did pick up the file for healing (I saw messages like " got
>>> entry: 1985e233-d5ee-4e3e-a51a-cf0b5f9f2aea") but no error afterwards.
>>>
>>> Anyway I reproduced it by manually setting the afr.dirty bit for a zero
>>> byte file on all 3 bricks. Since there are no afr pending xattrs
>>> indicating good/bad copies and all files are zero bytes, the data
>>> self-heal algorithm just picks the file with the latest ctime as source.
>>> In your case that was the arbiter brick. In the code, there is a check to
>>> prevent data heals if arbiter is the source. So heal was not happening and
>>> the entries were not removed from heal-info output.
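>>>
>>> (For reference, this can be checked in hex directly on a brick; a sketch using node1's path:
>>>
>>> getfattr -d -m trusted.afr -e hex /data/myvolume/brick/data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png
>>>
>>> Per the outputs further down this thread, it shows only
>>> trusted.afr.dirty=0x000000010000000000000000 (the hex form of 0sAAAAAQAAAAAAAAAA) and no
>>> trusted.afr.<volname>-client-N xattrs, i.e. the dirty xattr is non-zero but nothing blames
>>> any particular brick.)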
>>>
>>> Perhaps we should add a check in the code to just remove the entries from
>>> heal-info if size is zero bytes in all bricks.
>>>
>>> -Ravi
>>>
>>> On 08/25/2017 06:33 PM, mabi wrote:
>>>
>>>> Hi Ravi,
>>>>
>>>> Did you get a chance to have a look at the log files I have attached in my
>>>> last mail?
>>>>
>>>> Best,
>>>> Mabi
>>>>
>>>>> -------- Original Message --------
>>>>> Subject: Re: [Gluster-users] self-heal not working
>>>>> Local Time: August 24, 2017 12:08 PM
>>>>> UTC Time: August 24, 2017 10:08 AM
>>>>> From: mabi@xxxxxxxxxxxxx
>>>>> To: Ravishankar N
>>>>> Ben Turner <bturner@xxxxxxxxxx>, Gluster
>>>>>
>>>>> Thanks for confirming the command. I have now enabled DEBUG
>>>>> client-log-level, run a heal and then attached the glustershd log files
>>>>> of all 3 nodes in this mail.
>>>>>
>>>>> The volume concerned is called myvol-pro, the other 3 volumes have no
>>>>> problem so far.
>>>>>
>>>>> Also note that in the meantime it looks like the file has been deleted
>>>>> by the user, and as such the heal info command does not show the file
>>>>> name anymore but just its GFID, which is:
>>>>>
>>>>> gfid:1985e233-d5ee-4e3e-a51a-cf0b5f9f2aea
>>>>>
>>>>> Hope that helps for debugging this issue.
>>>>>
>>>>>> -------- Original Message --------
>>>>>> Subject: Re: [Gluster-users] self-heal not working
>>>>>> Local Time: August 24, 2017 5:58 AM
>>>>>> UTC Time: August 24, 2017 3:58 AM
>>>>>> Ben Turner <bturner@xxxxxxxxxx>, Gluster
>>>>>>
>>>>>> Unlikely. In your case only the afr.dirty is set, not the
>>>>>> afr.volname-client-xx xattr.
>>>>>>
>>>>>> `gluster volume set myvolume diagnostics.client-log-level DEBUG` is
>>>>>> right.
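>>>>>>
>>>>>> A possible sequence (just a sketch; the log path is the usual default, adjust if yours differs):
>>>>>>
>>>>>> gluster volume set myvolume diagnostics.client-log-level DEBUG
>>>>>> gluster volume heal myvolume                  # trigger an index heal
>>>>>> less /var/log/glusterfs/glustershd.log        # inspect the self-heal daemon log on each node
>>>>>> gluster volume set myvolume diagnostics.client-log-level INFO   # revert once the logs are collected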
>>>>>>
>>>>>> On 08/23/2017 10:31 PM, mabi wrote:
>>>>>>
>>>>>>> I just saw the following bug which was fixed in 3.8.15:
>>>>>>>
>>>>>>>
>>>>>>> Is it possible that the problem I described in this post is related to
>>>>>>> that bug?
>>>>>>>
>>>>>>>> -------- Original Message --------
>>>>>>>> Subject: Re: [Gluster-users] self-heal not working
>>>>>>>> Local Time: August 22, 2017 11:51 AM
>>>>>>>> UTC Time: August 22, 2017 9:51 AM
>>>>>>>> From: ravishankar@xxxxxxxxxx
>>>>>>>> Ben Turner <bturner@xxxxxxxxxx>, Gluster
>>>>>>>>
>>>>>>>> On 08/22/2017 02:30 PM, mabi wrote:
>>>>>>>>
>>>>>>>>> Thanks for the additional hints, I have the following 2 questions
>>>>>>>>> first:
>>>>>>>>>
>>>>>>>>> - In order to launch the index heal is the following command correct:
>>>>>>>>> gluster volume heal myvolume
>>>>>>>> Yes
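>>>>>>>>
>>>>>>>> For completeness, the related heal commands (a quick sketch):
>>>>>>>>
>>>>>>>> gluster volume heal myvolume          # index heal: only entries already marked as needing heal
>>>>>>>> gluster volume heal myvolume full     # full heal: crawl the entire volume
>>>>>>>> gluster volume heal myvolume info     # list entries still pending heal, per brick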
>>>>>>>>
>>>>>>>>> - If I run a "volume start force", will it cause any short disruption
>>>>>>>>> for my clients which mount the volume through FUSE? If yes, how long?
>>>>>>>>> This is a production system, that's why I am asking.
>>>>>>>> No. You can actually create a test volume on your personal linux box
>>>>>>>> to try these kinds of things without needing multiple machines. This
>>>>>>>> is how we develop and test our patches :)
>>>>>>>> "gluster volume create testvol replica 3 /home/mabi/bricks/brick{1..3}
>>>>>>>> force` and so on.
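>>>>>>>>
>>>>>>>> A slightly fuller sketch of such a single-box test setup (paths and the use of
>>>>>>>> $(hostname) are just examples):
>>>>>>>>
>>>>>>>> mkdir -p /home/mabi/bricks/brick{1..3} /mnt/testvol
>>>>>>>> gluster volume create testvol replica 3 $(hostname):/home/mabi/bricks/brick{1..3} force
>>>>>>>> gluster volume start testvol
>>>>>>>> mount -t glusterfs $(hostname):/testvol /mnt/testvol   # FUSE mount to experiment against
>>>>>>>> # ...try things like 'gluster volume start testvol force' and watch the mount...
>>>>>>>> umount /mnt/testvol
>>>>>>>> gluster volume stop testvol && gluster volume delete testvol   # answer the confirmation prompts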
>>>>>>>>
>>>>>>>> HTH,
>>>>>>>> Ravi
>>>>>>>>
>>>>>>>>>> -------- Original Message --------
>>>>>>>>>> Subject: Re: [Gluster-users] self-heal not working
>>>>>>>>>> Local Time: August 22, 2017 6:26 AM
>>>>>>>>>> UTC Time: August 22, 2017 4:26 AM
>>>>>>>>>> From: ravishankar@xxxxxxxxxx
>>>>>>>>>> Gluster Users
>>>>>>>>>>
>>>>>>>>>> Explore the following (a rough command sketch follows this list):
>>>>>>>>>>
>>>>>>>>>> - Launch index heal and look at the glustershd logs of all bricks
>>>>>>>>>> for possible errors
>>>>>>>>>>
>>>>>>>>>> - See if the glustershd in each node is connected to all bricks.
>>>>>>>>>>
>>>>>>>>>> - If not try to restart shd by `volume start force`
>>>>>>>>>>
>>>>>>>>>> - Launch index heal again and try.
>>>>>>>>>>
>>>>>>>>>> - Try debugging the shd log by setting client-log-level to DEBUG
>>>>>>>>>> temporarily.
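>>>>>>>>>>
>>>>>>>>>> Roughly, in commands (a sketch, with 'myvolume' as a placeholder and the default log path):
>>>>>>>>>>
>>>>>>>>>> gluster volume heal myvolume                              # 1. launch index heal
>>>>>>>>>> grep -iE 'error|warn' /var/log/glusterfs/glustershd.log   #    then check the shd log on every node
>>>>>>>>>> gluster volume status myvolume                            # 2. self-heal daemon should show as online on each node
>>>>>>>>>> gluster volume start myvolume force                       # 3. restarts the self-heal daemon
>>>>>>>>>> gluster volume heal myvolume                              # 4. launch index heal again
>>>>>>>>>> gluster volume set myvolume diagnostics.client-log-level DEBUG   # 5. verbose shd logging, revert to INFO afterwards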
>>>>>>>>>>
>>>>>>>>>> On 08/22/2017 03:19 AM, mabi wrote:
>>>>>>>>>>
>>>>>>>>>>> Sure, it doesn't look like a split brain based on the output:
>>>>>>>>>>>
>>>>>>>>>>> Brick node1.domain.tld:/data/myvolume/brick
>>>>>>>>>>> Status: Connected
>>>>>>>>>>> Number of entries in split-brain: 0
>>>>>>>>>>>
>>>>>>>>>>> Brick node2.domain.tld:/data/myvolume/brick
>>>>>>>>>>> Status: Connected
>>>>>>>>>>> Number of entries in split-brain: 0
>>>>>>>>>>>
>>>>>>>>>>> Brick node3.domain.tld:/srv/glusterfs/myvolume/brick
>>>>>>>>>>> Status: Connected
>>>>>>>>>>> Number of entries in split-brain: 0
>>>>>>>>>>>
>>>>>>>>>>>> -------- Original Message --------
>>>>>>>>>>>> Subject: Re: [Gluster-users] self-heal not working
>>>>>>>>>>>> Local Time: August 21, 2017 11:35 PM
>>>>>>>>>>>> UTC Time: August 21, 2017 9:35 PM
>>>>>>>>>>>> From: bturner@xxxxxxxxxx
>>>>>>>>>>>> Gluster Users
>>>>>>>>>>>>
>>>>>>>>>>>> Can you also provide:
>>>>>>>>>>>>
>>>>>>>>>>>> gluster v heal <my vol> info split-brain
>>>>>>>>>>>>
>>>>>>>>>>>> If it is split brain just delete the incorrect file from the brick
>>>>>>>>>>>> and run heal again. I haven't tried this with arbiter but I
>>>>>>>>>>>> assume the process is the same.
>>>>>>>>>>>>
>>>>>>>>>>>> -b
>>>>>>>>>>>>
>>>>>>>>>>>> ----- Original Message -----
>>>>>>>>>>>>> From: "mabi" <mabi@xxxxxxxxxxxxx>
>>>>>>>>>>>>> To: "Ben Turner"
>>>>>>>>>>>>> Cc: "Gluster Users"
>>>>>>>>>>>>> Sent: Monday, August 21, 2017 4:55:59 PM
>>>>>>>>>>>>> Subject: Re: [Gluster-users] self-heal not working
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Ben,
>>>>>>>>>>>>>
>>>>>>>>>>>>> So it is really a 0 kBytes file everywhere (all nodes including
>>>>>>>>>>>>> the arbiter
>>>>>>>>>>>>> and from the client).
>>>>>>>>>>>>> Here below you will find the output you requested. Hopefully that
>>>>>>>>>>>>> will help
>>>>>>>>>>>>> to find out why this specific file is not healing... Let me know
>>>>>>>>>>>>> if you need
>>>>>>>>>>>>> any more information. Btw node3 is my arbiter node.
>>>>>>>>>>>>>
>>>>>>>>>>>>> NODE1:
>>>>>>>>>>>>>
>>>>>>>>>>>>> STAT:
>>>>>>>>>>>>> File:
>>>>>>>>>>>>> ‘/data/myvolume/brick/data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png’
>>>>>>>>>>>>> Size: 0 Blocks: 38 IO Block: 131072 regular empty file
>>>>>>>>>>>>> Device: 24h/36d Inode: 10033884 Links: 2
>>>>>>>>>>>>> Access: (0644/-rw-r--r--) Uid: ( 33/www-data) Gid: ( 33/www-data)
>>>>>>>>>>>>> Access: 2017-08-14 17:04:55.530681000 +0200
>>>>>>>>>>>>> Modify: 2017-08-14 17:11:46.407404779 +0200
>>>>>>>>>>>>> Change: 2017-08-14 17:11:46.407404779 +0200
>>>>>>>>>>>>> Birth: -
>>>>>>>>>>>>>
>>>>>>>>>>>>> GETFATTR:
>>>>>>>>>>>>> trusted.afr.dirty=0sAAAAAQAAAAAAAAAA
>>>>>>>>>>>>> trusted.bit-rot.version=0sAgAAAAAAAABZhuknAAlJAg==
>>>>>>>>>>>>> trusted.gfid=0sGYXiM9XuTj6lGs8LX58q6g==
>>>>>>>>>>>>> trusted.glusterfs.d99af2fa-439b-4a21-bf3a-38f3849f87ec.xtime=0sWZG9sgAGOyo=
>>>>>>>>>>>>>
>>>>>>>>>>>>> NODE2:
>>>>>>>>>>>>>
>>>>>>>>>>>>> STAT:
>>>>>>>>>>>>> File:
>>>>>>>>>>>>> ‘/data/myvolume/brick/data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png’
>>>>>>>>>>>>> Size: 0 Blocks: 38 IO Block: 131072 regular empty file
>>>>>>>>>>>>> Device: 26h/38d Inode: 10031330 Links: 2
>>>>>>>>>>>>> Access: (0644/-rw-r--r--) Uid: ( 33/www-data) Gid: ( 33/www-data)
>>>>>>>>>>>>> Access: 2017-08-14 17:04:55.530681000 +0200
>>>>>>>>>>>>> Modify: 2017-08-14 17:11:46.403704181 +0200
>>>>>>>>>>>>> Change: 2017-08-14 17:11:46.403704181 +0200
>>>>>>>>>>>>> Birth: -
>>>>>>>>>>>>>
>>>>>>>>>>>>> GETFATTR:
>>>>>>>>>>>>> trusted.afr.dirty=0sAAAAAQAAAAAAAAAA
>>>>>>>>>>>>> trusted.bit-rot.version=0sAgAAAAAAAABZhu6wAA8Hpw==
>>>>>>>>>>>>> trusted.gfid=0sGYXiM9XuTj6lGs8LX58q6g==
>>>>>>>>>>>>> trusted.glusterfs.d99af2fa-439b-4a21-bf3a-38f3849f87ec.xtime=0sWZG9sgAGOVE=
>>>>>>>>>>>>>
>>>>>>>>>>>>> NODE3:
>>>>>>>>>>>>> STAT:
>>>>>>>>>>>>> File:
>>>>>>>>>>>>> /srv/glusterfs/myvolume/brick/data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png
>>>>>>>>>>>>> Size: 0 Blocks: 0 IO Block: 4096 regular empty file
>>>>>>>>>>>>> Device: ca11h/51729d Inode: 405208959 Links: 2
>>>>>>>>>>>>> Access: (0644/-rw-r--r--) Uid: ( 33/www-data) Gid: ( 33/www-data)
>>>>>>>>>>>>> Access: 2017-08-14 17:04:55.530681000 +0200
>>>>>>>>>>>>> Modify: 2017-08-14 17:04:55.530681000 +0200
>>>>>>>>>>>>> Change: 2017-08-14 17:11:46.604380051 +0200
>>>>>>>>>>>>> Birth: -
>>>>>>>>>>>>>
>>>>>>>>>>>>> GETFATTR:
>>>>>>>>>>>>> trusted.afr.dirty=0sAAAAAQAAAAAAAAAA
>>>>>>>>>>>>> trusted.bit-rot.version=0sAgAAAAAAAABZe6ejAAKPAg==
>>>>>>>>>>>>> trusted.gfid=0sGYXiM9XuTj6lGs8LX58q6g==
>>>>>>>>>>>>> trusted.glusterfs.d99af2fa-439b-4a21-bf3a-38f3849f87ec.xtime=0sWZG9sgAGOc4=
>>>>>>>>>>>>>
>>>>>>>>>>>>> CLIENT GLUSTER MOUNT:
>>>>>>>>>>>>> STAT:
>>>>>>>>>>>>> File:
>>>>>>>>>>>>> "/mnt/myvolume/data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png"
>>>>>>>>>>>>> Size: 0 Blocks: 0 IO Block: 131072 regular empty file
>>>>>>>>>>>>> Device: 1eh/30d Inode: 11897049013408443114 Links: 1
>>>>>>>>>>>>> Access: (0644/-rw-r--r--) Uid: ( 33/www-data) Gid: ( 33/www-data)
>>>>>>>>>>>>> Access: 2017-08-14 17:04:55.530681000 +0200
>>>>>>>>>>>>> Modify: 2017-08-14 17:11:46.407404779 +0200
>>>>>>>>>>>>> Change: 2017-08-14 17:11:46.407404779 +0200
>>>>>>>>>>>>> Birth: -
>>>>>>>>>>>>>
>>>>>>>>>>>>>> -------- Original Message --------
>>>>>>>>>>>>>> Subject: Re: [Gluster-users] self-heal not working
>>>>>>>>>>>>>> Local Time: August 21, 2017 9:34 PM
>>>>>>>>>>>>>> UTC Time: August 21, 2017 7:34 PM
>>>>>>>>>>>>>> From: bturner@xxxxxxxxxx
>>>>>>>>>>>>>> Gluster Users
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ----- Original Message -----
>>>>>>>>>>>>>>> From: "mabi" <mabi@xxxxxxxxxxxxx>
>>>>>>>>>>>>>>> To: "Gluster Users"
>>>>>>>>>>>>>>> Sent: Monday, August 21, 2017 9:28:24 AM
>>>>>>>>>>>>>>> Subject: [Gluster-users] self-heal not working
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I have a replica 2 with arbiter GlusterFS 3.8.11 cluster and
>>>>>>>>>>>>>>> there is
>>>>>>>>>>>>>>> currently one file listed to be healed as you can see below
>>>>>>>>>>>>>>> but never gets
>>>>>>>>>>>>>>> healed by the self-heal daemon:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Brick node1.domain.tld:/data/myvolume/brick
>>>>>>>>>>>>>>> /data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png
>>>>>>>>>>>>>>> Status: Connected
>>>>>>>>>>>>>>> Number of entries: 1
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Brick node2.domain.tld:/data/myvolume/brick
>>>>>>>>>>>>>>> /data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png
>>>>>>>>>>>>>>> Status: Connected
>>>>>>>>>>>>>>> Number of entries: 1
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Brick node3.domain.tld:/srv/glusterfs/myvolume/brick
>>>>>>>>>>>>>>> /data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png
>>>>>>>>>>>>>>> Status: Connected
>>>>>>>>>>>>>>> Number of entries: 1
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> As once recommended on this mailing list I have mounted that
>>>>>>>>>>>>>>> glusterfs
>>>>>>>>>>>>>>> volume
>>>>>>>>>>>>>>> temporarily through fuse/glusterfs and ran a "stat" on that
>>>>>>>>>>>>>>> file which is
>>>>>>>>>>>>>>> listed above but nothing happened.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The file itself is available on all 3 nodes/bricks but on the
>>>>>>>>>>>>>>> last node it
>>>>>>>>>>>>>>> has a different date. By the way this file is 0 kBytes big. Is
>>>>>>>>>>>>>>> that maybe
>>>>>>>>>>>>>>> the reason why the self-heal does not work?
>>>>>>>>>>>>>> Is the file actually 0 bytes or is it just 0 bytes on the
>>>>>>>>>>>>>> arbiter(0 bytes
>>>>>>>>>>>>>> are expected on the arbiter, it just stores metadata)? Can you
>>>>>>>>>>>>>> send us the
>>>>>>>>>>>>>> output from stat on all 3 nodes:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> $ stat <file on back end brick>
>>>>>>>>>>>>>> $ getfattr -d -m - <file on back end brick>
>>>>>>>>>>>>>> $ stat <file from gluster mount>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Let's see what things look like on the back end; it should tell
>>>>>>>>>>>>>> us why
>>>>>>>>>>>>>> healing is failing.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -b
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> And how can I now make this file to heal?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> Mabi
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>



_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users
