----- Original Message -----
> From: "mabi" <mabi@xxxxxxxxxxxxx>
> To: "Ravishankar N" <ravishankar@xxxxxxxxxx>
> Cc: "Ben Turner" <bturner@xxxxxxxxxx>, "Gluster Users" <gluster-users@xxxxxxxxxxx>
> Sent: Sunday, August 27, 2017 3:15:33 PM
> Subject: Re: self-heal not working
>
> Thanks Ravi for your analysis. So as far as I understand there is nothing to
> worry about, but my question now would be: how do I get rid of this file
> from the heal info?

Correct me if I am wrong, but clearing this is just a matter of resetting the
afr.dirty xattr? @Ravi - Is this correct?

-b

>
> > -------- Original Message --------
> > Subject: Re: self-heal not working
> > Local Time: August 27, 2017 3:45 PM
> > UTC Time: August 27, 2017 1:45 PM
> > From: ravishankar@xxxxxxxxxx
> > To: mabi <mabi@xxxxxxxxxxxxx>
> > Ben Turner <bturner@xxxxxxxxxx>, Gluster Users <gluster-users@xxxxxxxxxxx>
> >
> > Yes, the shds did pick up the file for healing (I saw messages like "got
> > entry: 1985e233-d5ee-4e3e-a51a-cf0b5f9f2aea") but no error afterwards.
> >
> > Anyway, I reproduced it by manually setting the afr.dirty bit for a
> > zero-byte file on all 3 bricks. Since there are no afr pending xattrs
> > indicating good/bad copies and all files are zero bytes, the data
> > self-heal algorithm just picks the file with the latest ctime as source.
> > In your case that was the arbiter brick. In the code, there is a check to
> > prevent data heals if the arbiter is the source, so the heal was not
> > happening and the entries were not removed from the heal-info output.
> >
> > Perhaps we should add a check in the code to just remove the entries from
> > heal-info if the size is zero bytes on all bricks.
> >
> > -Ravi
> >
> > On 08/25/2017 06:33 PM, mabi wrote:
> >
> >> Hi Ravi,
> >>
> >> Did you get a chance to have a look at the log files I attached in my
> >> last mail?
> >>
> >> Best,
> >> Mabi
> >>
> >>> -------- Original Message --------
> >>> Subject: Re: self-heal not working
> >>> Local Time: August 24, 2017 12:08 PM
> >>> UTC Time: August 24, 2017 10:08 AM
> >>> From: mabi@xxxxxxxxxxxxx
> >>> To: Ravishankar N <ravishankar@xxxxxxxxxx>
> >>> Ben Turner <bturner@xxxxxxxxxx>, Gluster Users <gluster-users@xxxxxxxxxxx>
> >>>
> >>> Thanks for confirming the command. I have now enabled the DEBUG
> >>> client-log-level, run a heal, and attached the glustershd log files
> >>> of all 3 nodes to this mail.
> >>>
> >>> The volume concerned is called myvol-pro; the other 3 volumes have had
> >>> no problems so far.
> >>>
> >>> Also note that in the meantime it looks like the file has been deleted
> >>> by the user, and as such the heal info command does not show the file
> >>> name anymore but just its GFID, which is:
> >>>
> >>> gfid:1985e233-d5ee-4e3e-a51a-cf0b5f9f2aea
> >>>
> >>> Hope that helps for debugging this issue.
> >>>
> >>>> -------- Original Message --------
> >>>> Subject: Re: self-heal not working
> >>>> Local Time: August 24, 2017 5:58 AM
> >>>> UTC Time: August 24, 2017 3:58 AM
> >>>> From: ravishankar@xxxxxxxxxx
> >>>> To: mabi <mabi@xxxxxxxxxxxxx>
> >>>> Ben Turner <bturner@xxxxxxxxxx>, Gluster Users <gluster-users@xxxxxxxxxxx>
> >>>>
> >>>> Unlikely. In your case only the afr.dirty xattr is set, not the
> >>>> afr.volname-client-xx xattr.
> >>>>
> >>>> `gluster volume set myvolume diagnostics.client-log-level DEBUG` is
> >>>> right.
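For reference, below is a hedged sketch of what inspecting and clearing the
trusted.afr.dirty xattr directly on the bricks might look like, using the file
path and volume name that appear later in this thread. Ravi does not confirm
this procedure in the thread, so treat it as an assumption, and only consider
it once all copies are known to be identical (here they are all zero bytes):

# Run on each of the 3 brick hosts, against the brick path, not the FUSE mount.
# Inspect the dirty xattr in hex; the value 0sAAAAAQAAAAAAAAAA shown later in
# this thread is base64 and decodes to 0x000000010000000000000000, i.e. a
# pending data count of 1 (data/metadata/entry counters are 4 bytes each).
$ getfattr -n trusted.afr.dirty -e hex \
      /data/myvolume/brick/data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png

# Clear the counters back to zero:
$ setfattr -n trusted.afr.dirty -v 0x000000000000000000000000 \
      /data/myvolume/brick/data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png

# Then re-run the index heal and check whether the entry is gone:
$ gluster volume heal myvolume
$ gluster volume heal myvolume info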
> >>>>
> >>>> On 08/23/2017 10:31 PM, mabi wrote:
> >>>>
> >>>>> I just saw the following bug which was fixed in 3.8.15:
> >>>>>
> >>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1471613
> >>>>>
> >>>>> Is it possible that the problem I described in this post is related to
> >>>>> that bug?
> >>>>>
> >>>>>> -------- Original Message --------
> >>>>>> Subject: Re: self-heal not working
> >>>>>> Local Time: August 22, 2017 11:51 AM
> >>>>>> UTC Time: August 22, 2017 9:51 AM
> >>>>>> From: ravishankar@xxxxxxxxxx
> >>>>>> To: mabi <mabi@xxxxxxxxxxxxx>
> >>>>>> Ben Turner <bturner@xxxxxxxxxx>, Gluster Users <gluster-users@xxxxxxxxxxx>
> >>>>>>
> >>>>>> On 08/22/2017 02:30 PM, mabi wrote:
> >>>>>>
> >>>>>>> Thanks for the additional hints, I have the following 2 questions
> >>>>>>> first:
> >>>>>>>
> >>>>>>> - In order to launch the index heal, is the following command correct:
> >>>>>>> gluster volume heal myvolume
> >>>>>>
> >>>>>> Yes
> >>>>>>
> >>>>>>> - If I run a "volume start force", will it cause any short disruption
> >>>>>>> to my clients which mount the volume through FUSE? If yes, for how
> >>>>>>> long? This is a production system, that's why I am asking.
> >>>>>>
> >>>>>> No. You can actually create a test volume on your personal Linux box
> >>>>>> to try these kinds of things without needing multiple machines. This
> >>>>>> is how we develop and test our patches :)
> >>>>>> `gluster volume create testvol replica 3 /home/mabi/bricks/brick{1..3}
> >>>>>> force` and so on.
> >>>>>>
> >>>>>> HTH,
> >>>>>> Ravi
> >>>>>>
> >>>>>>>> -------- Original Message --------
> >>>>>>>> Subject: Re: self-heal not working
> >>>>>>>> Local Time: August 22, 2017 6:26 AM
> >>>>>>>> UTC Time: August 22, 2017 4:26 AM
> >>>>>>>> From: ravishankar@xxxxxxxxxx
> >>>>>>>> To: mabi <mabi@xxxxxxxxxxxxx>, Ben Turner <bturner@xxxxxxxxxx>
> >>>>>>>> Gluster Users <gluster-users@xxxxxxxxxxx>
> >>>>>>>>
> >>>>>>>> Explore the following:
> >>>>>>>>
> >>>>>>>> - Launch the index heal and look at the glustershd logs of all bricks
> >>>>>>>> for possible errors.
> >>>>>>>>
> >>>>>>>> - See if the glustershd on each node is connected to all bricks.
> >>>>>>>>
> >>>>>>>> - If not, try to restart the shd with `volume start force`.
> >>>>>>>>
> >>>>>>>> - Launch the index heal again and try.
> >>>>>>>>
> >>>>>>>> - Try debugging the shd log by setting client-log-level to DEBUG
> >>>>>>>> temporarily.
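For reference, a rough sketch of the debugging sequence described above, using
the volume name from this thread; the glustershd log location shown is the
usual default and may differ per installation:

# Launch the index heal and watch the self-heal daemon log for errors:
$ gluster volume heal myvolume
$ tail -f /var/log/glusterfs/glustershd.log

# Check that the self-heal daemon on each node is online, and review heal state:
$ gluster volume status myvolume
$ gluster volume heal myvolume info

# If the shd is not connected to all bricks, restart it (per Ravi, no client
# disruption is expected):
$ gluster volume start myvolume force

# Temporarily raise the log level while reproducing, then reset it afterwards:
$ gluster volume set myvolume diagnostics.client-log-level DEBUG
$ gluster volume reset myvolume diagnostics.client-log-level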
> >>>>>>>>
> >>>>>>>> On 08/22/2017 03:19 AM, mabi wrote:
> >>>>>>>>
> >>>>>>>>> Sure, it doesn't look like a split-brain based on the output:
> >>>>>>>>>
> >>>>>>>>> Brick node1.domain.tld:/data/myvolume/brick
> >>>>>>>>> Status: Connected
> >>>>>>>>> Number of entries in split-brain: 0
> >>>>>>>>>
> >>>>>>>>> Brick node2.domain.tld:/data/myvolume/brick
> >>>>>>>>> Status: Connected
> >>>>>>>>> Number of entries in split-brain: 0
> >>>>>>>>>
> >>>>>>>>> Brick node3.domain.tld:/srv/glusterfs/myvolume/brick
> >>>>>>>>> Status: Connected
> >>>>>>>>> Number of entries in split-brain: 0
> >>>>>>>>>
> >>>>>>>>>> -------- Original Message --------
> >>>>>>>>>> Subject: Re: self-heal not working
> >>>>>>>>>> Local Time: August 21, 2017 11:35 PM
> >>>>>>>>>> UTC Time: August 21, 2017 9:35 PM
> >>>>>>>>>> From: bturner@xxxxxxxxxx
> >>>>>>>>>> To: mabi <mabi@xxxxxxxxxxxxx>
> >>>>>>>>>> Gluster Users <gluster-users@xxxxxxxxxxx>
> >>>>>>>>>>
> >>>>>>>>>> Can you also provide:
> >>>>>>>>>>
> >>>>>>>>>> gluster v heal <my vol> info split-brain
> >>>>>>>>>>
> >>>>>>>>>> If it is split-brain, just delete the incorrect file from the brick
> >>>>>>>>>> and run heal again. I haven't tried this with arbiter but I
> >>>>>>>>>> assume the process is the same.
> >>>>>>>>>>
> >>>>>>>>>> -b
> >>>>>>>>>>
> >>>>>>>>>> ----- Original Message -----
> >>>>>>>>>>> From: "mabi" <mabi@xxxxxxxxxxxxx>
> >>>>>>>>>>> To: "Ben Turner" <bturner@xxxxxxxxxx>
> >>>>>>>>>>> Cc: "Gluster Users" <gluster-users@xxxxxxxxxxx>
> >>>>>>>>>>> Sent: Monday, August 21, 2017 4:55:59 PM
> >>>>>>>>>>> Subject: Re: self-heal not working
> >>>>>>>>>>>
> >>>>>>>>>>> Hi Ben,
> >>>>>>>>>>>
> >>>>>>>>>>> So it is really a 0 kByte file everywhere (on all nodes including
> >>>>>>>>>>> the arbiter and from the client).
> >>>>>>>>>>> Here below you will find the output you requested. Hopefully that
> >>>>>>>>>>> will help to find out why this specific file is not healing...
> >>>>>>>>>>> Let me know if you need any more information. Btw, node3 is my
> >>>>>>>>>>> arbiter node.
> >>>>>>>>>>>
> >>>>>>>>>>> NODE1:
> >>>>>>>>>>>
> >>>>>>>>>>> STAT:
> >>>>>>>>>>> File: ‘/data/myvolume/brick/data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png’
> >>>>>>>>>>> Size: 0   Blocks: 38   IO Block: 131072   regular empty file
> >>>>>>>>>>> Device: 24h/36d   Inode: 10033884   Links: 2
> >>>>>>>>>>> Access: (0644/-rw-r--r--)  Uid: ( 33/www-data)   Gid: ( 33/www-data)
> >>>>>>>>>>> Access: 2017-08-14 17:04:55.530681000 +0200
> >>>>>>>>>>> Modify: 2017-08-14 17:11:46.407404779 +0200
> >>>>>>>>>>> Change: 2017-08-14 17:11:46.407404779 +0200
> >>>>>>>>>>> Birth: -
> >>>>>>>>>>>
> >>>>>>>>>>> GETFATTR:
> >>>>>>>>>>> trusted.afr.dirty=0sAAAAAQAAAAAAAAAA
> >>>>>>>>>>> trusted.bit-rot.version=0sAgAAAAAAAABZhuknAAlJAg==
> >>>>>>>>>>> trusted.gfid=0sGYXiM9XuTj6lGs8LX58q6g==
> >>>>>>>>>>> trusted.glusterfs.d99af2fa-439b-4a21-bf3a-38f3849f87ec.xtime=0sWZG9sgAGOyo=
> >>>>>>>>>>>
> >>>>>>>>>>> NODE2:
> >>>>>>>>>>>
> >>>>>>>>>>> STAT:
> >>>>>>>>>>> File: ‘/data/myvolume/brick/data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png’
> >>>>>>>>>>> Size: 0   Blocks: 38   IO Block: 131072   regular empty file
> >>>>>>>>>>> Device: 26h/38d   Inode: 10031330   Links: 2
> >>>>>>>>>>> Access: (0644/-rw-r--r--)  Uid: ( 33/www-data)   Gid: ( 33/www-data)
> >>>>>>>>>>> Access: 2017-08-14 17:04:55.530681000 +0200
> >>>>>>>>>>> Modify: 2017-08-14 17:11:46.403704181 +0200
> >>>>>>>>>>> Change: 2017-08-14 17:11:46.403704181 +0200
> >>>>>>>>>>> Birth: -
> >>>>>>>>>>>
> >>>>>>>>>>> GETFATTR:
> >>>>>>>>>>> trusted.afr.dirty=0sAAAAAQAAAAAAAAAA
> >>>>>>>>>>> trusted.bit-rot.version=0sAgAAAAAAAABZhu6wAA8Hpw==
> >>>>>>>>>>> trusted.gfid=0sGYXiM9XuTj6lGs8LX58q6g==
> >>>>>>>>>>> trusted.glusterfs.d99af2fa-439b-4a21-bf3a-38f3849f87ec.xtime=0sWZG9sgAGOVE=
> >>>>>>>>>>>
> >>>>>>>>>>> NODE3:
> >>>>>>>>>>>
> >>>>>>>>>>> STAT:
> >>>>>>>>>>> File: /srv/glusterfs/myvolume/brick/data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png
> >>>>>>>>>>> Size: 0   Blocks: 0   IO Block: 4096   regular empty file
> >>>>>>>>>>> Device: ca11h/51729d   Inode: 405208959   Links: 2
> >>>>>>>>>>> Access: (0644/-rw-r--r--)  Uid: ( 33/www-data)   Gid: ( 33/www-data)
> >>>>>>>>>>> Access: 2017-08-14 17:04:55.530681000 +0200
> >>>>>>>>>>> Modify: 2017-08-14 17:04:55.530681000 +0200
> >>>>>>>>>>> Change: 2017-08-14 17:11:46.604380051 +0200
> >>>>>>>>>>> Birth: -
> >>>>>>>>>>>
> >>>>>>>>>>> GETFATTR:
> >>>>>>>>>>> trusted.afr.dirty=0sAAAAAQAAAAAAAAAA
> >>>>>>>>>>> trusted.bit-rot.version=0sAgAAAAAAAABZe6ejAAKPAg==
> >>>>>>>>>>> trusted.gfid=0sGYXiM9XuTj6lGs8LX58q6g==
> >>>>>>>>>>> trusted.glusterfs.d99af2fa-439b-4a21-bf3a-38f3849f87ec.xtime=0sWZG9sgAGOc4=
> >>>>>>>>>>>
> >>>>>>>>>>> CLIENT GLUSTER MOUNT:
> >>>>>>>>>>>
> >>>>>>>>>>> STAT:
> >>>>>>>>>>> File: "/mnt/myvolume/data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png"
> >>>>>>>>>>> Size: 0   Blocks: 0   IO Block: 131072   regular empty file
> >>>>>>>>>>> Device: 1eh/30d   Inode: 11897049013408443114   Links: 1
> >>>>>>>>>>> Access: (0644/-rw-r--r--)  Uid: ( 33/www-data)   Gid: ( 33/www-data)
> >>>>>>>>>>> Access: 2017-08-14 17:04:55.530681000 +0200
> >>>>>>>>>>> Modify: 2017-08-14 17:11:46.407404779 +0200
> >>>>>>>>>>> Change: 2017-08-14 17:11:46.407404779 +0200
> >>>>>>>>>>> Birth: -
> >>>>>>>>>>>
> >>>>>>>>>>> > -------- Original Message --------
> >>>>>>>>>>> > Subject: Re: self-heal not working
> >>>>>>>>>>> > Local Time: August 21, 2017 9:34 PM
> >>>>>>>>>>> > UTC Time: August 21, 2017 7:34 PM
> >>>>>>>>>>> > From: bturner@xxxxxxxxxx
> >>>>>>>>>>> > To: mabi <mabi@xxxxxxxxxxxxx>
> >>>>>>>>>>> > Gluster Users <gluster-users@xxxxxxxxxxx>
> >>>>>>>>>>> >
> >>>>>>>>>>> > ----- Original Message -----
> >>>>>>>>>>> >> From: "mabi" <mabi@xxxxxxxxxxxxx>
> >>>>>>>>>>> >> To: "Gluster Users" <gluster-users@xxxxxxxxxxx>
> >>>>>>>>>>> >> Sent: Monday, August 21, 2017 9:28:24 AM
> >>>>>>>>>>> >> Subject: self-heal not working
> >>>>>>>>>>> >>
> >>>>>>>>>>> >> Hi,
> >>>>>>>>>>> >>
> >>>>>>>>>>> >> I have a replica 2 with arbiter GlusterFS 3.8.11 cluster and
> >>>>>>>>>>> >> there is currently one file listed to be healed, as you can
> >>>>>>>>>>> >> see below, but it never gets healed by the self-heal daemon:
> >>>>>>>>>>> >>
> >>>>>>>>>>> >> Brick node1.domain.tld:/data/myvolume/brick
> >>>>>>>>>>> >> /data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png
> >>>>>>>>>>> >> Status: Connected
> >>>>>>>>>>> >> Number of entries: 1
> >>>>>>>>>>> >>
> >>>>>>>>>>> >> Brick node2.domain.tld:/data/myvolume/brick
> >>>>>>>>>>> >> /data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png
> >>>>>>>>>>> >> Status: Connected
> >>>>>>>>>>> >> Number of entries: 1
> >>>>>>>>>>> >>
> >>>>>>>>>>> >> Brick node3.domain.tld:/srv/glusterfs/myvolume/brick
> >>>>>>>>>>> >> /data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png
> >>>>>>>>>>> >> Status: Connected
> >>>>>>>>>>> >> Number of entries: 1
> >>>>>>>>>>> >>
> >>>>>>>>>>> >> As once recommended on this mailing list, I have mounted that
> >>>>>>>>>>> >> glusterfs volume temporarily through fuse/glusterfs and ran a
> >>>>>>>>>>> >> "stat" on that file which is listed above, but nothing
> >>>>>>>>>>> >> happened.
> >>>>>>>>>>> >>
> >>>>>>>>>>> >> The file itself is available on all 3 nodes/bricks but on the
> >>>>>>>>>>> >> last node it has a different date. By the way, this file is
> >>>>>>>>>>> >> 0 kBytes big. Is that maybe the reason why the self-heal does
> >>>>>>>>>>> >> not work?
> >>>>>>>>>>> >
> >>>>>>>>>>> > Is the file actually 0 bytes or is it just 0 bytes on the
> >>>>>>>>>>> > arbiter (0 bytes are expected on the arbiter, it just stores
> >>>>>>>>>>> > metadata)? Can you send us the output from stat on all 3 nodes:
> >>>>>>>>>>> >
> >>>>>>>>>>> > $ stat <file on back end brick>
> >>>>>>>>>>> > $ getfattr -d -m - <file on back end brick>
> >>>>>>>>>>> > $ stat <file from gluster mount>
> >>>>>>>>>>> >
> >>>>>>>>>>> > Let's see what things look like on the back end, it should
> >>>>>>>>>>> > tell us why healing is failing.
> >>>>>>>>>>> >
> >>>>>>>>>>> > -b
> >>>>>>>>>>> >
> >>>>>>>>>>> >>
> >>>>>>>>>>> >> And how can I now make this file heal?
> >>>>>>>>>>> >>
> >>>>>>>>>>> >> Thanks,
> >>>>>>>>>>> >> Mabi

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users
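Regarding the open question at the top of the thread (getting the gfid-only
entry out of heal info after the user deleted the file): a hedged sketch of
where one might look on each brick. The .glusterfs paths below follow the
usual brick layout and are an assumption, not something confirmed in this
thread:

# Heal-info entries come from the per-brick xattrop index; check whether the
# gfid from the thread is still listed there (node3's brick path differs):
$ ls -l /data/myvolume/brick/.glusterfs/indices/xattrop/ | grep 1985e233

# The gfid-addressed hard link for the file, if it still exists on the brick:
$ stat /data/myvolume/brick/.glusterfs/19/85/1985e233-d5ee-4e3e-a51a-cf0b5f9f2aea

# If the user really deleted the file, re-running the index heal may prune the
# stale index entry; verify with heal info afterwards:
$ gluster volume heal myvolume
$ gluster volume heal myvolume info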