----- Original Message -----
> From: "mabi" <mabi@xxxxxxxxxxxxx>
> To: "Ravishankar N" <ravishankar@xxxxxxxxxx>
> Cc: "Ben Turner" <bturner@xxxxxxxxxx>, "Gluster Users" <gluster-users@xxxxxxxxxxx>
> Sent: Sunday, August 27, 2017 3:15:33 PM
> Subject: Re: self-heal not working
>
> Thanks Ravi for your analysis. So as far as I understand there is nothing to
> worry about, but my question now would be: how do I get rid of this file
> from the heal info?

Correct me if I am wrong, but clearing this is just a matter of resetting the
afr.dirty xattr? @Ravi - Is this correct?

-b

>
> > -------- Original Message --------
> > Subject: Re: self-heal not working
> > Local Time: August 27, 2017 3:45 PM
> > UTC Time: August 27, 2017 1:45 PM
> > From: ravishankar@xxxxxxxxxx
> > To: mabi <mabi@xxxxxxxxxxxxx>
> > Ben Turner <bturner@xxxxxxxxxx>, Gluster Users <gluster-users@xxxxxxxxxxx>
> >
> > Yes, the shds did pick up the file for healing (I saw messages like "got
> > entry: 1985e233-d5ee-4e3e-a51a-cf0b5f9f2aea") but no error afterwards.
> >
> > Anyway, I reproduced it by manually setting the afr.dirty bit for a
> > zero-byte file on all 3 bricks. Since there are no afr pending xattrs
> > indicating good/bad copies and all files are zero bytes, the data
> > self-heal algorithm just picks the file with the latest ctime as source.
> > In your case that was the arbiter brick. In the code, there is a check to
> > prevent data heals if the arbiter is the source, so the heal was not
> > happening and the entries were not removed from the heal-info output.
> >
> > Perhaps we should add a check in the code to just remove the entries from
> > heal-info if the size is zero bytes on all bricks.
> >
> > -Ravi
> >
> > On 08/25/2017 06:33 PM, mabi wrote:
> >
> >> Hi Ravi,
> >>
> >> Did you get a chance to have a look at the log files I attached in my
> >> last mail?
> >>
> >> Best,
> >> Mabi
> >>
> >>> -------- Original Message --------
> >>> Subject: Re: self-heal not working
> >>> Local Time: August 24, 2017 12:08 PM
> >>> UTC Time: August 24, 2017 10:08 AM
> >>> From: mabi@xxxxxxxxxxxxx
> >>> To: Ravishankar N <ravishankar@xxxxxxxxxx>
> >>> Ben Turner <bturner@xxxxxxxxxx>, Gluster Users <gluster-users@xxxxxxxxxxx>
> >>>
> >>> Thanks for confirming the command. I have now enabled the DEBUG
> >>> client-log-level, run a heal, and attached the glustershd log files
> >>> of all 3 nodes to this mail.
> >>>
> >>> The volume concerned is called myvol-pro; the other 3 volumes have had
> >>> no problems so far.
> >>>
> >>> Also note that in the meantime it looks like the file has been deleted
> >>> by the user, and as such the heal info command does not show the file
> >>> name anymore but just its GFID, which is:
> >>>
> >>> gfid:1985e233-d5ee-4e3e-a51a-cf0b5f9f2aea
> >>>
> >>> Hope that helps for debugging this issue.
> >>>
> >>>> -------- Original Message --------
> >>>> Subject: Re: self-heal not working
> >>>> Local Time: August 24, 2017 5:58 AM
> >>>> UTC Time: August 24, 2017 3:58 AM
> >>>> From: ravishankar@xxxxxxxxxx
> >>>> To: mabi <mabi@xxxxxxxxxxxxx>
> >>>> Ben Turner <bturner@xxxxxxxxxx>, Gluster Users <gluster-users@xxxxxxxxxxx>
> >>>>
> >>>> Unlikely. In your case only the afr.dirty xattr is set, not the
> >>>> afr.volname-client-xx xattr.
> >>>>
> >>>> `gluster volume set myvolume diagnostics.client-log-level DEBUG` is
> >>>> right.
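For reference, below is a hedged sketch of what inspecting and clearing the
trusted.afr.dirty xattr directly on the bricks might look like, using the file
path and volume name that appear later in this thread. Ravi does not confirm
this procedure in the thread, so treat it as an assumption, and only consider
it once all copies are known to be identical (here they are all zero bytes):

# Run on each of the 3 brick hosts, against the brick path, not the FUSE mount.
# Inspect the dirty xattr in hex; the value 0sAAAAAQAAAAAAAAAA shown later in
# this thread is base64 and decodes to 0x000000010000000000000000, i.e. a
# pending data count of 1 (data/metadata/entry counters are 4 bytes each).
$ getfattr -n trusted.afr.dirty -e hex \
      /data/myvolume/brick/data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png

# Clear the counters back to zero:
$ setfattr -n trusted.afr.dirty -v 0x000000000000000000000000 \
      /data/myvolume/brick/data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png

# Then re-run the index heal and check whether the entry is gone:
$ gluster volume heal myvolume
$ gluster volume heal myvolume info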
> >>>>
> >>>> On 08/23/2017 10:31 PM, mabi wrote:
> >>>>
> >>>>> I just saw the following bug which was fixed in 3.8.15:
> >>>>>
> >>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1471613
> >>>>>
> >>>>> Is it possible that the problem I described in this post is related to
> >>>>> that bug?
> >>>>>
> >>>>>> -------- Original Message --------
> >>>>>> Subject: Re: self-heal not working
> >>>>>> Local Time: August 22, 2017 11:51 AM
> >>>>>> UTC Time: August 22, 2017 9:51 AM
> >>>>>> From: ravishankar@xxxxxxxxxx
> >>>>>> To: mabi <mabi@xxxxxxxxxxxxx>
> >>>>>> Ben Turner <bturner@xxxxxxxxxx>, Gluster Users <gluster-users@xxxxxxxxxxx>
> >>>>>>
> >>>>>> On 08/22/2017 02:30 PM, mabi wrote:
> >>>>>>
> >>>>>>> Thanks for the additional hints, I have the following 2 questions
> >>>>>>> first:
> >>>>>>>
> >>>>>>> - In order to launch the index heal, is the following command correct:
> >>>>>>> gluster volume heal myvolume
> >>>>>>
> >>>>>> Yes
> >>>>>>
> >>>>>>> - If I run a "volume start force", will it cause any short disruption
> >>>>>>> to my clients which mount the volume through FUSE? If yes, for how
> >>>>>>> long? This is a production system, that's why I am asking.
> >>>>>>
> >>>>>> No. You can actually create a test volume on your personal Linux box
> >>>>>> to try these kinds of things without needing multiple machines. This
> >>>>>> is how we develop and test our patches :)
> >>>>>> `gluster volume create testvol replica 3 /home/mabi/bricks/brick{1..3}
> >>>>>> force` and so on.
> >>>>>>
> >>>>>> HTH,
> >>>>>> Ravi
> >>>>>>
> >>>>>>>> -------- Original Message --------
> >>>>>>>> Subject: Re: self-heal not working
> >>>>>>>> Local Time: August 22, 2017 6:26 AM
> >>>>>>>> UTC Time: August 22, 2017 4:26 AM
> >>>>>>>> From: ravishankar@xxxxxxxxxx
> >>>>>>>> To: mabi <mabi@xxxxxxxxxxxxx>, Ben Turner <bturner@xxxxxxxxxx>
> >>>>>>>> Gluster Users <gluster-users@xxxxxxxxxxx>
> >>>>>>>>
> >>>>>>>> Explore the following:
> >>>>>>>>
> >>>>>>>> - Launch the index heal and look at the glustershd logs of all bricks
> >>>>>>>> for possible errors.
> >>>>>>>>
> >>>>>>>> - See if the glustershd on each node is connected to all bricks.
> >>>>>>>>
> >>>>>>>> - If not, try to restart the shd with `volume start force`.
> >>>>>>>>
> >>>>>>>> - Launch the index heal again and try.
> >>>>>>>>
> >>>>>>>> - Try debugging the shd log by setting client-log-level to DEBUG
> >>>>>>>> temporarily.
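For reference, a rough sketch of the debugging sequence described above, using
the volume name from this thread; the glustershd log location shown is the
usual default and may differ per installation:

# Launch the index heal and watch the self-heal daemon log for errors:
$ gluster volume heal myvolume
$ tail -f /var/log/glusterfs/glustershd.log

# Check that the self-heal daemon on each node is online, and review heal state:
$ gluster volume status myvolume
$ gluster volume heal myvolume info

# If the shd is not connected to all bricks, restart it (per Ravi, no client
# disruption is expected):
$ gluster volume start myvolume force

# Temporarily raise the log level while reproducing, then reset it afterwards:
$ gluster volume set myvolume diagnostics.client-log-level DEBUG
$ gluster volume reset myvolume diagnostics.client-log-level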
> >>>>>>>>
> >>>>>>>> On 08/22/2017 03:19 AM, mabi wrote:
> >>>>>>>>
> >>>>>>>>> Sure, it doesn't look like a split-brain based on the output:
> >>>>>>>>>
> >>>>>>>>> Brick node1.domain.tld:/data/myvolume/brick
> >>>>>>>>> Status: Connected
> >>>>>>>>> Number of entries in split-brain: 0
> >>>>>>>>>
> >>>>>>>>> Brick node2.domain.tld:/data/myvolume/brick
> >>>>>>>>> Status: Connected
> >>>>>>>>> Number of entries in split-brain: 0
> >>>>>>>>>
> >>>>>>>>> Brick node3.domain.tld:/srv/glusterfs/myvolume/brick
> >>>>>>>>> Status: Connected
> >>>>>>>>> Number of entries in split-brain: 0
> >>>>>>>>>
> >>>>>>>>>> -------- Original Message --------
> >>>>>>>>>> Subject: Re: self-heal not working
> >>>>>>>>>> Local Time: August 21, 2017 11:35 PM
> >>>>>>>>>> UTC Time: August 21, 2017 9:35 PM
> >>>>>>>>>> From: bturner@xxxxxxxxxx
> >>>>>>>>>> To: mabi <mabi@xxxxxxxxxxxxx>
> >>>>>>>>>> Gluster Users <gluster-users@xxxxxxxxxxx>
> >>>>>>>>>>
> >>>>>>>>>> Can you also provide:
> >>>>>>>>>>
> >>>>>>>>>> gluster v heal <my vol> info split-brain
> >>>>>>>>>>
> >>>>>>>>>> If it is split-brain, just delete the incorrect file from the brick
> >>>>>>>>>> and run heal again. I haven't tried this with arbiter but I
> >>>>>>>>>> assume the process is the same.
> >>>>>>>>>>
> >>>>>>>>>> -b
> >>>>>>>>>>
> >>>>>>>>>> ----- Original Message -----
> >>>>>>>>>>> From: "mabi" <mabi@xxxxxxxxxxxxx>
> >>>>>>>>>>> To: "Ben Turner" <bturner@xxxxxxxxxx>
> >>>>>>>>>>> Cc: "Gluster Users" <gluster-users@xxxxxxxxxxx>
> >>>>>>>>>>> Sent: Monday, August 21, 2017 4:55:59 PM
> >>>>>>>>>>> Subject: Re: self-heal not working
> >>>>>>>>>>>
> >>>>>>>>>>> Hi Ben,
> >>>>>>>>>>>
> >>>>>>>>>>> So it is really a 0 kByte file everywhere (on all nodes including
> >>>>>>>>>>> the arbiter and from the client).
> >>>>>>>>>>> Here below you will find the output you requested. Hopefully that
> >>>>>>>>>>> will help to find out why this specific file is not healing...
> >>>>>>>>>>> Let me know if you need any more information. Btw, node3 is my
> >>>>>>>>>>> arbiter node.
> >>>>>>>>>>>
> >>>>>>>>>>> NODE1:
> >>>>>>>>>>>
> >>>>>>>>>>> STAT:
> >>>>>>>>>>> File: ‘/data/myvolume/brick/data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png’
> >>>>>>>>>>> Size: 0   Blocks: 38   IO Block: 131072   regular empty file
> >>>>>>>>>>> Device: 24h/36d   Inode: 10033884   Links: 2
> >>>>>>>>>>> Access: (0644/-rw-r--r--)  Uid: ( 33/www-data)   Gid: ( 33/www-data)
> >>>>>>>>>>> Access: 2017-08-14 17:04:55.530681000 +0200
> >>>>>>>>>>> Modify: 2017-08-14 17:11:46.407404779 +0200
> >>>>>>>>>>> Change: 2017-08-14 17:11:46.407404779 +0200
> >>>>>>>>>>> Birth: -
> >>>>>>>>>>>
> >>>>>>>>>>> GETFATTR:
> >>>>>>>>>>> trusted.afr.dirty=0sAAAAAQAAAAAAAAAA
> >>>>>>>>>>> trusted.bit-rot.version=0sAgAAAAAAAABZhuknAAlJAg==
> >>>>>>>>>>> trusted.gfid=0sGYXiM9XuTj6lGs8LX58q6g==
> >>>>>>>>>>> trusted.glusterfs.d99af2fa-439b-4a21-bf3a-38f3849f87ec.xtime=0sWZG9sgAGOyo=
> >>>>>>>>>>>
> >>>>>>>>>>> NODE2:
> >>>>>>>>>>>
> >>>>>>>>>>> STAT:
> >>>>>>>>>>> File: ‘/data/myvolume/brick/data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png’
> >>>>>>>>>>> Size: 0   Blocks: 38   IO Block: 131072   regular empty file
> >>>>>>>>>>> Device: 26h/38d   Inode: 10031330   Links: 2
> >>>>>>>>>>> Access: (0644/-rw-r--r--)  Uid: ( 33/www-data)   Gid: ( 33/www-data)
> >>>>>>>>>>> Access: 2017-08-14 17:04:55.530681000 +0200
> >>>>>>>>>>> Modify: 2017-08-14 17:11:46.403704181 +0200
> >>>>>>>>>>> Change: 2017-08-14 17:11:46.403704181 +0200
> >>>>>>>>>>> Birth: -
> >>>>>>>>>>>
> >>>>>>>>>>> GETFATTR:
> >>>>>>>>>>> trusted.afr.dirty=0sAAAAAQAAAAAAAAAA
> >>>>>>>>>>> trusted.bit-rot.version=0sAgAAAAAAAABZhu6wAA8Hpw==
> >>>>>>>>>>> trusted.gfid=0sGYXiM9XuTj6lGs8LX58q6g==
> >>>>>>>>>>> trusted.glusterfs.d99af2fa-439b-4a21-bf3a-38f3849f87ec.xtime=0sWZG9sgAGOVE=
> >>>>>>>>>>>
> >>>>>>>>>>> NODE3:
> >>>>>>>>>>>
> >>>>>>>>>>> STAT:
> >>>>>>>>>>> File: /srv/glusterfs/myvolume/brick/data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png
> >>>>>>>>>>> Size: 0   Blocks: 0   IO Block: 4096   regular empty file
> >>>>>>>>>>> Device: ca11h/51729d   Inode: 405208959   Links: 2
> >>>>>>>>>>> Access: (0644/-rw-r--r--)  Uid: ( 33/www-data)   Gid: ( 33/www-data)
> >>>>>>>>>>> Access: 2017-08-14 17:04:55.530681000 +0200
> >>>>>>>>>>> Modify: 2017-08-14 17:04:55.530681000 +0200
> >>>>>>>>>>> Change: 2017-08-14 17:11:46.604380051 +0200
> >>>>>>>>>>> Birth: -
> >>>>>>>>>>>
> >>>>>>>>>>> GETFATTR:
> >>>>>>>>>>> trusted.afr.dirty=0sAAAAAQAAAAAAAAAA
> >>>>>>>>>>> trusted.bit-rot.version=0sAgAAAAAAAABZe6ejAAKPAg==
> >>>>>>>>>>> trusted.gfid=0sGYXiM9XuTj6lGs8LX58q6g==
> >>>>>>>>>>> trusted.glusterfs.d99af2fa-439b-4a21-bf3a-38f3849f87ec.xtime=0sWZG9sgAGOc4=
> >>>>>>>>>>>
> >>>>>>>>>>> CLIENT GLUSTER MOUNT:
> >>>>>>>>>>>
> >>>>>>>>>>> STAT:
> >>>>>>>>>>> File: "/mnt/myvolume/data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png"
> >>>>>>>>>>> Size: 0   Blocks: 0   IO Block: 131072   regular empty file
> >>>>>>>>>>> Device: 1eh/30d   Inode: 11897049013408443114   Links: 1
> >>>>>>>>>>> Access: (0644/-rw-r--r--)  Uid: ( 33/www-data)   Gid: ( 33/www-data)
> >>>>>>>>>>> Access: 2017-08-14 17:04:55.530681000 +0200
> >>>>>>>>>>> Modify: 2017-08-14 17:11:46.407404779 +0200
> >>>>>>>>>>> Change: 2017-08-14 17:11:46.407404779 +0200
> >>>>>>>>>>> Birth: -
> >>>>>>>>>>>
> >>>>>>>>>>> > -------- Original Message --------
> >>>>>>>>>>> > Subject: Re: self-heal not working
> >>>>>>>>>>> > Local Time: August 21, 2017 9:34 PM
> >>>>>>>>>>> > UTC Time: August 21, 2017 7:34 PM
> >>>>>>>>>>> > From: bturner@xxxxxxxxxx
> >>>>>>>>>>> > To: mabi <mabi@xxxxxxxxxxxxx>
> >>>>>>>>>>> > Gluster Users <gluster-users@xxxxxxxxxxx>
> >>>>>>>>>>> >
> >>>>>>>>>>> > ----- Original Message -----
> >>>>>>>>>>> >> From: "mabi" <mabi@xxxxxxxxxxxxx>
> >>>>>>>>>>> >> To: "Gluster Users" <gluster-users@xxxxxxxxxxx>
> >>>>>>>>>>> >> Sent: Monday, August 21, 2017 9:28:24 AM
> >>>>>>>>>>> >> Subject: self-heal not working
> >>>>>>>>>>> >>
> >>>>>>>>>>> >> Hi,
> >>>>>>>>>>> >>
> >>>>>>>>>>> >> I have a replica 2 with arbiter GlusterFS 3.8.11 cluster and
> >>>>>>>>>>> >> there is currently one file listed to be healed, as you can
> >>>>>>>>>>> >> see below, but it never gets healed by the self-heal daemon:
> >>>>>>>>>>> >>
> >>>>>>>>>>> >> Brick node1.domain.tld:/data/myvolume/brick
> >>>>>>>>>>> >> /data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png
> >>>>>>>>>>> >> Status: Connected
> >>>>>>>>>>> >> Number of entries: 1
> >>>>>>>>>>> >>
> >>>>>>>>>>> >> Brick node2.domain.tld:/data/myvolume/brick
> >>>>>>>>>>> >> /data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png
> >>>>>>>>>>> >> Status: Connected
> >>>>>>>>>>> >> Number of entries: 1
> >>>>>>>>>>> >>
> >>>>>>>>>>> >> Brick node3.domain.tld:/srv/glusterfs/myvolume/brick
> >>>>>>>>>>> >> /data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png
> >>>>>>>>>>> >> Status: Connected
> >>>>>>>>>>> >> Number of entries: 1
> >>>>>>>>>>> >>
> >>>>>>>>>>> >> As once recommended on this mailing list, I have mounted that
> >>>>>>>>>>> >> glusterfs volume temporarily through fuse/glusterfs and ran a
> >>>>>>>>>>> >> "stat" on that file which is listed above, but nothing
> >>>>>>>>>>> >> happened.
> >>>>>>>>>>> >>
> >>>>>>>>>>> >> The file itself is available on all 3 nodes/bricks but on the
> >>>>>>>>>>> >> last node it has a different date. By the way, this file is
> >>>>>>>>>>> >> 0 kBytes big. Is that maybe the reason why the self-heal does
> >>>>>>>>>>> >> not work?
> >>>>>>>>>>> >
> >>>>>>>>>>> > Is the file actually 0 bytes or is it just 0 bytes on the
> >>>>>>>>>>> > arbiter (0 bytes are expected on the arbiter, it just stores
> >>>>>>>>>>> > metadata)? Can you send us the output from stat on all 3 nodes:
> >>>>>>>>>>> >
> >>>>>>>>>>> > $ stat <file on back end brick>
> >>>>>>>>>>> > $ getfattr -d -m - <file on back end brick>
> >>>>>>>>>>> > $ stat <file from gluster mount>
> >>>>>>>>>>> >
> >>>>>>>>>>> > Let's see what things look like on the back end, it should
> >>>>>>>>>>> > tell us why healing is failing.
> >>>>>>>>>>> >
> >>>>>>>>>>> > -b
> >>>>>>>>>>> >
> >>>>>>>>>>> >>
> >>>>>>>>>>> >> And how can I now make this file heal?
> >>>>>>>>>>> >>
> >>>>>>>>>>> >> Thanks,
> >>>>>>>>>>> >> Mabi

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users
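Regarding the open question at the top of the thread (getting the gfid-only
entry out of heal info after the user deleted the file): a hedged sketch of
where one might look on each brick. The .glusterfs paths below follow the
usual brick layout and are an assumption, not something confirmed in this
thread:

# Heal-info entries come from the per-brick xattrop index; check whether the
# gfid from the thread is still listed there (node3's brick path differs):
$ ls -l /data/myvolume/brick/.glusterfs/indices/xattrop/ | grep 1985e233

# The gfid-addressed hard link for the file, if it still exists on the brick:
$ stat /data/myvolume/brick/.glusterfs/19/85/1985e233-d5ee-4e3e-a51a-cf0b5f9f2aea

# If the user really deleted the file, re-running the index heal may prune the
# stale index entry; verify with heal info afterwards:
$ gluster volume heal myvolume
$ gluster volume heal myvolume info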