Re: Question about "Possibly undergoing heal" on a file being reported.

Be sure that the directory reported by `gluster --print-statedumpdir` exists first.
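Something along these lines should do it (a minimal sketch; on most installs the default is /var/run/gluster, but trust whatever the command prints):

    d=$(gluster --print-statedumpdir)
    mkdir -p "$d"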

On May 5, 2016 10:04:41 PM PDT, Ravishankar N <ravishankar@xxxxxxxxxx> wrote:
Thanks for the response. `heal info` reports 'Possibly undergoing
heal' only when the self-heal daemon is performing a heal, not when
there is I/O from the mount. Could you provide statedumps of the 2
bricks (and of the mount too, if you know from which mount this VM
image is being accessed)?

The command is `kill -USR1 <pid>`, where pid is the process ID of the
brick or FUSE mount process. The statedump will be saved in the
directory reported by `gluster --print-statedumpdir`.
I want to check whether any stale locks are being held on the bricks.
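Something like this should work (a sketch; I'm assuming the volume name gv0cl1 from the brick paths below, and that `gluster volume status` lists each brick's PID):

    gluster volume status gv0cl1    # note the Pid column for each brick
    kill -USR1 <brick-pid>          # dump lands in the statedump directory

For the FUSE mount, use the PID of the glusterfs client process on the host where the image is mounted.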

Thanks,
Ravi

On 05/06/2016 01:22 AM, Richard Klein (RSI) wrote:
I agree there is activity, but it's very low I/O, like updating log files. It shouldn't be high enough I/O to keep the file permanently in the "Possibly undergoing heal" state for days. But just to make sure, I powered off the VM, and there is no activity at all now, yet the "trusted.afr.dirty" value is still changing. I will leave the VM powered off until tomorrow. I agree with you that it shouldn't behave this way, but that is my dilemma.

Thanks for the insight,

Richard Klein
RSI

-----Original Message-----
From: gluster-users-bounces@xxxxxxxxxxx [mailto:gluster-users-bounces@xxxxxxxxxxx] On Behalf Of Joe Julian
Sent: Thursday, May 05, 2016 1:44 PM
To: gluster-users@xxxxxxxxxxx
Subject: Re: Question about "Possibly undergoing heal" on a file being reported.

FYI, that's not "no activity"; the file is clearly changing. The dirty
flag flipping back and forth between 1 and 0 is a byproduct of writes
occurring: the clients set the flag, do the write, then clear the flag.
My guess is that's why it's only "possibly" undergoing self-heal: the
write may have still been pending at the moment of the check.
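If you want to watch it happen, something like this on either brick should show the flag moving (a sketch; substitute your actual brick path):

    watch -n 5 'getfattr -n trusted.afr.dirty -e hex /data/brick0/gv0cl1/97f52c71-80bd-4c2b-8e47-3c8c77712687'

The 12-byte value decodes as three big-endian 32-bit counters (data, metadata, entry), so it's the leading data counter you'll see change.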

On 05/05/2016 10:22 AM, Richard Klein (RSI) wrote:
There are 2 hosts involved and we have a replica count of 2. The hosts are called n1c1cl1 and n1c2cl1. Below is the info you requested. The file name in Gluster is "/97f52c71-80bd-4c2b-8e47-3c8c77712687".
-- From the n1c1cl1 brick --

[root@n1c1cl1 ~]# ll -h /data/brick0/gv0cl1/97f52c71-80bd-4c2b-8e47-3c8c77712687
-rwxr--r--. 2 root root 3.7G May 5 12:10 /data/brick0/gv0cl1/97f52c71-80bd-4c2b-8e47-3c8c77712687

[root@n1c1cl1 ~]# getfattr -d -m . -e hex /data/brick0/gv0cl1/97f52c71-80bd-4c2b-8e47-3c8c77712687
getfattr: Removing leading '/' from absolute path names
# file: data/brick0/gv0cl1/97f52c71-80bd-4c2b-8e47-3c8c77712687
security.selinux=0x73797374656d5f753a6f626a6563745f723a64656661756c745f743a733000
trusted.afr.dirty=0xe68000000000000000000000
trusted.bit-rot.version=0x020000000000000057196a8d000e1606
trusted.gfid=0xb1a49bd1ea01479f9a8277992461e85f

-- From the n1c2cl1 brick --

[root@n1c2cl1 ~]# ll -h /data/brick0/gv0cl1/97f52c71-80bd-4c2b-8e47-3c8c77712687
-rwxr--r--. 2 root root 3.7G May 5 12:16 /data/brick0/gv0cl1/97f52c71-80bd-4c2b-8e47-3c8c77712687

[root@n1c2cl1 ~]# getfattr -d -m . -e hex /data/brick0/gv0cl1/97f52c71-80bd-4c2b-8e47-3c8c77712687
getfattr: Removing leading '/' from absolute path names
# file: data/brick0/gv0cl1/97f52c71-80bd-4c2b-8e47-3c8c77712687
security.selinux=0x73797374656d5f753a6f626a6563745f723a64656661756c745f743a733000
trusted.afr.dirty=0xd38000000000000000000000
trusted.bit-rot.version=0x020000000000000057196a8d000e20ae
trusted.gfid=0xb1a49bd1ea01479f9a8277992461e85f

--

The "trusted.afr.dirty" is changing about 2 or 3 times a minute on both files.
Let me know if you need further info and thanks.
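In case the timing matters, a crude loop like the following on each brick host is enough to log the value with timestamps (a sketch; same path as above):

    while true; do
        date
        getfattr -n trusted.afr.dirty -e hex /data/brick0/gv0cl1/97f52c71-80bd-4c2b-8e47-3c8c77712687 2>/dev/null
        sleep 60
    done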
Richard Klein
RSI



From: Ravishankar N [mailto:ravishankar@xxxxxxxxxx]
Sent: Wednesday, May 04, 2016 8:52 PM
To: Richard Klein (RSI); gluster-users@xxxxxxxxxxx
Subject: Re: Question about "Possibly undergoing heal" on a file being reported.

On 05/05/2016 01:50 AM, Richard Klein (RSI) wrote:
First time e-mailer to the group, greetings all. We are using Gluster 3.7.6 in CloudStack on CentOS 7 with KVM. Gluster is our primary storage. All is going well, but we have a test VM QCOW2 volume that gets stuck in the "Possibly undergoing healing" state. By stuck I mean it stays in that state for over 24 hours. This is a test VM with no activity on it, and we have removed the swap file on the guest as well, thinking that might be causing high I/O. All the tools show the VM is basically idle with low I/O. The only way I can clear it up is to power the VM off, move the QCOW2 volume off the Gluster mount and then back (basically remove and recreate it), then power the VM back on. Once I do this, all is well again, but then it happens again on the same volume/file. One additional note: I have even powered off the VM completely and the QCOW2 file still stays in this state.
When this happens, can you share the output of the extended attributes of
the file in question from all the bricks of the replica in which the file resides?
`getfattr -d -m . -e hex /path/to/bricks/file-name`

Also, what is the size of this VM image file?

Thanks,
Ravi



Is there a way to stop/abort or force the heal to finish? Any help with a
direction would be appreciated.
Thanks,

Richard Klein
RSI

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users
