Re: Question about "Possibly undergoing heal" on a file being reported.

Be sure that the directory reported by `gluster --print-statedumpdir` exists first.
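Something along these lines should do it (a minimal sketch; on most installs the default is /var/run/gluster, but trust whatever the command prints):

    d=$(gluster --print-statedumpdir)
    mkdir -p "$d"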

On May 5, 2016 10:04:41 PM PDT, Ravishankar N <ravishankar@xxxxxxxxxx> wrote:
Thanks for the response. `heal info` reports 'Possibly undergoing
heal' only when the self-heal daemon is performing a heal, not when
there is I/O from the mount. Could you provide statedumps of the 2
bricks (and of the mount too, if you know from which mount this VM
image is being accessed)?

The command is `kill -USR1 <pid>`, where pid is the process ID of the
brick or FUSE mount process. The statedump will be saved in the
directory reported by `gluster --print-statedumpdir`.
I want to check whether any stale locks are being held on the bricks.
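Something like this should work (a sketch; I'm assuming the volume name gv0cl1 from the brick paths below, and that `gluster volume status` lists each brick's PID):

    gluster volume status gv0cl1    # note the Pid column for each brick
    kill -USR1 <brick-pid>          # dump lands in the statedump directory

For the FUSE mount, use the PID of the glusterfs client process on the host where the image is mounted.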

Thanks,
Ravi

On 05/06/2016 01:22 AM, Richard Klein (RSI) wrote:
I agree there is activity, but it's very low I/O, like updating log files. It shouldn't be high enough I/O to keep the file permanently in the "Possibly undergoing heal" state for days. But just to make sure, I powered off the VM, and there is no activity at all now, yet the "trusted.afr.dirty" value is still changing. I will leave the VM powered off until tomorrow. I agree with you that it shouldn't behave this way, but that is my dilemma.

Thanks for the insight,

Richard Klein
RSI

-----Original Message-----
From: gluster-users-bounces@xxxxxxxxxxx [mailto:gluster-users-bounces@xxxxxxxxxxx] On Behalf Of Joe Julian
Sent: Thursday, May 05, 2016 1:44 PM
To: gluster-users@xxxxxxxxxxx
Subject: Re: Question about "Possibly undergoing heal" on a file being reported.

FYI, that's not "no activity"; the file is clearly changing. The dirty
flag flipping back and forth between 1 and 0 is a byproduct of writes
occurring: the clients set the flag, do the write, then clear the flag.
My guess is that's why it's only "possibly" undergoing self-heal: the
write may have still been pending at the moment of the check.
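If you want to watch it happen, something like this on either brick should show the flag moving (a sketch; substitute your actual brick path):

    watch -n 5 'getfattr -n trusted.afr.dirty -e hex /data/brick0/gv0cl1/97f52c71-80bd-4c2b-8e47-3c8c77712687'

The 12-byte value decodes as three big-endian 32-bit counters (data, metadata, entry), so it's the leading data counter you'll see change.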

On 05/05/2016 10:22 AM, Richard Klein (RSI) wrote:
There are 2 hosts involved and we have a replica count of 2. The hosts are called n1c1cl1 and n1c2cl1. Below is the info you requested. The file name in Gluster is "/97f52c71-80bd-4c2b-8e47-3c8c77712687".
-- From the n1c1cl1 brick --

[root@n1c1cl1 ~]# ll -h /data/brick0/gv0cl1/97f52c71-80bd-4c2b-8e47-3c8c77712687
-rwxr--r--. 2 root root 3.7G May 5 12:10 /data/brick0/gv0cl1/97f52c71-80bd-4c2b-8e47-3c8c77712687

[root@n1c1cl1 ~]# getfattr -d -m . -e hex /data/brick0/gv0cl1/97f52c71-80bd-4c2b-8e47-3c8c77712687
getfattr: Removing leading '/' from absolute path names
# file: data/brick0/gv0cl1/97f52c71-80bd-4c2b-8e47-3c8c77712687
security.selinux=0x73797374656d5f753a6f626a6563745f723a64656661756c745f743a733000
trusted.afr.dirty=0xe68000000000000000000000
trusted.bit-rot.version=0x020000000000000057196a8d000e1606
trusted.gfid=0xb1a49bd1ea01479f9a8277992461e85f

-- From the n1c2cl1 brick --

[root@n1c2cl1 ~]# ll -h /data/brick0/gv0cl1/97f52c71-80bd-4c2b-8e47-3c8c77712687
-rwxr--r--. 2 root root 3.7G May 5 12:16 /data/brick0/gv0cl1/97f52c71-80bd-4c2b-8e47-3c8c77712687

[root@n1c2cl1 ~]# getfattr -d -m . -e hex /data/brick0/gv0cl1/97f52c71-80bd-4c2b-8e47-3c8c77712687
getfattr: Removing leading '/' from absolute path names
# file: data/brick0/gv0cl1/97f52c71-80bd-4c2b-8e47-3c8c77712687
security.selinux=0x73797374656d5f753a6f626a6563745f723a64656661756c745f743a733000
trusted.afr.dirty=0xd38000000000000000000000
trusted.bit-rot.version=0x020000000000000057196a8d000e20ae
trusted.gfid=0xb1a49bd1ea01479f9a8277992461e85f

--

The "trusted.afr.dirty" is changing about 2 or 3 times a minute on both files.
Let me know if you need further info and thanks.
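In case the timing matters, a crude loop like the following on each brick host is enough to log the value with timestamps (a sketch; same path as above):

    while true; do
        date
        getfattr -n trusted.afr.dirty -e hex /data/brick0/gv0cl1/97f52c71-80bd-4c2b-8e47-3c8c77712687 2>/dev/null
        sleep 60
    done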
Richard Klein
RSI



From: Ravishankar N [mailto:ravishankar@xxxxxxxxxx]
Sent: Wednesday, May 04, 2016 8:52 PM
To: Richard Klein (RSI); gluster-users@xxxxxxxxxxx
Subject: Re: Question about "Possibly undergoing heal" on a file being reported.

On 05/05/2016 01:50 AM, Richard Klein (RSI) wrote:
First time e-mailer to the group, greetings all. We are using Gluster 3.7.6 in CloudStack on CentOS 7 with KVM. Gluster is our primary storage. All is going well, but we have a test VM QCOW2 volume that gets stuck in the "Possibly undergoing healing" state. By stuck I mean it stays in that state for over 24 hours. This is a test VM with no activity on it, and we have removed the swap file on the guest as well, thinking that might be causing high I/O. All the tools show the VM is basically idle with low I/O. The only way I can clear it up is to power the VM off, move the QCOW2 volume off the Gluster mount and then back (basically remove and recreate it), then power the VM back on. Once I do this, all is well again, but then it happens again on the same volume/file. One additional note: I have even powered off the VM completely and the QCOW2 file still stays in this state.
When this happens, can you share the output of the extended attributes of
the file in question from all the bricks of the replica in which the file resides?
`getfattr -d -m . -e hex /path/to/bricks/file-name`

Also, what is the size of this VM image file?

Thanks,
Ravi



Is there a way to stop/abort or force the heal to finish? Any help with a
direction would be appreciated.
Thanks,

Richard Klein
RSI

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users
