Re: Pending heal status when deleting files which are marked as to be healed

Additional information,

After the volume was 100% full, I deleted some of the files, but not the ones listed in heal info. When usage was at 98%, I deleted the folder that was marked as needing heal: /archive1/data/fff

After stopping and starting the volume, the files in /archive1/data/fff were still there.

Regards
David Spisla



On Mon, 24 June 2019 at 15:33, David Spisla <spisla80@xxxxxxxxx> wrote:
Hello Ravi and Gluster Community,

On Mon, 24 June 2019 at 14:25, David Spisla <spisla80@xxxxxxxxx> wrote:


---------- Forwarded message ---------
From: David Spisla <spisla80@xxxxxxxxx>
Date: Fri, 21 June 2019 at 10:02
Subject: Re: Pending heal status when deleting files which are marked as to be healed
To: Ravishankar N <ravishankar@xxxxxxxxxx>


Hello Ravi,

On Wed, 19 June 2019 at 18:06, Ravishankar N <ravishankar@xxxxxxxxxx> wrote:


On 17/06/19 3:45 PM, David Spisla wrote:
Hello Gluster Community,

my latest observation concerns the self-heal daemon:
Scenario: 2-node Gluster v5.5 cluster with a replica 2 volume, one brick per node. Access via an SMB client from a Win10 machine.

How to reproduce:
I created a small folder containing many small files and copied that folder recursively into itself a few times. Additionally, I copied three large folders with a lot of content into the root of the volume.
Note: No node or brick was down at any point, so the whole volume was accessible.

Because of the recursive copy, all of these copied files were listed as needing heal (via gluster volume heal <volname> info).

This is odd. How did you conclude that writing to the volume (i.e. the recursive copy) was the reason the files needed heal? Did you check whether there were any gluster messages about disconnects in the SMB client logs?

There was no disconnection, I am sure of that. But overall I am not really sure what the cause of this problem is.
I reproduced it. Now I don't think the recursive copy is the reason. I copied several small files into the volume (capacity 1 GB) until it was full (see the steps to reproduce below). I did not set any file to read-only. There was never a disconnection.


Then I set some of the affected files to read-only (they become WORM files because worm-file-level is enabled). After this I tried to delete the parent folder of those files.

Expected: All files should be healed
Actual: None of the read-only files are healed. heal info permanently lists these files as needing heal.
Does disabling read-only allow the files to be healed?
I will have to try this.
I tried it out and it had no effect.

The glustershd log shows errors, and the brick log (at DEBUG level) continuously emits a lot of messages that I don't understand. See the attached file, which contains all the information: heal info and volume info in addition to the logs.

Does anyone know what is going on there? Since we can reproduce this scenario, we can provide more debug information if needed.

Is it possible to script the list of steps to reproduce this issue?

I will do that and post it here. I will also collect more data when it happens.
Steps to reproduce:

1. Copy several small files into a volume (here: 1GB capacity)
2. Copy until the volume is nearly full (70-80% or more)
3. Self-heal now lists files as needing heal
4. Move or delete all of these files, or just some of them.
5. The files won't be healed and stay in the heal info list.
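Steps 1 and 2 could be scripted roughly as follows. This is only a sketch under assumptions that are not in the report: the volume is reached through a Linux mount point (the report used an SMB client from Windows 10), df is the GNU coreutils version that supports --output=pcent, and the names usage_pct, fill_until and small_<n>.bin are hypothetical. The MAX_FILES argument is a safety cap so the loop always terminates.

```shell
# usage_pct DIR: print the usage of the filesystem holding DIR as a bare
# integer (e.g. 73). Assumes GNU coreutils df (--output=pcent).
usage_pct() {
  df --output=pcent "$1" | tail -n 1 | tr -dc '0-9'
}

# fill_until DIR TARGET_PCT MAX_FILES [FILE_KB]
# Hypothetical helper: writes small zero-filled files DIR/small_<n>.bin
# until the filesystem reports TARGET_PCT usage, MAX_FILES files have been
# written, or a write fails (e.g. ENOSPC). Prints how many files it wrote.
fill_until() {
  local dir="$1" target="$2" max="$3" size_kb="${4:-64}" n=0
  mkdir -p "$dir" || return 1
  while [ "$n" -lt "$max" ]; do
    # Stop as soon as the target fill level is reached.
    [ "$(usage_pct "$dir")" -ge "$target" ] && break
    dd if=/dev/zero of="$dir/small_$n.bin" bs=1024 count="$size_kb" \
      status=none || break
    n=$((n + 1))
  done
  echo "$n"
}
```

For example, fill_until /mnt/archive1/fill 98 100000 (hypothetical mount point) would write 64 KB files until df reports 98%; steps 3 to 5 can then be checked with gluster volume heal archive1 info.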

In my case I copied until the volume was 100% full (storage.reserve was 1%). I deleted some of the files to bring usage down to 98%. I waited for a while, but nothing happened. After this I stopped and started the volume, and the files were then healed.
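Instead of waiting and re-running heal info by hand, the check could be looped. A minimal sketch, assuming the heal info output format shown in the transcript below (one "Number of entries:" line per brick); heal_entries and wait_for_heal are hypothetical helper names, and the timeout and interval are arbitrary choices.

```shell
# heal_entries: read `gluster volume heal VOL info` output on stdin and print
# the sum of all "Number of entries:" lines (one such line per brick).
# A non-numeric value (e.g. "-" for an unreachable brick) counts as 0.
heal_entries() {
  awk -F': *' '/^Number of entries:/ { total += $2 } END { print total + 0 }'
}

# wait_for_heal VOL TIMEOUT_S [INTERVAL_S]
# Hypothetical helper: poll heal info, printing a timestamped count, until
# no entries remain (returns 0) or TIMEOUT_S seconds have passed (returns 1).
wait_for_heal() {
  local vol="$1" timeout="$2" interval="${3:-10}" waited=0 count
  while :; do
    count=$(gluster volume heal "$vol" info | heal_entries)
    echo "$(date '+%F %T') pending heal entries: $count"
    [ "$count" -eq 0 ] && return 0
    [ "$waited" -ge "$timeout" ] && return 1
    sleep "$interval"
    waited=$((waited + interval))
  done
}
```

For example, wait_for_heal archive1 3600 30 would print the pending count every 30 seconds and return non-zero if entries are still listed after an hour, which records the "nothing happens" phase before the volume is restarted.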
Attached is the glustershd.log, where you can see that the entry self-heal started at 2019-06-24 10:04:02.007328 for pgfid:7e4fa649-434a-4bb7-a1c2-258818d76076 could not finish until the volume was stopped and started again. After the restart, the entry self-heal for that pgfid completed (at 2019-06-24 12:38:38.689632). This pgfid belongs to the files that were listed as needing heal:

fs-davids-c2-n1:~ # gluster vo heal archive1 info
Brick fs-davids-c2-n1:/gluster/brick1/glusterbrick
/archive1/data/fff/gg - Kopie.txt
/archive1/data/fff
/archive1/data/fff/gg - Kopie - Kopie.txt
/archive1/data/fff/gg - Kopie - Kopie (2).txt
Status: Connected
Number of entries: 4
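To pull the first and last mention of one gfid out of glustershd.log (such as the two timestamps quoted above), something like the following could work. It only assumes that each log line begins with a bracketed timestamp in the form quoted in this mail; it does not rely on the exact wording of the self-heal messages. selfheal_timeline is a hypothetical name.

```shell
# selfheal_timeline LOG GFID
# Print the timestamp of the first and of the last line in LOG that
# mentions GFID. Assumes each log line starts with a bracketed timestamp
# such as "[2019-06-24 10:04:02.007328]". Prints nothing if GFID is absent.
selfheal_timeline() {
  grep -F -- "$2" "$1" | awk '
    match($0, /^\[[0-9: .-]+\]/) {
      ts = substr($0, 2, RLENGTH - 2)   # strip the surrounding brackets
      if (n++ == 0) first = ts
      last = ts
    }
    END {
      if (n > 0) print first
      if (n > 1) print last
    }'
}
```

Assuming the default log location, selfheal_timeline /var/log/glusterfs/glustershd.log 7e4fa649-434a-4bb7-a1c2-258818d76076 would show how long entries for that parent gfid kept appearing in the log.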

All of these files have the same pgfid:

fs-davids-c2-n1:~ # getfattr -e hex -d -m "" '/gluster/brick1/glusterbrick/archive1/data/fff/'* | grep trusted.pgfid
getfattr: Removing leading '/' from absolute path names
trusted.pgfid.7e4fa649-434a-4bb7-a1c2-258818d76076=0x00000001
trusted.pgfid.7e4fa649-434a-4bb7-a1c2-258818d76076=0x00000001
trusted.pgfid.7e4fa649-434a-4bb7-a1c2-258818d76076=0x00000001
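With more files in the folder, the same check could be summarized per parent gfid instead of read line by line. A sketch, assuming the getfattr output format shown above (trusted.pgfid.<gfid>=... lines, which are present on this brick); pgfid_counts is a hypothetical helper.

```shell
# pgfid_counts: read `getfattr -e hex -d -m ""` output on stdin and print
# "<count> <parent-gfid>" for every distinct trusted.pgfid.* key, i.e. how
# many of the dumped entries carry that parent gfid.
pgfid_counts() {
  sed -n 's/^trusted\.pgfid\.\([0-9a-f-]*\)=.*/\1/p' | sort | uniq -c |
    awk '{ print $1, $2 }'
}
```

Piping the getfattr command above into it (with 2>/dev/null to drop the "Removing leading '/'" notice) would print 3 7e4fa649-434a-4bb7-a1c2-258818d76076 for the three files listed.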

Summary: The pending heal problem seems to occur if a volume is nearly full or completely full.

Regards
David Spisla


Regards
David

Regards,

Ravi


Regards
David Spisla


_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users