Re: Pending heal status when deleting files which are marked as to be healed

David Spisla <spisla80@xxxxxxxxx> · Mon, 24 Jun 2019 15:45:29 +0200

Additional information:

After the volume was 100% full, I deleted some of the files, but not the files listed in heal info. When usage was down to 98%, I deleted the folder that was marked as needing heal:
/archive1/data/fff

After stopping and starting the volume, the files in
/archive1/data/fff were still there.

Regards
David Spisla

On Mon, 24 June 2019 at 15:33, David Spisla <spisla80@xxxxxxxxx> wrote:
Hello Ravi and Gluster Community,

On Mon, 24 June 2019 at 14:25, David Spisla <spisla80@xxxxxxxxx> wrote:

---------- Forwarded message ---------
From: David Spisla <spisla80@xxxxxxxxx>
Date: Fri, 21 June 2019 at 10:02
Subject: Re: Pending heal status when deleting files which are marked as to be healed
To: Ravishankar N <ravishankar@xxxxxxxxxx>

Hello Ravi,

On Wed, 19 June 2019 at 18:06, Ravishankar N <ravishankar@xxxxxxxxxx> wrote:

    On 17/06/19 3:45 PM, David Spisla wrote:

        Hello Gluster Community,

        my newest observation concerns the self heal daemon:
        Scenario: 2 Node Gluster v5.5 Cluster with Replica 2
          Volume. Just one brick per node. Access via SMB Client from a
          Win10 machine

        How to reproduce:
        I created a small folder containing many small files and
          copied that folder recursively into itself a few times.
          Additionally, I copied three big folders with a lot of content
          into the root of the volume.

        Note: No node or brick was down, so the whole volume was
          accessible.

        Because of the recursive copy, all of these copied files
          were listed as needing heal (via gluster heal info).

    This is odd. How did you conclude that writing to the volume
      (i.e. recursive copy) was the reason for the files to be needing
      heal? Did you check if there were any gluster messages about
      disconnects in the smb client logs?
There was no disconnection, I am sure. But overall I am not really sure what the cause of this problem is.
I reproduced it. I no longer think that the recursive copy is the reason. I copied several small files into the volume (capacity 1GB) until it was full (see steps to reproduce below). I did not set the files read-only. There was never a disconnection.

        Now I set some of the affected files read-only (they get
          WORMed because worm-file-level is enabled). After this I tried
          to delete the parent folder of those files.

        Expected: All files should be healed.
        Actual: The files that are read-only are not healed.
          heal info permanently shows that these files need to be healed.

    Does disabling read-only let the files be healed?
I have to try this.
I tried it out and it had no effect.

        The glustershd log shows errors, and the brick log (at level
          DEBUG) continuously emits a lot of messages that I don't
          understand. See the attached file, which contains all the
          information: heal info and volume info in addition to the logs.

        Maybe some of you know what's going on there? Since we can
          reproduce this scenario, we can give more debug information if
          needed.

    Is it possible to script the list of steps to reproduce this
      issue?
I will do that and post it here. I will also collect more data when it happens.
Steps to reproduce:

1. Copy several small files into a volume (here: 1GB capacity)
2. Copy until the volume is nearly full (70-80% or more)
3. Self-heal now lists files to be healed.
4. Move or delete all of these files, or just some of them.
5. The files won't be healed and stay in the heal info list.
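
Steps 1 and 2 above could be scripted roughly as follows. This is only a sketch and not the script used for this report: the helper names (usage_percent, fill_until), the file naming scheme, and the default file size are made up for illustration. It assumes the volume is reachable as a local directory (for example a FUSE mount), whereas the original test went through an SMB client. It writes small files until df reports the target usage, a file limit is reached, or a write fails.

```shell
#!/bin/sh
# Sketch: fill a directory's filesystem with small files up to a usage threshold.
# usage_percent and fill_until are hypothetical helper names.

# Print the usage of the filesystem holding $1 as a bare integer (e.g. 42).
usage_percent() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# fill_until DIR TARGET_PERCENT MAX_FILES [SIZE_KB]
# Writes files of SIZE_KB KiB (default 4) into DIR until usage reaches
# TARGET_PERCENT, MAX_FILES files have been written, or a write fails
# (for example because the volume is full).
# Prints the number of files written.
fill_until() {
    dir=$1; target=$2; max=$3; size_kb=${4:-4}
    i=0
    while [ "$i" -lt "$max" ] && [ "$(usage_percent "$dir")" -lt "$target" ]; do
        dd if=/dev/zero of="$dir/small_$i.bin" bs=1024 count="$size_kb" 2>/dev/null || break
        i=$((i + 1))
    done
    echo "$i"
}
```

For example, `fill_until /mnt/archive1/data 80 100000` would stop once usage reaches 80%; /mnt/archive1/data is a placeholder for wherever the volume is mounted.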

In my case I copied until the volume was 100% full (storage.reserve was 1%). I deleted some of the files to bring usage down to 98%. I waited for a while, but nothing happened. After this I stopped and started the volume. The files are now healed.
Attached is the glustershd.log, where you can see that entry.self-heal (2019-06-24 10:04:02.007328) could not be completed for pgfid:7e4fa649-434a-4bb7-a1c2-258818d76076 until the volume was stopped and started again. After the restart, entry.self-heal completed for that pgfid (at 2019-06-24 12:38:38.689632). The pgfid refers to the files that were listed as needing heal:

fs-davids-c2-n1:~ # gluster vo heal archive1 info
Brick fs-davids-c2-n1:/gluster/brick1/glusterbrick
/archive1/data/fff/gg - Kopie.txt 
/archive1/data/fff 
/archive1/data/fff/gg - Kopie - Kopie.txt 
/archive1/data/fff/gg - Kopie - Kopie (2).txt 
Status: Connected
Number of entries: 4
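
To watch whether the pending entries ever drain, output like the above can be reduced to one line per brick. The following is a minimal sketch, assuming the output format shown here ("Brick ..." followed later by "Number of entries: N"); heal_summary is a hypothetical helper name, not part of the gluster CLI.

```shell
#!/bin/sh
# Sketch: summarise `gluster volume heal VOLNAME info` output read from stdin.
# heal_summary is a hypothetical helper; it prints "<brick> <entries>" for
# each brick, based on the "Brick ..." and "Number of entries: N" lines.
heal_summary() {
    awk '
        /^Brick /              { brick = $2 }
        /^Number of entries: / { print brick, $4 }
    '
}
```

Usage would be `gluster volume heal archive1 info | heal_summary`, repeated at intervals to see whether the count changes.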

All of these files have the same pgfid:

fs-davids-c2-n1:~ # getfattr -e hex -d -m "" '/gluster/brick1/glusterbrick/archive1/data/fff/'* | grep trusted.pgfid
getfattr: Removing leading '/' from absolute path names
trusted.pgfid.7e4fa649-434a-4bb7-a1c2-258818d76076=0x00000001
trusted.pgfid.7e4fa649-434a-4bb7-a1c2-258818d76076=0x00000001
trusted.pgfid.7e4fa649-434a-4bb7-a1c2-258818d76076=0x00000001
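
With more files, comparing the pgfid lines by eye gets tedious. A small sketch that counts how often each trusted.pgfid key occurs in getfattr output (pgfid_counts is a hypothetical helper name); a single output line means every listed file carries the same pgfid key:

```shell
#!/bin/sh
# Sketch: count occurrences of each trusted.pgfid.<gfid> key in
# `getfattr -d` output read from stdin. pgfid_counts is a hypothetical name.
# Prints "<key> <count>" per distinct key.
pgfid_counts() {
    grep '^trusted\.pgfid\.' | cut -d= -f1 | sort | uniq -c | awk '{ print $2, $1 }'
}
```

It can be fed from the same getfattr command as above in place of the grep, for example `getfattr -e hex -d -m "" '/gluster/brick1/glusterbrick/archive1/data/fff/'* | pgfid_counts`.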

Summary: The pending heal problem seems to occur if a volume is nearly full or completely full.

Regards
David Spisla

Regards
David

    Regards,
    Ravi

        Regards
        David Spisla

      _______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users
