If you read my previous email, you will see that I noted that the string IS a GFID, just not the name of the file :)
You can find the name by following the procedure at:
https://docs.gluster.org/en/latest/Troubleshooting/gfid-to-path/
Of course, that will be slow for all entries in .glusterfs, so you will need a script to match all gfids to their brick paths.
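For a single GFID like the one you posted, the aux-gfid-mount method from that page can be used directly. A minimal sketch, assuming a temporary mount point /mnt/gfid-resolve (server and volume name taken from your earlier mails):

mount -t glusterfs -o aux-gfid-mount 192.168.1.x:/gvol /mnt/gfid-resolve
getfattr -n trusted.glusterfs.pathinfo -e text \
    /mnt/gfid-resolve/.gfid/b53c8e46-068b-4286-94a6-7cf54f711983

The trusted.glusterfs.pathinfo xattr prints the backend brick path(s) of the file behind that GFID; for an orphaned GFID whose real file is already gone from the brick, the lookup may simply fail, which would be consistent with it being an orphan.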
I guess the fastest way to find the deleted files (as far as I understood, they were deleted directly on the brick and their entries in .glusterfs were left behind) is to create a script that does the following (a consolidated sketch follows after the steps):

0. Create a ramfs for the working files:

findmnt /mnt || mount -t ramfs -o size=128MB ramfs /mnt

1. Get the inode of every file on the brick:

ionice -c 2 -n 7 nice -n 15 find /full/path/to/brick -type f -exec ls -i {} \; > /mnt/data

2. Keep only the inode numbers:

nice -n 15 awk '{print $1}' /mnt/data > /mnt/inode_only

3. Now the fun starts -> find the inodes that are not duplicated. Note that uniq only collapses adjacent lines, so the list has to be sorted first:

nice -n 15 sort -n /mnt/inode_only | uniq -u > /mnt/gfid-only

4. Once you have the inodes, verify that they exist only in the .glusterfs dir:

for i in $(cat /mnt/gfid-only); do ionice -c 2 -n 7 nice -n 15 find /path/to/.glusterfs -inum $i; echo; echo; done

5. If it's OK -> delete:

for i in $(cat /mnt/gfid-only); do ionice -c 2 -n 7 nice -n 15 find /path/to/brick -inum $i -delete; done

Last, repeat on all bricks.
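Putting it together, here is a rough consolidation of steps 0-5 as one script - a sketch only: BRICK and SCRATCH are placeholders, it uses GNU find's -printf instead of the per-file ls to merge steps 1 and 2, and the delete stage stays commented out until the verification output has been checked by hand:

#!/bin/bash
# Sketch of steps 0-5 above. BRICK/SCRATCH are placeholders - adjust per brick.

BRICK=/full/path/to/brick
SCRATCH=/mnt

# 0. Scratch space in RAM, so we do not write to an already full volume
findmnt "$SCRATCH" >/dev/null || mount -t ramfs -o size=128MB ramfs "$SCRATCH"

# 1+2. Inode number of every regular file on the brick (GNU find)
ionice -c 2 -n 7 nice -n 15 find "$BRICK" -type f -printf '%i\n' > "$SCRATCH/inode_only"

# 3. Inodes that occur exactly once = files without a second hardlink
#    (sorted first, because uniq -u only drops adjacent duplicates)
nice -n 15 sort -n "$SCRATCH/inode_only" | uniq -u > "$SCRATCH/gfid-only"

# 4. Verify: every candidate should show up only under .glusterfs
while read -r inum; do
    ionice -c 2 -n 7 nice -n 15 find "$BRICK/.glusterfs" -inum "$inum"
    echo
done < "$SCRATCH/gfid-only"

# 5. Delete stage - uncomment only after checking the output of step 4
#while read -r inum; do
#    ionice -c 2 -n 7 nice -n 15 find "$BRICK" -inum "$inum" -delete
#done < "$SCRATCH/gfid-only"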
Good luck!

P.S.: Consider creating a gluster snapshot before that - just in case... Better safe than sorry.

P.P.S.: If you think that you have enough resources, you can remove the ionice/nice parts. They are just there to guarantee you won't eat too many resources.

Best Regards,
Strahil Nikolov

On 8 August 2020 at 18:02:10 GMT+03:00, Mathias Waack <mathias.waack@xxxxxxxxxxxxxxx> wrote:
>So b53c8e46-068b-4286-94a6-7cf54f711983 is not a gfid? What else is it?
>
>Mathias
>
>On 08.08.20 09:00, Strahil Nikolov wrote:
>> In glusterfs the long string is called "gfid" and does not represent the name.
>>
>> Best Regards,
>> Strahil Nikolov
>>
>> On Friday, 7 August 2020 at 21:40:11 GMT+3, Mathias Waack <mathias.waack@xxxxxxxxxxxxxxx> wrote:
>>
>> Hi Strahil,
>>
>> but I cannot find these files in the heal info:
>>
>> find /zbrick/.glusterfs -links 1 -ls | grep -v ' -> '
>> ...
>> 7443397 132463 -rw------- 1 999 docker 1073741824 Aug 3 10:35 /zbrick/.glusterfs/b5/3c/b53c8e46-068b-4286-94a6-7cf54f711983
>>
>> Now looking for this file in the heal info:
>>
>> gluster volume heal gvol info | grep b53c8e46-068b-4286-94a6-7cf54f711983
>>
>> shows nothing.
>>
>> So I do not know what I have to heal...
>>
>> Mathias
>>
>> On 07.08.20 14:32, Strahil Nikolov wrote:
>>> Have you tried to run a gluster heal and check if the files are back in their place?
>>>
>>> I always thought that those hard links are used by the healing mechanism, and if that is true, gluster should restore the files to their original location, and then wiping the correct files from FUSE will be easy.
>>>
>>> Best Regards,
>>> Strahil Nikolov
>>>
>>> On 7 August 2020 at 10:24:38 GMT+03:00, Mathias Waack <mathias.waack@xxxxxxxxxxxxxxx> wrote:
>>>> Hi all,
>>>>
>>>> maybe I should add some more information:
>>>>
>>>> The container which filled up the space was running on node x, which
>>>> still shows a nearly filled fs:
>>>>
>>>> 192.168.1.x:/gvol  2.6T  2.5T  149G  95%  /gluster
>>>>
>>>> nearly the same situation on the underlying brick partition on node x:
>>>>
>>>> zdata/brick  2.6T  2.4T  176G  94%  /zbrick
>>>>
>>>> On node y the network card crashed; glusterfs shows the same values:
>>>>
>>>> 192.168.1.y:/gvol  2.6T  2.5T  149G  95%  /gluster
>>>>
>>>> but different values on the brick:
>>>>
>>>> zdata/brick  2.9T  1.6T  1.4T  54%  /zbrick
>>>>
>>>> I think this happened because glusterfs still has hardlinks to the
>>>> files deleted on node x? So I can find these files with:
>>>>
>>>> find /zbrick/.glusterfs -links 1 -ls | grep -v ' -> '
>>>>
>>>> But now I am lost. How can I verify that these files really belong to
>>>> the right container? Or can I just delete these files because there is
>>>> no way to access them? Or does glusterfs offer a way to solve this
>>>> situation?
>>>>
>>>> Mathias
>>>>
>>>> On 05.08.20 15:48, Mathias Waack wrote:
>>>>> Hi all,
>>>>>
>>>>> we are running a gluster setup with two nodes:
>>>>>
>>>>> Status of volume: gvol
>>>>> Gluster process                     TCP Port  RDMA Port  Online  Pid
>>>>> ------------------------------------------------------------------------------
>>>>> Brick 192.168.1.x:/zbrick           49152     0          Y       13350
>>>>> Brick 192.168.1.y:/zbrick           49152     0          Y       5965
>>>>> Self-heal Daemon on localhost       N/A       N/A        Y       14188
>>>>> Self-heal Daemon on 192.168.1.93    N/A       N/A        Y       6003
>>>>>
>>>>> Task Status of Volume gvol
>>>>> ------------------------------------------------------------------------------
>>>>> There are no active volume tasks
>>>>>
>>>>> The glusterfs hosts a bunch of containers with their data volumes. The
>>>>> underlying fs is zfs. A few days ago one of the containers created a
>>>>> lot of files in one of its data volumes, and in the end it completely
>>>>> filled up the space of the glusterfs volume. But this happened only on
>>>>> one host; on the other host there was still enough space. We finally
>>>>> were able to identify this container and found out that the sizes of
>>>>> the data on /zbrick were different on the two hosts for this container.
>>>>> Then we made the big mistake of deleting these files on both hosts in
>>>>> the /zbrick volume, not on the mounted glusterfs volume.
>>>>>
>>>>> Later we found the reason for this behavior: the network driver on the
>>>>> second node partially crashed (which means we were able to log in on
>>>>> the node, so we assumed the network was running, but the card was
>>>>> already dropping packets at this time) at the same time as the failed
>>>>> container started to fill up the gluster volume. After rebooting the
>>>>> second node the gluster became available again.
>>>>>
>>>>> Now the glusterfs volume is running again - but it is still (nearly)
>>>>> full: the files created by the container are not visible, but they
>>>>> still count against the free space. How can we fix this?
>>>>>
>>>>> In addition there are some files which are no longer accessible since
>>>>> this accident:
>>>>>
>>>>> tail access.log.old
>>>>> tail: cannot open 'access.log.old' for reading: Input/output error
>>>>>
>>>>> It looks like the files affected by this error are the ones which were
>>>>> changed during the accident. Is there a way to fix this too?
>>>>>
>>>>> Thanks
>>>>> Mathias

________

Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users