Hi all,
maybe I should add some more information:
The container which filled up the space was running on node x, which
still shows a nearly full filesystem:
192.168.1.x:/gvol 2.6T 2.5T 149G 95% /gluster
The situation is nearly the same on the underlying brick partition on node x:
zdata/brick 2.6T 2.4T 176G 94% /zbrick
On node y, where the network card had crashed, glusterfs shows the same values:
192.168.1.y:/gvol 2.6T 2.5T 149G 95% /gluster
but different values on the brick:
zdata/brick 2.9T 1.6T 1.4T 54% /zbrick
I think this happened because glusterfs still has hardlinks to the
deleted files on node x. So I can find these files with:
find /zbrick/.glusterfs -links 1 -ls | grep -v ' -> '
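For reference, this is how I understand the .glusterfs layout: every file
on the brick has a hardlink under .glusterfs named after its GFID, so for
a healthy file the two can be matched up like this (the file path is just
an example):

# read the GFID of a still-visible file directly on the brick
getfattr -n trusted.gfid -e hex /zbrick/some/dir/file
# trusted.gfid=0xaabb...  ->  its hardlink lives at
# /zbrick/.glusterfs/aa/bb/aabb...
# so a GFID file with link count 1 has no visible counterpart left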
But now I am lost. How can I verify that these files really belong to the
right container? Or can I just delete these files, because there is no way
to access them anyway? Or does glusterfs offer a way to resolve this
situation?
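My current idea (untested, and I am not sure it is safe from gluster's
point of view) would be to double-check each candidate and park it
somewhere instead of deleting it right away:

f=/zbrick/.glusterfs/ab/cd/<gfid-file>   # one candidate from the find above (placeholder)
# print any named file outside .glusterfs that still links to it
find /zbrick -path /zbrick/.glusterfs -prune -o -samefile "$f" -print
# if nothing is printed, move it somewhere outside the brick
# instead of deleting it immediately (placeholder path)
mkdir -p /root/quarantine && mv "$f" /root/quarantine/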
Mathias
On 05.08.20 15:48, Mathias Waack wrote:
Hi all,
we are running a gluster setup with two nodes:
Status of volume: gvol
Gluster process                            TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 192.168.1.x:/zbrick                  49152     0          Y       13350
Brick 192.168.1.y:/zbrick                  49152     0          Y       5965
Self-heal Daemon on localhost              N/A       N/A        Y       14188
Self-heal Daemon on 192.168.1.93           N/A       N/A        Y       6003
Task Status of Volume gvol
------------------------------------------------------------------------------
There are no active volume tasks
The glusterfs volume hosts the data volumes of a number of containers.
The underlying filesystem is ZFS. A few days ago one of the containers
created a lot of files in one of its data volumes and eventually filled
up the glusterfs volume completely. But this happened only on one host;
on the other host there was still enough space. We were finally able to
identify the container and found that the size of its data on /zbrick
differed between the two hosts. Then we made the big mistake of deleting
these files on both hosts directly in the /zbrick brick, not on the
mounted glusterfs volume.
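With what we know now, the cleanup should have gone through the glusterfs
mount instead, so that gluster could update its metadata on both bricks,
roughly:

# delete via the FUSE mount, not directly on the bricks (placeholder path)
rm -rf /gluster/<container-data-dir>/<runaway-files>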
Later we found the reason for this behavior: the network driver on the
second node had partially crashed at the same time the runaway container
started to fill up the gluster volume (we were still able to log in to
the node, so we assumed the network was running, but the card was already
dropping packets at that point). After rebooting the second node, the
gluster volume became available again.
Now the glusterfs volume is running again, but it is still (nearly)
full: the files created by the container are no longer visible, but they
still count against the free space. How can we fix this?
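Would a self-heal run help to reclaim the space? I.e. something along
these lines:

# show what gluster thinks still needs healing
gluster volume heal gvol info
# trigger a full self-heal over the whole volume
gluster volume heal gvol full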
In addition, some files have been inaccessible since this incident:
tail access.log.old
tail: cannot open 'access.log.old' for reading: Input/output error
It looks like the affected files are those that were changed during the
incident. Is there a way to fix this too?
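The I/O errors look like what the documentation describes as split-brain.
If that is what happened here, would this be the right direction (the
file path is just an example)?

# list entries gluster considers to be in split-brain
gluster volume heal gvol info split-brain
# resolve a single file, e.g. by keeping the most recently modified copy
gluster volume heal gvol split-brain latest-mtime /<path-inside-volume>/access.log.old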
Thanks
Mathias
________
Community Meeting Calendar:
Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users