Mysterious Disk-Space Eater

Hi All,

Got a funny one, which I'm hoping someone can help us with.

We've got three identical(?) Ceph Quincy Nodes running on Rocky Linux 8.7. Each Node has 4 OSDs, plus Monitor, Manager, and iSCSI Gateway services running on it (we're only a small shop). Each Node has a separate 16 GiB partition mounted as /var. Everything is running well and the Ceph Cluster is handling things nicely.

However, one of the Nodes (not the one currently acting as the Active Manager) is running out of space on /var. Normally, all of the Nodes sit at around 10% used (per df -H), but the problem Node takes only 1 to 3 days to run out of space, which knocks it out of Quorum. It's currently at 85% and growing.

At first we thought this was caused by an overly large log file, but investigation showed that the logs on all three Nodes were of comparable size. Searching for the 20 largest files on the problem Node's /var didn't turn up anything significant either.
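
For anyone wanting to double-check our method, the search was roughly along these lines (the exact invocation may have differed slightly):

  # 20 largest files on the /var filesystem (-xdev: don't cross mount points)
  find /var -xdev -type f -printf '%s %p\n' | sort -rn | head -20

  # and the 20 largest directories by usage
  du -xh /var | sort -rh | head -20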

Coincidentally (and unrelated to this issue), the problem Node, but not the other two, was rebooted a couple of days ago. Once the Cluster had rebalanced itself and everything was back online and reporting as Healthy, the problem Node's /var was back down to around 10%, the same as the other two Nodes.

This led us to suspect some sort of "run-away" process or journaling/logging/temporary file(s) that the reboot "cleaned up". So we've been keeping an eye on things, but we can't see anything causing the issue, and now, as I said above, the problem Node's /var is back up to 85% and growing.
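
One theory we're entertaining (just an assumption on our part, so treat the commands below as a sketch) is a process holding deleted-but-still-open files: df counts that space, but du and find can't see the files, and the space is only released when the owning process exits, which a reboot would certainly accomplish. If that's what's happening, the discrepancy and the culprit process should show up like this:

  # if du reports much less than df, deleted-but-open files are likely
  df -H /var
  du -xsh /var    # -x: stay on this filesystem, -s: summary, -h: human-readable

  # list open-but-unlinked files (+L1 = link count < 1) on the /var
  # filesystem; -a ANDs the two conditions (lsof ORs them by default);
  # run as root to see all processes
  lsof -a +L1 /var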

I've been looking at the log files, trying to determine the issue, but as I don't really know what I'm looking for, I don't even know if I'm looking in the *correct* log files...
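
In case it helps anyone point us in the right direction, the next thing we plan to try (assuming the growth is visible to du at all) is to snapshot per-directory usage and diff it once the usage has grown again:

  # take a usage snapshot now...
  du -xk /var | sort -k2 > /tmp/var.before
  # ...wait a few hours, take another, then compare to see what grew
  du -xk /var | sort -k2 > /tmp/var.after
  diff /tmp/var.before /tmp/var.after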

Obviously, rebooting the problem Node every couple of days is not a viable option, and increasing the size of the /var partition would only postpone the issue, not resolve it. So if anyone has any ideas, we'd love to hear them - thanks!

Cheers

Dulux-Oz