Hi All,
Got a funny one, which I'm hoping someone can help us with.
We've got three identical(?) Ceph Quincy Nodes running on Rocky Linux
8.7. Each Node has 4 OSDs, plus Monitor, Manager, and iSCSI G/W services
running on it (we're only a small shop), and each Node has a separate 16
GiB partition mounted as /var. Everything is running well and the Ceph
Cluster is handling things nicely.
However, one of the Nodes (not the one currently acting as the Active
Manager) keeps running out of space on /var. Normally, all of the Nodes
sit at around 10% space used (per df -H), but the problem Node takes
only 1 to 3 days to fill /var completely, at which point it drops out of
Quorum. It's currently at 85% and growing.
At first we thought this was caused by an overly large log file, but
investigations showed that all the logs on all 3 Nodes were of
comparable size. Also, searching for the 20 largest files on the problem
Node's /var didn't produce any significant results.
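For reference, the "largest files" check was something along these
lines (reconstructed from memory, so treat the exact flags as
approximate rather than gospel):

    # Top 20 largest files under /var, staying on the /var filesystem
    find /var -xdev -type f -exec du -h {} + 2>/dev/null | sort -rh | head -n 20

    # Per-directory usage, to see which subtree is the big one
    du -xh --max-depth=2 /var 2>/dev/null | sort -rh | head -n 20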
Coincidentally, and unrelated to this issue, the problem Node (but not
the other 2 Nodes) was rebooted a couple of days ago. Once the Cluster
had rebalanced itself and everything was back online and reporting as
Healthy, the problem Node's /var was back down to around 10%, the same
as the other two Nodes.
This led us to suspect that there was some sort of "run-away" process
or journaling/logging/temporary file(s) or whatever that the reboot had
"cleaned up". So we've been keeping an eye on things, but we can't see
anything causing the issue and now, as I said above, the problem Node's
/var is back up to 85% and growing.
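In case it helps, "keeping an eye on things" has basically meant
running checks along these lines on the problem Node (a rough sketch -
the snapshot path and depth are just examples, not anything official):

    # Snapshot per-directory usage so successive runs can be diffed
    du -x --max-depth=3 /var 2>/dev/null | sort -rn > /root/var-usage.$(date +%Y%m%d%H%M)

    # Compare what du can account for against what df says is in use,
    # in case the space is held somewhere du can't see it
    df -H /var
    du -shx /var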
I've been looking at the log files, trying to determine the issue, but
as I don't really know what I'm looking for, I don't even know if I'm
looking in the *correct* log files...
Obviously rebooting the problem Node every couple of days is not a
viable option, and increasing the size of the /var partition is only
going to postpone the issue, not resolve it. So if anyone has any ideas
we'd love to hear them - thanks!
Cheers
Dulux-Oz