One can even remove the log and tell the daemon to reopen it without having to restart. I've had mons do enough weird things on me that I try to avoid restarting them; YMMV.

It's possible that the OP has a large file that's unlinked but still open. Historically "fsck -n" would find these; today that depends on the filesystem in use. It's also possible that there is data under a mountpoint directory within /var that's masked by the overlaid mount.

http://cephnotes.ksperis.com/blog/2017/01/20/change-log-level-on-the-fly-to-ceph-daemons/

> On Jan 12, 2023, at 4:04 AM, E Taka <0etaka0@xxxxxxxxx> wrote:
> 
> We had a similar problem, and it was a (visible) logfile. It is easy to find with the ncdu utility (`ncdu -x /var`). There's no need for a reboot; you can get rid of it by restarting the Monitor with `ceph orch daemon restart mon.NODENAME`. You may also lower the debug level.
> 
> On Thu, Jan 12, 2023 at 09:14, Eneko Lacunza <elacunza@xxxxxxxxx> wrote:
>> 
>> Hi,
>> 
>> On 12/1/23 at 3:59, duluxoz wrote:
>>> Got a funny one, which I'm hoping someone can help us with.
>>> 
>>> We've got three identical(?) Ceph Quincy Nodes running on Rocky Linux 8.7. Each Node has 4 OSDs, plus Monitor, Manager, and iSCSI G/W services running on them (we're only a small shop). Each Node has a separate 16 GiB partition mounted as /var. Everything is running well and the Ceph Cluster is handling things very well.
>>> 
>>> However, one of the Nodes (not the one currently acting as the Active Manager) is running out of space on /var. Normally, all of the Nodes have around 10% space used (via a df -H command), but the problem Node takes only 1 to 3 days to run out of space, which takes it out of Quorum. It's currently at 85% and growing.
>>> 
>>> At first we thought this was caused by an overly large log file, but investigation showed that the logs on all 3 Nodes were of comparable size. Also, searching for the 20 largest files on the problem Node's /var didn't produce any significant results.
>>> 
>>> Coincidentally, and unrelated to this issue, the problem Node (but not the other 2 Nodes) was rebooted a couple of days ago, and when the Cluster had re-balanced itself and everything was back online and reporting as Healthy, the problem Node's /var was back down to around 10%, the same as the other two Nodes.
>>> 
>>> This led us to suspect that there was some sort of "runaway" process or journaling/logging/temporary file(s) that the reboot had "cleaned up". So we've been keeping an eye on things, but we can't see anything causing the issue, and now, as I said above, the problem Node's /var is back up to 85% and growing.
>>> 
>>> I've been looking at the log files, trying to determine the issue, but as I don't really know what I'm looking for I don't even know if I'm looking in the *correct* log files...
>>> 
>>> Obviously rebooting the problem Node every couple of days is not a viable option, and increasing the size of the /var partition is only going to postpone the issue, not resolve it. So if anyone has any ideas we'd love to hear about them - thanks
>> 
>> This looks like one or more files that have been removed but that some process still has open (and maybe is still writing to...). When you reboot, the process is terminated and the file(s) are effectively removed.
>> 
>> Try inspecting each process' open files and finding which file(s) no longer have a directory entry... that should give you a hint.
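
To make those suggestions concrete: something like the following should surface files on /var that have been unlinked but are still held open, assuming lsof is installed (the /proc scan works without it):

    # Open files on the /var filesystem with a link count below 1,
    # i.e. deleted but still held open by some process:
    lsof +L1 /var

    # Without lsof, the /proc fd symlinks show the same thing:
    find /proc/[0-9]*/fd -ls 2>/dev/null | grep '(deleted)'

If the culprit does turn out to be a mon log, you can truncate or remove it, ask the daemon to re-open it via its admin socket, and lower the debug level on the fly (this is what the cephnotes link above walks through). Roughly, reusing the mon.NODENAME placeholder from above, and running where the mon's admin socket is reachable (e.g. inside the daemon's container on a cephadm deployment):

    # Ask the mon to re-open its log file after rotating/removing it:
    ceph daemon mon.NODENAME log reopen

    # Lower the mon's debug levels without a restart:
    ceph tell mon.NODENAME injectargs '--debug_mon 1/5 --debug_ms 0/0'

And to check whether anything is hiding underneath the /var mountpoint itself, a bind mount of the root filesystem exposes the underlying directory (assuming /mnt is free):

    mount --bind / /mnt
    du -sh /mnt/var
    umount /mnt
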
>> 
>> Cheers
>> 
>> Eneko Lacunza
>> Zuzendari teknikoa | Director técnico
>> Binovo IT Human Project
>> 
>> Tel. +34 943 569 206 | https://www.binovo.es
>> Astigarragako Bidea, 2 - 2º izda. Oficina 10-11, 20180 Oiartzun
>> 
>> https://www.youtube.com/user/CANALBINOVO
>> https://www.linkedin.com/company/37269706/
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx