Hi Patrick,
I'm afraid your ceph-post-file logs were lost to the nether. AFAICT, our ceph-post-file storage has been non-functional since the beginning of the lab outage last year. We're looking into it.
I have it here still. Any other way I can send it to you?
Extremely unlikely.
Okay, taking your word for it. But something seems to be stalling journal trimming. We had a similar thing yesterday evening, but at a much smaller scale and without a noticeable pool size increase. I only got an alert that the ceph_mds_log_ev Prometheus metric had started going up again for a single MDS. It grew past 1M events, so I restarted it. I also restarted the other MDS daemons, and they all immediately jumped to above 5M events and stayed there. They are, in fact, still there and had decreased only very slightly by the morning. The pool size is totally within a normal range, though, at 290GiB.
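For reference, the kind of check described above (flagging any MDS whose ceph_mds_log_ev value climbs past 1M events) could be sketched roughly as follows. This is only an illustration, not the alert rule actually in use: the function names and the sample daemon names are made up, and it assumes the metric carries the daemon name in a `ceph_daemon` label and that Prometheus is reachable through its standard `/api/v1/query` HTTP endpoint.

```python
import json
import urllib.parse
import urllib.request

# Threshold taken from the message above: the MDS was restarted past 1M events.
THRESHOLD = 1_000_000


def mds_over_threshold(response, threshold=THRESHOLD):
    """Return {daemon: events} for every series above the threshold.

    `response` is the decoded JSON body of a Prometheus instant query.
    """
    if response.get("status") != "success":
        raise ValueError("Prometheus query failed: %r" % response.get("error"))
    over = {}
    for series in response["data"]["result"]:
        labels = series["metric"]
        # Assumption: the daemon name is exposed in the `ceph_daemon` label.
        daemon = labels.get("ceph_daemon", "unknown")
        _timestamp, value = series["value"]  # the sample value is a string
        events = float(value)
        if events > threshold:
            over[daemon] = events
    return over


def query_prometheus(base_url, query="ceph_mds_log_ev"):
    """Run an instant query; `base_url` is a placeholder for your server."""
    params = urllib.parse.urlencode({"query": query})
    url = base_url.rstrip("/") + "/api/v1/query?" + params
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


# Hypothetical response, shaped like a Prometheus instant-query result.
sample = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"ceph_daemon": "mds.a"}, "value": [0, "5200000"]},
            {"metric": {"ceph_daemon": "mds.b"}, "value": [0, "1200"]},
        ],
    },
}
flagged = mds_over_threshold(sample)
```

Against a live server, `mds_over_threshold(query_prometheus("http://<prometheus-host>:9090"))` would return the daemons currently above the threshold, provided the label assumption holds for your setup.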
So clearly (a) an incredible number of journal events are being logged and (b) trimming is slow or unable to make progress. I'm looking into why, but you can help by running the attached script while the problem is occurring so I can investigate. I'll need a tarball of the outputs.
How do I send it to you if not via ceph-post-file?
Also, on the off chance this is related to the MDS balancer, please disable it since you're using ephemeral pinning:

    ceph config set mds mds_bal_interval 0
Done. Thanks for your help!

Janek

-- 
Bauhaus-Universität Weimar
Bauhausstr. 9a, R308
99423 Weimar, Germany
Phone: +49 3643 58 3577
www.webis.de