Hi Emmanuel,

In my experience, an MDS falling behind on trimming normally happens for one of two reasons. Either your client workload is simply too expensive for your metadata pool OSDs to keep up with (and, by the way, some ops are known to be quite expensive, such as setting xattrs or deleting files), or there are massive exports of subtrees between multiple active MDSs. If you're using a single active MDS, you can exclude the second case.

If it's the former, it would be useful to know exactly how many log segments your MDS is accumulating. Does the count rise in short bursts and then come back to normal, or is it stuck at a very high value?

Injecting mds_log_max_segments=400000 is indeed a very large and unusual value; you definitely don't want to leave it like this long term. (Silencing the warning for bursty client IO is better achieved by increasing mds_log_warn_factor, e.g. to 5 or 10.)

Cheers, Dan

______________________________
Clyso GmbH | https://www.clyso.com

On Fri, May 26, 2023 at 1:29 AM Emmanuel Jaep <emmanuel.jaep@xxxxxxxxx> wrote:
>
> Hi,
>
> lately, we have had some issues with our MDSs (Ceph version 16.2.10
> Pacific).
>
> Some of them are related to the MDS being behind on trimming.
>
> I checked the documentation and found the following information (
> https://docs.ceph.com/en/pacific/cephfs/health-messages/):
>
> CephFS maintains a metadata journal that is divided into *log segments*.
> The length of journal (in number of segments) is controlled by the setting
> mds_log_max_segments, and when the number of segments exceeds that setting
> the MDS starts writing back metadata so that it can remove (trim) the
> oldest segments. If this writeback is happening too slowly, or a software
> bug is preventing trimming, then this health message may appear. The
> threshold for this message to appear is controlled by the config option
> mds_log_warn_factor, the default is 2.0.
>
> Some resources on the web (https://www.suse.com/support/kb/doc/?id=000019740)
> indicated that a solution would be to change the `mds_log_max_segments`.
> Which I did:
> ```
> ceph --cluster floki tell mds.* injectargs '--mds_log_max_segments=400000'
> ```
>
> Of course, the warning disappeared, but I have a feeling that I just hid
> the problem. Pushing a value to 400'000 when the default value is 512 is a
> lot.
> Why is the trimming not taking place? How can I troubleshoot this further?
>
> Best,
>
> Emmanuel
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
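
To answer the question in the reply about whether the segment count is bursty or stuck, here is a minimal Python sketch. It rests on several assumptions that are not confirmed anywhere in this thread: that the script runs on the MDS host with admin socket access, that `perf dump` output contains an `mds_log` section with a `seg` counter, and that the warning fires when the count exceeds mds_log_max_segments * mds_log_warn_factor. The helper names `read_segment_count` and `classify` are made up for illustration.

```python
import json
import subprocess


def read_segment_count(mds_name, cluster="ceph"):
    """Return the current journal segment count reported by one MDS.

    Assumptions: run on the MDS host (admin socket access), and the
    `perf dump` JSON has an "mds_log" section with a "seg" counter.
    """
    cmd = ["ceph", "--cluster", cluster, "daemon", f"mds.{mds_name}",
           "perf", "dump"]
    perf = json.loads(subprocess.check_output(cmd))
    return int(perf["mds_log"]["seg"])


def classify(samples, max_segments, warn_factor=2.0):
    """Label a series of segment counts as "ok", "bursty" or "stuck".

    Assumed warning threshold: max_segments * warn_factor.
    - "ok":     no sample is above the threshold
    - "stuck":  every sample is above the threshold
    - "bursty": some samples are above it, some are not
    """
    threshold = max_segments * warn_factor
    over = [count > threshold for count in samples]
    if not any(over):
        return "ok"
    if all(over):
        return "stuck"
    return "bursty"
```

For example, `classify([100, 900, 120], max_segments=128)` returns "bursty", because only one sample exceeds 128 * 2.0 = 256 (128 is just an example value here; use the value actually configured on your cluster). Samples could be collected by calling `read_segment_count` every few seconds over a few minutes, for instance with `cluster="floki"`.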