As a workaround, to reduce the impact of an MDS slowed down by excessive memory consumption, I would suggest installing earlyoom, disabling swap, and configuring earlyoom as follows (usually through /etc/sysconfig/earlyoom, but it could be in a different place on your distribution):

EARLYOOM_ARGS="-p -r 600 -m 4,4 -s 1,1"

On Sat, Aug 31, 2024 at 3:44 PM Sake Ceph <ceph@xxxxxxxxxxx> wrote:
>
> Oh, it got worse after the upgrade to Reef (we were running Quincy). With Quincy the memory usage was also often around 95%, with some swap usage, but neither ever rose to the point of crashing.
>
> Kind regards,
> Sake
>
> > On 31-08-2024 09:15 CEST, Alexander Patrakov <patrakov@xxxxxxxxx> wrote:
> >
> > Got it.
> >
> > However, to narrow down the issue, I suggest that you test whether it
> > still exists after the following changes:
> >
> > 1. Reduce max_mds to 1.
> > 2. Do not reduce max_mds to 1, but migrate all clients from a direct
> > CephFS mount to NFS.
> >
> > On Sat, Aug 31, 2024 at 2:55 PM Sake Ceph <ceph@xxxxxxxxxxx> wrote:
> > >
> > > I was talking about the hosts that the MDS containers are running on. The clients are all RHEL 9.
> > >
> > > Kind regards,
> > > Sake
> > >
> > > > On 31-08-2024 08:34 CEST, Alexander Patrakov <patrakov@xxxxxxxxx> wrote:
> > > >
> > > > Hello Sake,
> > > >
> > > > The combination of two active MDSs and RHEL 8 does ring a bell, and I
> > > > have seen this with Quincy, too. However, what's relevant is the
> > > > kernel version on the clients. If they run the default 4.18.x kernel
> > > > from RHEL 8, please either upgrade to a mainline kernel or decrease
> > > > max_mds to 1. If they run a modern kernel, then it is something I do
> > > > not know about.
> > > >
> > > > On Sat, Aug 31, 2024 at 1:21 PM Sake Ceph <ceph@xxxxxxxxxxx> wrote:
> > > > >
> > > > > @Anthony: it's a small virtualized cluster, and indeed swap shouldn't be used, but this doesn't change the problem.
> > > > >
> > > > > @Alexander: the problem is on the active nodes; the standby-replay nodes don't have issues anymore.
> > > > >
> > > > > Last night's backup run increased the memory usage to 86% while rsync was running for app2. It dropped to 77.8% when that finished. While the rsync for app4 was running, it increased to 84%, then dropped to 80%. After a few hours it has now settled at 82%.
> > > > > It looks to me like the MDS is caching something forever, even though that data isn't being used...
> > > > >
> > > > > The underlying hosts are running RHEL 8. An upgrade to RHEL 9 is planned, but we hit some issues with automatically upgrading hosts.
> > > > >
> > > > > Kind regards,
> > > > > Sake
> > > > > _______________________________________________
> > > > > ceph-users mailing list -- ceph-users@xxxxxxx
> > > > > To unsubscribe send an email to ceph-users-leave@xxxxxxx
> > > >
> > > >
> > > > --
> > > > Alexander Patrakov
> >
> > --
> > Alexander Patrakov

-- 
Alexander Patrakov
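For reference, the earlyoom arguments suggested at the top of this thread break down as follows. This is an annotated sketch of the same /etc/sysconfig/earlyoom line; the flag descriptions follow my reading of the earlyoom man page, so double-check them against the earlyoom version shipped by your distribution:

```shell
# /etc/sysconfig/earlyoom (path may vary by distribution)
#
# -p       raise earlyoom's own priority so it stays responsive
#          even when the system is under heavy memory pressure
# -r 600   print a memory report every 600 seconds instead of the
#          default 60, to keep the logs quiet
# -m 4,4   act when available memory falls below 4% (first value:
#          SIGTERM threshold, second value: SIGKILL threshold)
# -s 1,1   the same pair of thresholds for free swap; with swap
#          disabled this condition is effectively always satisfied,
#          so the -m thresholds alone decide when to kill
EARLYOOM_ARGS="-p -r 600 -m 4,4 -s 1,1"
```

With these settings, the largest memory hog (typically the runaway MDS daemon) is killed while roughly 4% of RAM is still free, before the machine starts thrashing or the kernel OOM killer picks a less suitable victim.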