It was worse with 1 MDS, therefore we moved to 2 active MDS with directory pinning (so the balancer won't be an issue or make things extra complicated).

The number of caps stays mostly the same, with some ups and downs. My guess is that it has something to do with caching the accessed directories or files: memory usage increases a lot the first time rsync runs, but on the second run there isn't really an increase, only a small bump while rsync is running, and it drops again afterwards.

NFS isn't really an option because it adds another hop for the clients :( Also, this happens in our production environment and I won't be making changes there just for a test. I'll try to replicate it in our staging environment, but that one has a lot less load on it.

Kind regards,
Sake

> On 31-08-2024 09:15 CEST, Alexander Patrakov <patrakov@xxxxxxxxx> wrote:
>
> Got it.
>
> However, to narrow down the issue, I suggest that you test whether it
> still exists after the following changes:
>
> 1. Reduce max_mds to 1.
> 2. Do not reduce max_mds to 1, but migrate all clients from a direct
> CephFS mount to NFS.
>
> On Sat, Aug 31, 2024 at 2:55 PM Sake Ceph <ceph@xxxxxxxxxxx> wrote:
> >
> > I was talking about the hosts that the MDS containers are running on. The clients are all RHEL 9.
> >
> > Kind regards,
> > Sake
> >
> > > On 31-08-2024 08:34 CEST, Alexander Patrakov <patrakov@xxxxxxxxx> wrote:
> > >
> > > Hello Sake,
> > >
> > > The combination of two active MDSs and RHEL 8 does ring a bell, and I
> > > have seen this with Quincy, too. However, what's relevant is the
> > > kernel version on the clients. If they run the default 4.18.x kernel
> > > from RHEL 8, please either upgrade to the mainline kernel or decrease
> > > max_mds to 1. If they run a modern kernel, then it is something I do
> > > not know about.
> > >
> > > On Sat, Aug 31, 2024 at 1:21 PM Sake Ceph <ceph@xxxxxxxxxxx> wrote:
> > > >
> > > > @Anthony: it's a small virtualized cluster and indeed swap shouldn't be used, but this doesn't change the problem.
> > > >
> > > > @Alexander: the problem is on the active nodes; the standby-replay ones don't have issues anymore.
> > > >
> > > > Last night's backup run increased memory usage to 86% while rsync was running for app2, and it dropped to 77.8% when it was done. While the rsync for app4 was running it increased to 84%, then dropped to 80%. After a few hours it has now settled at 82%.
> > > > It looks to me like the MDS is caching something forever even though it isn't being used.
> > > >
> > > > The underlying host is running RHEL 8. An upgrade to RHEL 9 is planned, but we hit some issues with automatically upgrading hosts.
> > > >
> > > > Kind regards,
> > > > Sake
> > >
> > >
> > > --
> > > Alexander Patrakov
>
> --
> Alexander Patrakov
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
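
P.S. For anyone following along: a minimal sketch of how the cap counts and MDS cache usage can be compared before and after an rsync run, assuming admin access to the cluster; "<name>" stands for the MDS daemon name (e.g. the one shown by "ceph fs status") and is a placeholder:

    # Per-client session info, including num_caps per session
    ceph tell mds.<name> session ls

    # Current cache usage versus the configured limit
    ceph tell mds.<name> cache status
    ceph config get mds mds_cache_memory_limit

    # Ask the MDS to trim unused cache/caps (available on recent releases)
    ceph tell mds.<name> cache drop

If "cache status" keeps reporting usage near the limit long after the backup finished, that would support the "caching something forever" observation above.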
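For reference, Alexander's first suggestion (reducing max_mds to 1) and the directory pinning mentioned at the top map to the following commands; "<fs_name>" and the mount paths are placeholders, not taken from this thread:

    # Reduce to a single active MDS; rank 1 will stop and drain
    ceph fs set <fs_name> max_mds 1
    ceph fs status <fs_name>          # watch the ranks while this happens

    # Directory pinning while running two active MDS (one subtree per rank)
    setfattr -n ceph.dir.pin -v 0 /mnt/cephfs/app2
    setfattr -n ceph.dir.pin -v 1 /mnt/cephfs/app4

Pinning each application's subtree to a fixed rank keeps the balancer out of the picture, which is why it was combined with two active MDS in the setup described above.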