Re: MDS cache always increasing

It was worse with 1 MDS, therefore we moved to 2 active MDSs with directory pinning (so the balancer won't be an issue or make things extra complicated). 
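
For reference, the pinning is done with the ceph.dir.pin xattr on the top-level application directories, roughly like this (mount point and rank assignments are just examples):

    setfattr -n ceph.dir.pin -v 0 /mnt/cephfs/app2   # pin the app2 subtree to MDS rank 0
    setfattr -n ceph.dir.pin -v 1 /mnt/cephfs/app4   # pin the app4 subtree to MDS rank 1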

The number of caps stays for the most part the same, with some ups and downs. My guess is that it has something to do with caching the accessed directories or files: memory usage increases a lot the first time an rsync runs, while on the second run there isn't really an increase, only a small bump while the rsync is running, and afterwards it drops again. 
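
For what it's worth, this is roughly how I've been watching the caps and cache size (the MDS name is an example, adjust for your deployment):

    ceph tell mds.cephfs.host1.abcdef cache status   # current cache usage of that MDS
    ceph tell mds.cephfs.host1.abcdef session ls     # per-client sessions, including num_caps
    ceph config get mds mds_cache_memory_limit       # configured cache memory target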

NFS isn't really an option because it adds another hop for the clients :( Second, this happens in our production environment and I won't be making any changes there just for a test.
I will try to replicate it in our staging environment, but that one has a lot less load on it. 
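
If I do manage to reproduce it in staging, the test Alexander suggests would roughly be (filesystem name is an example):

    ceph fs set cephfs max_mds 1   # extra active ranks are stopped automatically on recent releases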

Kind regards, 
Sake 
> On 31-08-2024 09:15 CEST, Alexander Patrakov <patrakov@xxxxxxxxx> wrote:
> 
>  
> Got it.
> 
> However, to narrow down the issue, I suggest that you test whether it
> still exists after the following changes:
> 
> 1. Reduce max_mds to 1.
> 2. Do not reduce max_mds to 1, but migrate all clients from a direct
> CephFS mount to NFS.
> 
> On Sat, Aug 31, 2024 at 2:55 PM Sake Ceph <ceph@xxxxxxxxxxx> wrote:
> >
> > I was talking about the hosts where the MDS containers are running on. The clients are all RHEL 9.
> >
> > Kind regards,
> > Sake
> >
> > > On 31-08-2024 08:34 CEST, Alexander Patrakov <patrakov@xxxxxxxxx> wrote:
> > >
> > >
> > > Hello Sake,
> > >
> > > The combination of two active MDSs and RHEL8 does ring a bell, and I
> > > have seen this with Quincy, too. However, what's relevant is the
> > > kernel version on the clients. If they run the default 4.18.x kernel
> > > from RHEL8, please either upgrade to the mainline kernel or decrease
> > > max_mds to 1. If they run a modern kernel, then it is something I do
> > > not know about.
> > >
> > > On Sat, Aug 31, 2024 at 1:21 PM Sake Ceph <ceph@xxxxxxxxxxx> wrote:
> > > >
> > > > @Anthony: it's a small virtualized cluster and indeed swap shouldn't be used, but this doesn't change the problem.
> > > >
> > > > @Alexander: the problem is on the active nodes; the standby-replay daemons don't have issues anymore.
> > > >
> > > > Last night's backup run increased memory usage to 86% while rsync was running for app2, and it dropped to 77.8% when it was done. While the rsync for app4 was running it increased to 84% and then dropped to 80%. After a few hours it has now settled at 82%.
> > > > It looks to me like the MDS is caching something forever even though it isn't being used.
> > > >
> > > > The underlying hosts are running RHEL 8. An upgrade to RHEL 9 is planned, but we hit some issues with automatically upgrading the hosts.
> > > >
> > > > Kind regards,
> > > > Sake
> > > > _______________________________________________
> > > > ceph-users mailing list -- ceph-users@xxxxxxx
> > > > To unsubscribe send an email to ceph-users-leave@xxxxxxx
> > >
> > >
> > >
> > > --
> > > Alexander Patrakov
> 
> 
> 
> -- 
> Alexander Patrakov
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



