As a workaround, to reduce the impact of an MDS slowed down by excessive memory consumption, I would suggest installing earlyoom, disabling swap, and configuring earlyoom as follows (usually through /etc/sysconfig/earlyoom, but it could be in a different place on your distribution):

EARLYOOM_ARGS="-p -r 600 -m 4,4 -s 1,1"

On Sat, Aug 31, 2024 at 3:44 PM Sake Ceph <ceph@xxxxxxxxxxx> wrote:
>
> Oh, it got worse after the upgrade to Reef (we were running Quincy). With Quincy the memory usage was also often around 95%, with some swap usage, but neither ever rose to the point of crashing.
>
> Kind regards,
> Sake
>
> > On 31-08-2024 09:15 CEST, Alexander Patrakov <patrakov@xxxxxxxxx> wrote:
> >
> > Got it.
> >
> > However, to narrow down the issue, I suggest that you test whether it
> > still exists after the following changes:
> >
> > 1. Reduce max_mds to 1.
> > 2. Do not reduce max_mds to 1, but migrate all clients from a direct
> > CephFS mount to NFS.
> >
> > On Sat, Aug 31, 2024 at 2:55 PM Sake Ceph <ceph@xxxxxxxxxxx> wrote:
> > >
> > > I was talking about the hosts that the MDS containers are running on. The clients are all RHEL 9.
> > >
> > > Kind regards,
> > > Sake
> > >
> > > > On 31-08-2024 08:34 CEST, Alexander Patrakov <patrakov@xxxxxxxxx> wrote:
> > > >
> > > > Hello Sake,
> > > >
> > > > The combination of two active MDSs and RHEL 8 does ring a bell, and I
> > > > have seen this with Quincy, too. However, what's relevant is the
> > > > kernel version on the clients. If they run the default 4.18.x kernel
> > > > from RHEL 8, please either upgrade to a mainline kernel or decrease
> > > > max_mds to 1. If they run a modern kernel, then it is something I do
> > > > not know about.
> > > >
> > > > On Sat, Aug 31, 2024 at 1:21 PM Sake Ceph <ceph@xxxxxxxxxxx> wrote:
> > > > >
> > > > > @Anthony: it's a small virtualized cluster, and indeed swap shouldn't be used, but this doesn't change the problem.
> > > > >
> > > > > @Alexander: the problem is on the active nodes; the standby-replay nodes don't have issues anymore.
> > > > >
> > > > > Last night's backup run increased the memory usage to 86% while rsync was running for app2. It dropped to 77.8% when that finished. While the rsync for app4 was running, it increased to 84%, then dropped to 80%. After a few hours it has now settled at 82%.
> > > > > It looks to me like the MDS is caching something forever, even though that data isn't being used...
> > > > >
> > > > > The underlying hosts are running RHEL 8. An upgrade to RHEL 9 is planned, but we hit some issues with automatically upgrading hosts.
> > > > >
> > > > > Kind regards,
> > > > > Sake
> > > > > _______________________________________________
> > > > > ceph-users mailing list -- ceph-users@xxxxxxx
> > > > > To unsubscribe send an email to ceph-users-leave@xxxxxxx
> > > >
> > > >
> > > > --
> > > > Alexander Patrakov
> >
> > --
> > Alexander Patrakov

-- 
Alexander Patrakov
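For reference, the earlyoom arguments suggested at the top of this thread break down as follows. This is an annotated sketch of the same /etc/sysconfig/earlyoom line; the flag descriptions follow my reading of the earlyoom man page, so double-check them against the earlyoom version shipped by your distribution:

```shell
# /etc/sysconfig/earlyoom (path may vary by distribution)
#
# -p       raise earlyoom's own priority so it stays responsive
#          even when the system is under heavy memory pressure
# -r 600   print a memory report every 600 seconds instead of the
#          default 60, to keep the logs quiet
# -m 4,4   act when available memory falls below 4% (first value:
#          SIGTERM threshold, second value: SIGKILL threshold)
# -s 1,1   the same pair of thresholds for free swap; with swap
#          disabled this condition is effectively always satisfied,
#          so the -m thresholds alone decide when to kill
EARLYOOM_ARGS="-p -r 600 -m 4,4 -s 1,1"
```

With these settings, the largest memory hog (typically the runaway MDS daemon) is killed while roughly 4% of RAM is still free, before the machine starts thrashing or the kernel OOM killer picks a less suitable victim.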