Hello Simon,

On Wed, Feb 24, 2021 at 7:43 AM Simon Oosthoek <s.oosthoek@xxxxxxxxxxxxx> wrote:
>
> On 24/02/2021 12:40, Simon Oosthoek wrote:
> > Hi
> >
> > we've been running our Ceph cluster for nearly 2 years now (Nautilus)
> > and recently, due to a temporary situation, the cluster is at 80% full.
> >
> > We are only using CephFS on the cluster.
> >
> > Normally, I realize we should be adding OSD nodes, but this is a
> > temporary situation, and I expect the cluster to go to <60% full quite soon.
> >
> > Anyway, we are noticing some really problematic slowdowns. There are
> > some things that could be related, but we are unsure...
> >
> > - Our 2 MDS nodes (1 active, 1 standby) are configured with 128GB RAM,
> > but are not using more than 2GB; this looks either very inefficient or
> > wrong ;-)
>
> After looking at our monitoring history, it seems the MDS cache is
> actually used more fully, but most of our servers get a weekly
> reboot by default. This obviously clears the MDS cache. I wonder if
> that's a smart idea for an MDS node...? ;-)

No, it's not. Can you also check that you do not have mds_cache_size
configured, perhaps in the MDS's local ceph.conf?

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
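
[Editor's note: a minimal sketch of how one might do the check Patrick suggests. The daemon name "mds.a" and the ceph.conf path are placeholders; the `ceph daemon` calls have to be run on the MDS host itself, against its admin socket.]

    # Is the deprecated mds_cache_size set in the MDS node's local ceph.conf?
    grep -E 'mds[ _]cache[ _]size' /etc/ceph/ceph.conf

    # What is the running MDS actually using? (queried over the admin socket)
    ceph daemon mds.a config get mds_cache_size
    ceph daemon mds.a config get mds_cache_memory_limit

    # Cross-check against the cluster's central config database
    ceph config get mds.a mds_cache_size
    ceph config get mds.a mds_cache_memory_limit

If mds_cache_size comes back as 0 and mds_cache_memory_limit is still at its default (1 GiB on Nautilus, if I recall correctly), then roughly 2 GB of resident memory on the MDS would be expected behaviour rather than a misconfiguration, and the limit would have to be raised for the MDS to make use of the 128 GB of RAM.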