On 24/02/2021 22:28, Patrick Donnelly wrote:
> Hello Simon,
>
> On Wed, Feb 24, 2021 at 7:43 AM Simon Oosthoek <s.oosthoek@xxxxxxxxxxxxx> wrote:
>>
>> On 24/02/2021 12:40, Simon Oosthoek wrote:
>>> Hi
>>>
>>> we've been running our Ceph cluster for nearly 2 years now (Nautilus)
>>> and recently, due to a temporary situation the cluster is at 80% full.
>>>
>>> We are only using CephFS on the cluster.
>>>
>>> Normally, I realize we should be adding OSD nodes, but this is a
>>> temporary situation, and I expect the cluster to go to <60% full quite soon.
>>>
>>> Anyway, we are noticing some really problematic slowdowns. There are
>>> some things that could be related but we are unsure...
>>>
>>> - Our 2 MDS nodes (1 active, 1 standby) are configured with 128GB RAM,
>>> but are not using more than 2GB, this looks either very inefficient, or
>>> wrong ;-)
>>
>> After looking at our monitoring history, it seems the mds cache is
>> actually used more fully, but most of our servers are getting a weekly
>> reboot by default. This clears the mds cache obviously. I wonder if
>> that's a smart idea for an MDS node...? ;-)
>
> No, it's not. Can you also check that you do not have mds_cache_size
> configured, perhaps on the MDS local ceph.conf?
>

Hi Patrick,

I've already changed the reboot period to 1 month. mds_cache_size is not
configured locally in /etc/ceph/ceph.conf, so I guess it was just the
weekly reboot that kept clearing the cached data from memory...

I'm starting to think that the nearly full cluster is probably the only
remaining explanation for the performance problems, though I don't know
why that would be.

Cheers

/Simon
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
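
For reference, a minimal sketch of how the cache settings discussed above
can be double-checked against a running MDS, assuming the standard ceph
CLI is available on the MDS host; the daemon id mds.a is a placeholder
for the actual active MDS:

  # Effective cache limits of the running daemon
  # (mds_cache_memory_limit superseded mds_cache_size in Luminous and later)
  ceph daemon mds.a config get mds_cache_memory_limit
  ceph daemon mds.a config get mds_cache_size

  # Non-default settings and where they come from (mon config db, ceph.conf, override)
  ceph config show mds.a | grep mds_cache

  # Current cache usage, to see whether the cache actually fills up between reboots
  ceph daemon mds.a cache status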