On 24/02/2021 12:40, Simon Oosthoek wrote:
> Hi
>
> we've been running our Ceph cluster for nearly 2 years now (Nautilus)
> and recently, due to a temporary situation, the cluster is at 80% full.
>
> We are only using CephFS on the cluster.
>
> Normally, I realize we should be adding OSD nodes, but this is a
> temporary situation, and I expect the cluster to go back under 60% full
> quite soon.
>
> Anyway, we are noticing some really problematic slowdowns. There are
> some things that could be related, but we are unsure...
>
> - Our 2 MDS nodes (1 active, 1 standby) are configured with 128 GB RAM,
> but are not using more than 2 GB; this looks either very inefficient or
> wrong ;-)

After looking at our monitoring history, it seems the MDS cache is
actually used more fully, but most of our servers get a weekly reboot by
default. That obviously clears the MDS cache. I wonder whether that is a
smart idea for an MDS node...? ;-)

> "ceph config dump | grep mds":
> mds   basic     mds_cache_memory_limit          107374182400
> mds   advanced  mds_max_scrub_ops_in_progress   10
>
> Perhaps we need more or different settings to make proper use of the
> MDS memory?
>
> - On all our OSD nodes, the memory line is red in "atop", though no
> swap is in use. It seems the memory on the OSD nodes is taking quite a
> beating. Is this normal, or can we tweak settings to make it less
> stressed?
>
> This is the first time we are having performance issues like this, I
> think. I'd like to learn some commands to help me analyse this...
>
> I hope this will ring a bell with someone...
>
> Cheers
>
> /Simon

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
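To verify whether the weekly reboots are throwing away a warm cache, one way to check the live MDS cache usage is via the daemon's admin socket. A minimal sketch, assuming the active daemon is named "mds.a" (a placeholder; substitute the name shown by "ceph fs status"):

```shell
# Inspect the current cache size of the running MDS
# (run on the host where the daemon lives; "mds.a" is a placeholder).
ceph daemon mds.a cache status

# Confirm the cache limit actually in effect for the running daemon,
# in case a config change has not been picked up.
ceph daemon mds.a config get mds_cache_memory_limit

# Heap statistics, to compare the process RSS against the cache target.
ceph tell mds.a heap stats
```

After a reboot the cache starts cold and refills as clients touch metadata, so low RSS shortly after a weekly restart would be expected rather than a misconfiguration.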
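Regarding the red memory line on the OSD nodes: with BlueStore, per-OSD memory consumption is mainly governed by osd_memory_target. A hedged sketch for checking and adjusting it (the 3 GiB value below is purely illustrative; size it to node RAM divided by OSD count, and "osd.0" is a placeholder):

```shell
# Current per-OSD memory target (Nautilus default is 4 GiB).
ceph config get osd osd_memory_target

# Lower it if the nodes are over-committed. Value is in bytes;
# 3221225472 (= 3 GiB) is only an example figure.
ceph config set osd osd_memory_target 3221225472

# Break down where a given OSD's memory actually goes.
ceph daemon osd.0 dump_mempools
```

Note that OSDs treat the target as a soft limit and converge toward it gradually, so an immediate drop in RSS should not be expected.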
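On the request for analysis commands: at 80% average utilisation, individual OSDs can be much fuller than the average, and near-full OSDs are a common cause of slowdowns. A starting toolkit, all standard Ceph CLI commands ("osd.0" below is a placeholder for a suspect OSD):

```shell
# Overall and per-pool utilisation.
ceph df detail

# Per-OSD fill level; look for outliers near the nearfull/full ratios.
ceph osd df tree

# Current health issues, including slow requests.
ceph health detail

# Commit/apply latency per OSD, to spot a struggling disk.
ceph osd perf

# The slowest recent operations on a specific OSD.
ceph daemon osd.0 dump_historic_ops
```

If a few OSDs are markedly fuller than the rest, rebalancing (e.g. "ceph osd reweight-by-utilization", used cautiously) can relieve pressure until the temporary data is removed.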