On Tue, 2022-08-16 at 07:50 +0000, Eugen Block wrote:
> Hi,
>
> > However, the ceph-mds process is pretty much constantly over 100% CPU
> > and often over 200%. It's a single process, right? It makes me think
> > that some operations are too slow or some task is pegging the CPU at
> > 100%.
>
> you might want to look into multi-active MDS, especially with 5000
> clients. We encountered the same thing in a cluster with only around
> 20 clients (kernel client mount) with lots of small files. The single
> active MDS at that time was not heavily loaded, mds cache size was
> also not the problem, but the performance was not good at all.

Thanks, yeah, I agree that multi-active MDS is probably the way to
scale, but I'm not sure how well that would work on Luminous. I guess
it might be worth a shot, and if it doesn't behave well, just turn it
off... I'll think about this some more, thanks.

Perhaps the first step is to upgrade the cluster to a more recent
version where multi-MDS will be more stable, but that's a whole
separate issue. And that does bring me back to trying to identify "bad"
clients and ask them to change the way their jobs work, to help relieve
the pressure for everyone else until a longer-term solution can be
applied.

> So we decided to increase the number of MDS processes per MDS server
> and also started to use directory pinning, which increased the
> performance significantly.

Is "increasing the number of MDS processes per MDS server" only a
setting that's available in multi-MDS mode (I'm guessing so)? It'd be
kinda cool if there were a way to do that on the one MDS; then at least
I could use some of the other dozen or so CPU cores on the machine...

Thank you for taking the time to respond, much appreciated.

Cheers,
-c
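
P.S. For anyone finding this thread later: on Luminous, enabling
multi-active MDS looks roughly like the following (a sketch, assuming a
filesystem named "cephfs"; check the docs for your exact release):

    # Luminous-era releases gate multi-active behind this flag
    # (newer releases dropped the flag entirely)
    ceph fs set cephfs allow_multimds true
    # raise the number of active ranks; a standby MDS picks up rank 1
    ceph fs set cephfs max_mds 2

And backing out again, which on pre-Nautilus releases is a manual step:

    ceph fs set cephfs max_mds 1
    ceph mds deactivate cephfs:1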
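
The directory pinning Eugen describes is a virtual extended attribute
set from a client mount (a sketch; the path is made up, and the rank
must be an active MDS rank):

    # pin this subtree to MDS rank 1; setting -v -1 restores the
    # default (unpinned) behaviour
    setfattr -n ceph.dir.pin -v 1 /mnt/cephfs/projects/busy-project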
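
As for spotting the "bad" clients, the MDS admin socket is the usual
starting point (a sketch; "mds.a" is a placeholder for the actual
daemon name, run on the MDS host):

    # list client sessions, including client metadata and cap counts
    ceph daemon mds.a session ls
    # show slow or stuck requests and which client issued them
    ceph daemon mds.a dump_ops_in_flight
    # live per-second MDS counters (requests, cache activity, etc.)
    ceph daemonperf mds.a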