Long story short, we've got a lot of empty directories that I'm working on removing. While removing them, "perf top -g" shows the MDS daemon pegged at 100% CPU in SnapRealm::split_at and CInode::is_ancestor_of. It's this two-year-old bug that is still around: https://tracker.ceph.com/issues/53192

To help combat this, we've moved our snapshot schedule one level down the tree so the snaprealm is significantly smaller.

Our luck with multiple active MDSs hasn't been great, so we are still on a single MDS. To help split the load, I'm working on moving different workloads to separate filesystems within Ceph. Even so, a user can still fairly easily overwhelm the MDS's finisher thread and effectively stall all CephFS I/O through that MDS.

I'm hoping we can get some other people chiming in with "Me too!" so there can be some traction behind fixing this. It's a longstanding bug, so the exact version matters less, but we are on 17.2.7.

Thoughts?

-paul

--
Paul Mezzanini
Platform Engineer III
Research Computing
Rochester Institute of Technology

“End users is a description, not a goal.”
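P.S. In case anyone wants to try the same mitigation, this is roughly what moving the schedule down a level looks like with the snap_schedule mgr module (the /projects path is just a stand-in; substitute your own layout):

    # drop the old schedule at the filesystem root
    ceph fs snap-schedule remove /

    # re-add it one level down so the snaprealm covers less of the tree
    ceph fs snap-schedule add /projects 1h
    ceph fs snap-schedule retention add /projects h 24

    # confirm the new schedule took
    ceph fs snap-schedule status /projects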