Long story short, we've got a lot of empty directories that I'm working on removing. While removing them, "perf top -g" shows the MDS daemon pegged at 100% CPU in SnapRealm::split_at and CInode::is_ancestor_of. It's this two-year-old bug that is still around: https://tracker.ceph.com/issues/53192

To help combat this, we've moved our snapshot schedule one level down the tree so the snaprealm is significantly smaller.

Our luck with multiple active MDSs hasn't been great, so we are still on a single MDS. To help split the load, I'm working on moving different workloads to separate filesystems within Ceph. Even so, a user can still fairly easily overwhelm the MDS's finisher thread and effectively stall all CephFS I/O through that MDS.

I'm hoping we can get some other people chiming in with "Me too!" so there can be some traction behind fixing this. It's a longstanding bug, so the exact version matters less, but we are on 17.2.7.

Thoughts?

-paul

--
Paul Mezzanini
Platform Engineer III
Research Computing
Rochester Institute of Technology

“End users is a description, not a goal.”
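P.S. In case anyone wants to try the same mitigation, this is roughly what moving the schedule down a level looks like with the snap_schedule mgr module (the /projects path is just a stand-in; substitute your own layout):

    # drop the old schedule at the filesystem root
    ceph fs snap-schedule remove /

    # re-add it one level down so the snaprealm covers less of the tree
    ceph fs snap-schedule add /projects 1h
    ceph fs snap-schedule retention add /projects h 24

    # confirm the new schedule took
    ceph fs snap-schedule status /projects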