Hi Paul,

On Wed, Dec 13, 2023 at 9:50 PM Paul Mezzanini <pfmeec@xxxxxxx> wrote:
>
> Long story short, we've got a lot of empty directories that I'm working
> on removing. While removing directories, using "perf top -g" we can watch
> the MDS daemon go to 100% CPU usage with "SnapRealm::split_at" and
> "CInode::is_ancestor_of".
>
> It's this 2-year-old bug that is still around:
> https://tracker.ceph.com/issues/53192

Unfortunately, the fix isn't straightforward, even though it has been
attempted, so lately we've been working around these issues by pinning the
to-be-deleted directories to a (separate) active MDS. This might need some
tuning at the application level to move things into this "special" pinned
directory and then delete them there (a rough sketch is at the end of this
mail).

HTH.

> To help combat this, we've moved our snapshot schedule down the tree one
> level so the snaprealm is significantly smaller. Our luck with multiple
> active MDSs hasn't been great, so we are still on a single MDS. To help
> split the load, I'm working on moving different workloads to different
> filesystems within Ceph.
>
> A user can still fairly easily overwhelm the MDS's finisher thread and
> basically stop all CephFS I/O through that MDS. I'm hoping we can get
> some other people chiming in with "Me too!" so there can be some traction
> behind fixing this.
>
> It's a longstanding bug, so the version is less important, but we are on
> 17.2.7.
>
> Thoughts?
> -paul
>
> --
>
> Paul Mezzanini
> Platform Engineer III
> Research Computing
>
> Rochester Institute of Technology
>
> “End users is a description, not a goal.”

--
Cheers,
Venky
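
P.S. If it helps, the application-level part of that workaround could look
roughly like the sketch below. It's untested and only illustrative: the
mount point, the ".to_delete" directory name and the MDS rank are
placeholders for whatever your setup uses, and it assumes a kernel mount
where the "ceph.dir.pin" export-pin xattr can be set directly.

    #!/usr/bin/env python3
    # Untested sketch of the "pin a delete directory to its own MDS" workaround.
    # Assumptions (all placeholders): CephFS kernel-mounted at /mnt/cephfs,
    # a spare active MDS at rank 1, and a scratch directory ".to_delete"
    # that only this tool touches.
    import os
    import shutil

    CEPHFS_ROOT = "/mnt/cephfs"
    TRASH_DIR = os.path.join(CEPHFS_ROOT, ".to_delete")
    DELETE_MDS_RANK = b"1"   # MDS rank that should absorb the removal load

    def setup_trash_dir():
        """Create the scratch directory and pin its subtree to that rank."""
        os.makedirs(TRASH_DIR, exist_ok=True)
        # "ceph.dir.pin" is the CephFS export-pin xattr; everything under
        # TRASH_DIR is then handled (and eventually deleted) by that rank.
        os.setxattr(TRASH_DIR, "ceph.dir.pin", DELETE_MDS_RANK)

    def queue_for_deletion(path):
        """Rename a doomed directory into the pinned scratch area."""
        dest = os.path.join(TRASH_DIR, os.path.basename(path))
        os.rename(path, dest)   # cheap metadata op, stays on the same fs
        return dest

    def purge_trash():
        """Remove everything queued up; the pinned MDS takes the hit."""
        for entry in os.scandir(TRASH_DIR):
            if entry.is_dir(follow_symlinks=False):
                shutil.rmtree(entry.path, ignore_errors=True)
            else:
                os.remove(entry.path)

    if __name__ == "__main__":
        setup_trash_dir()
        queue_for_deletion("/mnt/cephfs/projects/old-empty-dirs")  # example path
        purge_trash()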