This sounds like there is one or a few clients acquiring too many caps. Have you checked this? Are there any messages about the OOM killer? What config changes for the MDS have you made?
Yes, it's individual clients acquiring too many caps. I first ran the adjusted recall settings you suggested after we had gone through several bugs. Right now I am trying distributed ephemeral pinning with 3 MDS and Dan's suggestion of 6x the default values for recall from the MDS documentation thread. So far, it's working quite well.
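For anyone wanting to try the same "6x the defaults" approach, here is a minimal sketch that builds the corresponding `ceph config set` commands. The option names `mds_recall_max_caps` and `mds_recall_max_decay_threshold` are real MDS options, but the baseline values below are assumptions that differ between releases, and the thread does not say exactly which options were scaled. Check the defaults on your own cluster (for example with `ceph config help <option>`) before applying anything. The helper name `scaled_recall_commands` is made up for illustration.

```python
# Sketch: scale a set of MDS recall options by a common factor and
# emit the matching "ceph config set" commands. Nothing is executed
# against a cluster; the commands are only printed for review.

def scaled_recall_commands(baseline, factor=6):
    """Return one 'ceph config set mds ...' command per option.

    baseline: dict mapping option name -> current/default integer value
    factor:   multiplier to apply (6 mirrors the suggestion in this thread)
    """
    return [
        f"ceph config set mds {name} {int(value * factor)}"
        for name, value in baseline.items()
    ]


if __name__ == "__main__":
    # ASSUMPTION: these baseline numbers are placeholders, not values
    # taken from this thread. Replace them with your release's defaults.
    assumed_defaults = {
        "mds_recall_max_caps": 5000,
        "mds_recall_max_decay_threshold": 16384,
    }
    for cmd in scaled_recall_commands(assumed_defaults):
        print(cmd)
```

Printing the commands rather than running them makes it easy to review the resulting values, and to note the old ones, before changing the MDS configuration.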
I'm hopeful your problems will be addressed by: https://tracker.ceph.com/issues/47307
That does indeed sound a bit like it might fix these kinds of issues.