On Thu, Nov 18, 2021 at 12:36 AM 胡 玮文 <huww98@xxxxxxxxxxx> wrote:
>
> Hi all,
>
> We are consistently seeing the MDS_CLIENT_RECALL warning in our cluster. It seems harmless, but we cannot get HEALTH_OK, which is annoying.
>
> The clients reported as failing to respond to cache pressure are constantly changing, and most of the time we see 1-5 such clients out of ~20. All of the clients are kernel clients, running the HWE kernel 5.11 of Ubuntu 20.04. The load is pretty low.
>
> We are reading datasets that consist of millions of small files from cephfs, so we have tuned some config for performance. Some configs from "ceph config dump" that might be relevant:
>
> WHO     LEVEL     OPTION                   VALUE
> mds     basic     mds_cache_memory_limit   51539607552
> mds     advanced  mds_max_caps_per_client  8388608

This is pretty high. It may or may not cause problems in the future for you.

> client  basic     client_cache_size        32768

This won't affect kernel clients.

> We also manually pinned almost every directory to either rank 0 or rank 1.
>
> Any thoughts about what causes the warning, or how we can get rid of it?

This reminds me of https://tracker.ceph.com/issues/46830

I suggest monitoring the client session information from the MDS, as Dan suggested. You can also try increasing mds_min_caps_working_set to see if that helps.

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
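For anyone following the suggestion to monitor client session information, here is a minimal sketch of one way to do it. It is not from the thread. It assumes the MDS session listing is JSON whose entries carry "id", "num_caps", "recall_caps", "release_caps" and "client_metadata" fields (check this against the output on your release), and that rank 0 can be addressed as mds.0. The helper names top_cap_holders and fetch_sessions are made up for illustration.

```python
import json
import subprocess


def _counter_value(field):
    """Decay counters are assumed to look like {"value": x, "halflife": y}."""
    if isinstance(field, dict):
        return field.get("value", 0.0)
    return field or 0.0


def top_cap_holders(sessions, n=5):
    """Return the n sessions holding the most caps, largest first."""
    rows = []
    for s in sessions:
        meta = s.get("client_metadata") or {}
        rows.append({
            "id": s.get("id"),
            "hostname": meta.get("hostname", "?"),
            "num_caps": s.get("num_caps", 0),
            # Caps the MDS asked the client to give back recently.
            "recall_caps": _counter_value(s.get("recall_caps")),
            # Caps the client actually released recently.
            "release_caps": _counter_value(s.get("release_caps")),
        })
    rows.sort(key=lambda r: r["num_caps"], reverse=True)
    return rows[:n]


def fetch_sessions(mds="0"):
    """Run `ceph tell mds.<mds> session ls` and parse its JSON output."""
    result = subprocess.run(
        ["ceph", "tell", f"mds.{mds}", "session", "ls"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


# Example (needs a reachable cluster and admin keyring):
#   for row in top_cap_holders(fetch_sessions("0")):
#       print(row)
```

Running this periodically against each active rank should show whether the flagged clients are the ones holding the most caps, and whether their release counter keeps up with the recall counter. If raising mds_min_caps_working_set is worth a try, it can be changed with "ceph config set mds mds_min_caps_working_set <value>". Which value suits this workload is not something the thread settles.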