Hi Patrick,

One of the stuck clients has num_caps at around 269700, well above the number of files opened on the client (about 9k). See my reply to Dan for details. So I don't think this warning is simply caused by "mds_min_caps_working_set" being set too low.

> -----Original Message-----
> From: Patrick Donnelly <pdonnell@xxxxxxxxxx>
> Sent: November 19, 2021 9:37
> To: 胡 玮文 <huww98@xxxxxxxxxxx>
> Cc: ceph-users@xxxxxxx
> Subject: Re: Annoying MDS_CLIENT_RECALL Warning
>
> On Thu, Nov 18, 2021 at 12:36 AM 胡 玮文 <huww98@xxxxxxxxxxx> wrote:
> >
> > Hi all,
> >
> > We are consistently seeing the MDS_CLIENT_RECALL warning in our cluster. It
> > seems harmless, but we cannot get HEALTH_OK, which is annoying.
> >
> > The clients that are reported failing to respond to cache pressure are
> > constantly changing, and most of the time we get 1-5 such clients out of ~20.
> > All of the clients are kernel clients, running the HWE kernel 5.11 of
> > Ubuntu 20.04. The load is pretty low.
> >
> > We are reading datasets that consist of millions of small files from cephfs,
> > so we have tuned some config for performance. Some configs from
> > "ceph config dump" that might be relevant:
> >
> > WHO     LEVEL     OPTION                   VALUE
> > mds     basic     mds_cache_memory_limit   51539607552
> > mds     advanced  mds_max_caps_per_client  8388608
>
> This is pretty high. It may or may not cause problems in the future for you.

We sometimes need to iterate over datasets containing several million files, and we have 512G of memory on the client. So we set this to a very high value to fully utilize our memory as page cache to accelerate IO.

> > client  basic     client_cache_size        32768
>
> Won't affect kernel clients.
>
> > We also manually pinned almost every directory to either rank 0 or rank 1.
> >
> > Any thoughts about what causes the warning, or how can we get rid of it?
>
> This reminds me of https://tracker.ceph.com/issues/46830
>
> Suggest monitoring the client session information from the MDS as Dan suggested.
> You can also try increasing mds_min_caps_working_set to see if that helps.
>
> --
> Patrick Donnelly, Ph.D.
> He / Him / His
> Principal Software Engineer
> Red Hat, Inc.
> GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
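
For anyone who wants to compare num_caps across clients, as in the reply at the top of this message, here is a minimal sketch in Python. It assumes the MDS session listing (for example the JSON printed by `ceph tell mds.<name> session ls`) is available as text, and that each entry carries "id", "num_caps" and "client_metadata" / "hostname" fields. Those field names and the helper name top_cap_holders are assumptions for illustration, not something confirmed in this thread, so adjust them to what your release actually prints.

```python
import json


def top_cap_holders(session_ls_json, threshold=0, limit=5):
    """Return (client id, hostname, num_caps) tuples, largest num_caps first.

    session_ls_json: JSON text holding a list of session objects.
    The keys used below ("id", "num_caps", "client_metadata", "hostname")
    are assumed field names; check them against your own MDS output.
    """
    sessions = json.loads(session_ls_json)
    rows = []
    for session in sessions:
        caps = int(session.get("num_caps", 0))
        if caps < threshold:
            # Skip clients holding fewer caps than we care about.
            continue
        host = session.get("client_metadata", {}).get("hostname", "?")
        rows.append((session.get("id"), host, caps))
    # Sort by cap count, biggest holders first.
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows[:limit]
```

Feeding it the session listing and a threshold somewhat above the expected number of open files should surface clients like the one above that hold far more caps than they have files open.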