Re: Annoying MDS_CLIENT_RECALL Warning

Patrick Donnelly <pdonnell@xxxxxxxxxx> · Thu, 18 Nov 2021 20:36:30 -0500

On Thu, Nov 18, 2021 at 12:36 AM 胡 玮文 <huww98@xxxxxxxxxxx> wrote:
>
> Hi all,
>
> We are consistently seeing the MDS_CLIENT_RECALL warning in our cluster. It seems harmless, but it keeps us from reaching HEALTH_OK, which is annoying.
>
> The clients reported as failing to respond to cache pressure are constantly changing; most of the time we see 1-5 such clients out of ~20. All of the clients are kernel clients running the Ubuntu 20.04 HWE kernel 5.11. The load is pretty low.
>
> We read datasets consisting of millions of small files from CephFS, so we have tuned some settings for performance. Some options from "ceph config dump" that might be relevant:
>
> WHO       LEVEL     OPTION                   VALUE
>   mds     basic     mds_cache_memory_limit   51539607552
>   mds     advanced  mds_max_caps_per_client  8388608

This is pretty high. It may or may not cause problems in the future for you.

>   client  basic     client_cache_size        32768

Won't affect kernel clients.
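
To put the quoted values in perspective, here is a rough sketch of the arithmetic. It assumes the stock mds_max_caps_per_client is 1Mi (1048576), which should be checked against the release in use; the helper name `describe` is made up for illustration.

```python
GIB = 1024 ** 3
# Assumption: default mds_max_caps_per_client is 1Mi; verify for your release.
ASSUMED_DEFAULT_MAX_CAPS = 1024 ** 2


def describe(cache_limit_bytes, max_caps, num_clients):
    """Express the quoted settings in friendlier units."""
    return {
        # mds_cache_memory_limit is given in bytes
        "cache_limit_gib": cache_limit_bytes / GIB,
        # how far the per-client cap limit is above the assumed default
        "caps_vs_default": max_caps / ASSUMED_DEFAULT_MAX_CAPS,
        # upper bound if every client held its full allowance at once
        "worst_case_total_caps": max_caps * num_clients,
    }


# Values from the "ceph config dump" excerpt, with ~20 clients as reported.
info = describe(51539607552, 8388608, 20)
print(info)
```

So the cache limit is 48 GiB, and the per-client cap limit is 8 times the assumed default.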

> We also manually pinned almost every directory to either rank 0 or rank 1.
>
> Any thoughts about what causes the warning, or how we can get rid of it?

This reminds me of https://tracker.ceph.com/issues/46830

I'd suggest monitoring the client session information from the MDS, as
Dan suggested. You can also try increasing mds_min_caps_working_set to
see if that helps.
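
One possible way to watch the session information is sketched below. It shells out to `ceph tell mds.<name> session ls` and ranks sessions by the number of caps they hold. The JSON field names (`num_caps`, `recall_caps.value`, `client_metadata.hostname`) are assumptions based on what recent releases print and may differ on your version; the function names are hypothetical.

```python
import json
import subprocess


def fetch_sessions(mds_name):
    """Return the parsed output of `ceph tell mds.<mds_name> session ls`."""
    result = subprocess.run(
        ["ceph", "tell", f"mds.{mds_name}", "session", "ls"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


def top_cap_holders(sessions, limit=5):
    """Summarize sessions and return the `limit` largest cap holders."""
    rows = []
    for s in sessions:
        # Field names below are assumed; missing keys fall back to defaults.
        meta = s.get("client_metadata", {})
        recall = s.get("recall_caps", {})
        rows.append({
            "id": s.get("id"),
            "hostname": meta.get("hostname", "?"),
            "num_caps": s.get("num_caps", 0),
            "recall_caps": recall.get("value", 0.0),
        })
    rows.sort(key=lambda r: r["num_caps"], reverse=True)
    return rows[:limit]


# Made-up sample data in the assumed shape, so the sketch runs without a cluster.
sample = [
    {"id": 1, "num_caps": 1200, "recall_caps": {"value": 0.0},
     "client_metadata": {"hostname": "node-a"}},
    {"id": 2, "num_caps": 950000, "recall_caps": {"value": 31000.5},
     "client_metadata": {"hostname": "node-b"}},
    {"id": 3, "num_caps": 40000},
]
top = top_cap_holders(sample, limit=2)
for row in top:
    print(row["id"], row["hostname"], row["num_caps"], row["recall_caps"])
```

Against a live cluster, `top_cap_holders(fetch_sessions("<name>"))` would replace the sample data, with `<name>` being the MDS daemon to query. Sampling this periodically shows whether the clients flagged in the warning are the ones holding the most caps. The working-set option itself can be changed with `ceph config set mds mds_min_caps_working_set <value>`.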

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
