Re: Annoying MDS_CLIENT_RECALL Warning

Hi Patrick,

One of the stuck clients has num_caps at around 269700, well above the number of files open on that client (about 9k). See my reply to Dan for details. So I don't think this warning is simply caused by "mds_min_caps_working_set" being set too low.
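
For anyone who wants to make the same comparison, here is a rough sketch of how sessions holding many caps could be picked out of the JSON that the MDS prints for its client sessions. It assumes the output is a JSON list with one object per session, each carrying "num_caps", "id" and a "client_metadata" object with a "hostname" entry. Those field names other than num_caps, the helper name and the sample data are assumptions for illustration, so check them against your own output.

```python
import json


def sessions_over_cap_threshold(session_ls_json, threshold):
    """Return (id, hostname, num_caps) for sessions holding more than `threshold` caps.

    `session_ls_json` is assumed to be a JSON list of session objects.
    Missing fields fall back to defaults instead of raising.
    """
    sessions = json.loads(session_ls_json)
    flagged = []
    for session in sessions:
        num_caps = session.get("num_caps", 0)
        if num_caps > threshold:
            hostname = session.get("client_metadata", {}).get("hostname", "unknown")
            flagged.append((session.get("id"), hostname, num_caps))
    # Largest cap holders first, since those are the ones worth inspecting.
    flagged.sort(key=lambda item: item[2], reverse=True)
    return flagged


# Hypothetical sample data, not taken from a real cluster.
sample = json.dumps([
    {"id": 1001, "num_caps": 269700, "client_metadata": {"hostname": "node-a"}},
    {"id": 1002, "num_caps": 8500, "client_metadata": {"hostname": "node-b"}},
    {"id": 1003, "num_caps": 120000},
])

for client_id, hostname, num_caps in sessions_over_cap_threshold(sample, 100000):
    print(client_id, hostname, num_caps)
```

The caps held by each flagged client can then be compared with the number of files actually open on that host.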

> -----Original Message-----
> From: Patrick Donnelly <pdonnell@xxxxxxxxxx>
> Sent: November 19, 2021 9:37
> To: 胡 玮文 <huww98@xxxxxxxxxxx>
> Cc: ceph-users@xxxxxxx
> Subject: Re:  Annoying MDS_CLIENT_RECALL Warning
> 
> On Thu, Nov 18, 2021 at 12:36 AM 胡 玮文 <huww98@xxxxxxxxxxx> wrote:
> >
> > Hi all,
> >
> > We are consistently seeing the MDS_CLIENT_RECALL warning in our cluster. It
> > seems harmless, but we cannot get HEALTH_OK, which is annoying.
> >
> > The clients that are reported as failing to respond to cache pressure are
> > constantly changing, and most of the time we get 1-5 such clients out of ~20.
> > All of the clients are kernel clients, running the HWE kernel 5.11 of
> > Ubuntu 20.04. The load is pretty low.
> >
> > We are reading datasets that consist of millions of small files from cephfs, so
> > we have tuned some config for performance. Some configs from "ceph config
> > dump" that might be relevant:
> >
> > WHO       LEVEL     OPTION                   VALUE
> >   mds     basic     mds_cache_memory_limit   51539607552
> >   mds     advanced  mds_max_caps_per_client  8388608
> 
> This is pretty high. It may or may not cause problems in the future for you.

We sometimes need to iterate over datasets containing several million files, and we have 512G of memory on the client. So we set this to a very high value to make full use of that memory as page cache and speed up I/O.

> 
> >   client  basic     client_cache_size        32768
> 
> Won't affect kernel clients.
> 
> > We also manually pinned almost every directory to either rank 0 or rank 1.
> >
> > Any thoughts about what causes the warning, or how can we get rid of it?
> 
> This reminds me of https://tracker.ceph.com/issues/46830
> 
> Suggest monitoring the client session information from the MDS as Dan
> suggested. You can also try increasing mds_min_caps_working_set to see if that
> helps.
> 
> 
> 
> --
> Patrick Donnelly, Ph.D.
> He / Him / His
> Principal Software Engineer
> Red Hat, Inc.
> GPG: 19F28A586F808C2402351B93C3301A3E258DD79D




