Re: failing to respond to cache pressure

Eugen Block <eblock@xxxxxx> · Thu, 23 Aug 2018 10:00:55 +0000

Hi,

> I think it does have a positive effect on the messages, because I get fewer
> messages than before.

That's nice. I am also definitely receiving fewer cache pressure messages
than before.
I have also started to experiment with the client-side cache
configuration. I halved the client object cache size from 200 MB to
100 MB:

ceph@host1:~ $ ceph daemon mds.host1 config set client_oc_size 104857600

Although I still encountered one pressure message recently, the total
number of these messages has decreased significantly.
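
For reference, the value passed to config set is given in bytes, so 100 MB here means 100 * 1024 * 1024. A minimal shell sketch of that arithmetic follows; mib_to_bytes is a hypothetical helper, not part of Ceph, and the final command is only printed, not executed:

```shell
#!/bin/sh
# Hypothetical helper (not part of Ceph): convert a size in MiB to bytes.
mib_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

new_size=$(mib_to_bytes 100)

# Print the command from the message above with the computed value
# instead of running it, so the sketch is safe to try anywhere.
echo "ceph daemon mds.host1 config set client_oc_size ${new_size}"
```

Printing the command first makes it easy to check the byte value before applying it to a running daemon.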

Regards,
Eugen

Quoting Zhenshi Zhou <deaderzzs@xxxxxxxxx>:

Hi Eugen,
I think it does have a positive effect on the messages, because I get fewer
messages than before.

On Mon, Aug 20, 2018 at 9:29 PM, Eugen Block <eblock@xxxxxx> wrote:

Update: we are getting these messages again.

So the search continues...

Quoting Eugen Block <eblock@xxxxxx>:

> Hi,
>
> Depending on your kernel (there have been memory leaks with CephFS),
> increasing mds_cache_memory_limit could help. What is your current
> setting?
>
> ceph:~ # ceph daemon mds.<MDS> config show | grep mds_cache_memory_limit
>
> We had these messages for months, almost every day.
> It would occur when hourly backup jobs ran and the MDS had to serve
> an additional client (searching the whole CephFS for changes)
> besides the existing CephFS clients. First we updated all clients to
> a more recent kernel version, but the warnings didn't stop. Then we
> doubled the cache size from 2 GB to 4 GB last week and since then I
> haven't seen this warning again (for now).
>
> Try playing with the cache size to find a setting fitting your
> needs, but don't forget to monitor your MDS in case something goes
> wrong.
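
The "check the current limit, then double it" step described above can be sketched in shell. This is only an illustration under assumptions: limit_bytes is a hypothetical helper, the sample line assumes that config show prints the option as a quoted JSON string, and the 2 GiB value is taken from the message above rather than read from a real daemon. The resulting command is printed, not executed.

```shell
#!/bin/sh
# Hypothetical helper (not part of Ceph): read "config show" output on stdin
# and print the numeric value of mds_cache_memory_limit in bytes.
limit_bytes() {
    sed -n 's/.*"mds_cache_memory_limit": *"\([0-9][0-9]*\)".*/\1/p'
}

# Assumed shape of the line that
#   ceph daemon mds.<MDS> config show | grep mds_cache_memory_limit
# returns; 2147483648 bytes corresponds to the 2 GB mentioned above.
sample='    "mds_cache_memory_limit": "2147483648",'

current=$(printf '%s\n' "$sample" | limit_bytes)
doubled=$(( current * 2 ))

echo "current: ${current} bytes, doubled: ${doubled} bytes"

# Print the command that would apply the doubled limit at runtime.
echo "ceph daemon mds.<MDS> config set mds_cache_memory_limit ${doubled}"
```

On a live system the sample line would be replaced by the real config show output piped into limit_bytes. A value set this way applies to the running daemon only, so it also needs to go into ceph.conf to survive a restart.
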
>
> Regards,
> Eugen
>
>
> Quoting Wido den Hollander <wido@xxxxxxxx>:
>
>> On 08/13/2018 01:22 PM, Zhenshi Zhou wrote:
>>> Hi,
>>> Recently the cluster has been running healthy, but I get warning messages
>>> every day:
>>>
>>
>> Which version of Ceph? Which version of clients?
>>
>> Can you post:
>>
>> $ ceph versions
>> $ ceph features
>> $ ceph fs status
>>
>> Wido
>>
>>> 2018-08-13 17:39:23.682213 [INF]  Cluster is now healthy
>>> 2018-08-13 17:39:23.682144 [INF]  Health check cleared:
>>> MDS_CLIENT_RECALL (was: 6 clients failing to respond to cache pressure)
>>> 2018-08-13 17:39:23.052022 [INF]  MDS health message cleared (mds.0):
>>> Client docker38:docker failing to respond to cache pressure
>>> 2018-08-13 17:39:23.051979 [INF]  MDS health message cleared (mds.0):
>>> Client docker73:docker failing to respond to cache pressure
>>> 2018-08-13 17:39:23.051934 [INF]  MDS health message cleared (mds.0):
>>> Client docker74:docker failing to respond to cache pressure
>>> 2018-08-13 17:39:23.051853 [INF]  MDS health message cleared (mds.0):
>>> Client docker75:docker failing to respond to cache pressure
>>> 2018-08-13 17:39:23.051815 [INF]  MDS health message cleared (mds.0):
>>> Client docker27:docker failing to respond to cache pressure
>>> 2018-08-13 17:39:23.051753 [INF]  MDS health message cleared (mds.0):
>>> Client docker27 failing to respond to cache pressure
>>> 2018-08-13 17:38:11.100331 [WRN]  Health check update: 6 clients failing
>>> to respond to cache pressure (MDS_CLIENT_RECALL)
>>> 2018-08-13 17:37:39.570014 [WRN]  Health check update: 5 clients failing
>>> to respond to cache pressure (MDS_CLIENT_RECALL)
>>> 2018-08-13 17:37:31.099418 [WRN]  Health check update: 3 clients failing
>>> to respond to cache pressure (MDS_CLIENT_RECALL)
>>> 2018-08-13 17:36:34.564345 [WRN]  Health check update: 1 clients failing
>>> to respond to cache pressure (MDS_CLIENT_RECALL)
>>> 2018-08-13 17:36:27.121891 [WRN]  Health check update: 3 clients failing
>>> to respond to cache pressure (MDS_CLIENT_RECALL)
>>> 2018-08-13 17:36:11.967531 [WRN]  Health check update: 5 clients failing
>>> to respond to cache pressure (MDS_CLIENT_RECALL)
>>> 2018-08-13 17:35:59.870055 [WRN]  Health check update: 6 clients failing
>>> to respond to cache pressure (MDS_CLIENT_RECALL)
>>> 2018-08-13 17:35:47.787323 [WRN]  Health check update: 3 clients failing
>>> to respond to cache pressure (MDS_CLIENT_RECALL)
>>> 2018-08-13 17:34:59.435933 [WRN]  Health check failed: 1 clients failing
>>> to respond to cache pressure (MDS_CLIENT_RECALL)
>>> 2018-08-13 17:34:59.045510 [WRN]  MDS health message (mds.0): Client
>>> docker75:docker failing to respond to cache pressure
>>>
>>> How can I fix it?
>>>
>>>
>>> _______________________________________________
>>> ceph-users mailing list
>>> ceph-users@xxxxxxxxxxxxxx
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>