Re: failing to respond to cache pressure

Hi,

Currently our Ceph servers run kernel 4.4.104; our clients mostly have newer versions, something like 4.4.126.

I set mds_cache_memory_limit from 1G to 2G, and then to 4G. I still get the warning messages, and they disappear again after 1 or 2 minutes.

Did at least the number of affected clients decrease with the changes? Have you verified that the new value was actually applied? Just to make sure: the command "ceph daemon mds.<MDS> config set ..." does not inject the value at runtime; you would have to run "ceph tell mds.<MDS> config set ..." to change it at runtime, or set it in the [mds] section of ceph.conf and restart the MDS daemons.
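Roughly, the two approaches would look like this (just a sketch; <MDS> is a placeholder for your MDS daemon name, and mds_cache_memory_limit is given in bytes, so 4 GB is 4294967296):

ceph:~ # ceph tell mds.<MDS> config set mds_cache_memory_limit 4294967296

or persistently in ceph.conf, followed by a restart of the MDS daemons:

[mds]
mds_cache_memory_limit = 4294967296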

Regards


Quoting Zhenshi Zhou <deaderzzs@xxxxxxxxx>:

Hi Eugen,
I set mds_cache_memory_limit from 1G to 2G, and then to 4G. I still get the warning messages, and they disappear again after 1 or 2 minutes.
Which kernel versions do you run?

Zhenshi Zhou <deaderzzs@xxxxxxxxx> wrote on Mon, Aug 13, 2018 at 10:15 PM:

Hi Eugen,
The command shows "mds_cache_memory_limit": "1073741824".
And I'll increase the cache size for a try.

Thanks

Eugen Block <eblock@xxxxxx> wrote on Mon, Aug 13, 2018 at 9:48 PM:

Hi,

Depending on your kernel (some versions have memory leaks with CephFS), increasing the mds_cache_memory_limit could help. What is your current setting?

ceph:~ # ceph daemon mds.<MDS> config show | grep mds_cache_memory_limit
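On a setup still at the default this typically returns the 1 GB default, shown in bytes, e.g.:

    "mds_cache_memory_limit": "1073741824"

(1073741824 bytes = 1 GiB; the limit is always specified in bytes.)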

We had these messages for months, almost every day.
They would occur when hourly backup jobs ran and the MDS had to serve an additional client (searching the whole CephFS for changes) besides the existing CephFS clients. First we updated all clients to a more recent kernel version, but the warnings didn't stop. Then we doubled the cache size from 2 GB to 4 GB last week, and since then I haven't seen this warning again (for now).

Try playing with the cache size to find a setting that fits your needs, but don't forget to monitor your MDS in case something goes wrong; a few commands that can help with that are sketched below.
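For example (a minimal sketch; <MDS> is again your MDS daemon name, and these are simply the commands I'd reach for first):

ceph:~ # ceph daemon mds.<MDS> cache status     # current cache usage in bytes
ceph:~ # ceph daemonperf mds.<MDS>              # live per-second MDS performance counters
ceph:~ # ceph health detail                     # lists which clients trigger MDS_CLIENT_RECALL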

Regards,
Eugen


Quoting Wido den Hollander <wido@xxxxxxxx>:

> On 08/13/2018 01:22 PM, Zhenshi Zhou wrote:
>> Hi,
>> Recently the cluster has been running healthy, but I get warning messages every day:
>>
>
> Which version of Ceph? Which version of clients?
>
> Can you post:
>
> $ ceph versions
> $ ceph features
> $ ceph fs status
>
> Wido
>
>> 2018-08-13 17:39:23.682213 [INF]  Cluster is now healthy
>> 2018-08-13 17:39:23.682144 [INF]  Health check cleared:
>> MDS_CLIENT_RECALL (was: 6 clients failing to respond to cache pressure)
>> 2018-08-13 17:39:23.052022 [INF]  MDS health message cleared (mds.0):
>> Client docker38:docker failing to respond to cache pressure
>> 2018-08-13 17:39:23.051979 [INF]  MDS health message cleared (mds.0):
>> Client docker73:docker failing to respond to cache pressure
>> 2018-08-13 17:39:23.051934 [INF]  MDS health message cleared (mds.0):
>> Client docker74:docker failing to respond to cache pressure
>> 2018-08-13 17:39:23.051853 [INF]  MDS health message cleared (mds.0):
>> Client docker75:docker failing to respond to cache pressure
>> 2018-08-13 17:39:23.051815 [INF]  MDS health message cleared (mds.0):
>> Client docker27:docker failing to respond to cache pressure
>> 2018-08-13 17:39:23.051753 [INF]  MDS health message cleared (mds.0):
>> Client docker27 failing to respond to cache pressure
>> 2018-08-13 17:38:11.100331 [WRN]  Health check update: 6 clients failing to respond to cache pressure (MDS_CLIENT_RECALL)
>> 2018-08-13 17:37:39.570014 [WRN]  Health check update: 5 clients failing to respond to cache pressure (MDS_CLIENT_RECALL)
>> 2018-08-13 17:37:31.099418 [WRN]  Health check update: 3 clients failing to respond to cache pressure (MDS_CLIENT_RECALL)
>> 2018-08-13 17:36:34.564345 [WRN]  Health check update: 1 clients failing to respond to cache pressure (MDS_CLIENT_RECALL)
>> 2018-08-13 17:36:27.121891 [WRN]  Health check update: 3 clients failing to respond to cache pressure (MDS_CLIENT_RECALL)
>> 2018-08-13 17:36:11.967531 [WRN]  Health check update: 5 clients failing to respond to cache pressure (MDS_CLIENT_RECALL)
>> 2018-08-13 17:35:59.870055 [WRN]  Health check update: 6 clients failing to respond to cache pressure (MDS_CLIENT_RECALL)
>> 2018-08-13 17:35:47.787323 [WRN]  Health check update: 3 clients failing to respond to cache pressure (MDS_CLIENT_RECALL)
>> 2018-08-13 17:34:59.435933 [WRN]  Health check failed: 1 clients failing to respond to cache pressure (MDS_CLIENT_RECALL)
>> 2018-08-13 17:34:59.045510 [WRN]  MDS health message (mds.0): Client
>> docker75:docker failing to respond to cache pressure
>>
>> How can I fix it?
>>
>>








_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



