Re: MDS cache tuning

OK, thanks, I will try to update Nautilus. But I really don't understand the problem: warnings appear, apparently at random:

[WRN] Health check failed: 1 MDSs report slow requests (MDS_SLOW_REQUEST)

cluster [INF] Health check cleared: FS_DEGRADED (was: 1 filesystem is degraded)

cluster [DBG] mds.? [v2:10.100.190.39:6800/2624951349,v1:10.100.190.39:6801/2624951349] up:rejoin

2021-05-26 10:55:33.215102 mon.ceph2mon01 (mon.0) 700 : cluster [DBG] fsmap nxtclfs:2/2 {0=ceph2mon03=up:rejoin,1=ceph2mon01=up:active} 1 up:standby

These warnings degrade the filesystem, and I have assumed that the problem is due to the memory consumption of the MDS process, which can reach around 80% or more of total memory.
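
For reference, a quick way to see how such warnings relate to the MDS
cache is to query the daemon directly (the daemon names below are taken
from the fsmap above; the "ceph daemon" command has to run on the host
where that MDS is running):

    # Details of the active health warnings (MDS_SLOW_REQUEST, FS_DEGRADED, ...)
    ceph health detail

    # Cache usage as the MDS itself accounts for it
    ceph daemon mds.ceph2mon03 cache status

    # tcmalloc heap statistics, to compare the process RSS with what is really in use
    ceph tell mds.ceph2mon03 heap stats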





On 26/5/21 at 13:21, Dan van der Ster wrote:
I've seen your other thread. Using 78GB of RAM when the memory limit
is set to 64GB is not highly unusual, and doesn't necessarily indicate
any problem.
It *would* be a problem if the MDS memory grows uncontrollably, however.

Otherwise, check those new defaults for caps recall -- they were
released around 14.2.19 IIRC.
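
(For example, something like this should show which release each daemon
is actually running and the recall-related values in effect on a live
MDS; mds.ceph2mon01 is just the daemon name from the log earlier in the
thread:)

    # Release running on each daemon type
    ceph versions

    # Cache/recall settings currently in effect on a running MDS
    ceph config show mds.ceph2mon01 | grep -E 'mds_cache|mds_recall'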

-- Dan

On Wed, May 26, 2021 at 12:46 PM Andres Rojas Guerrero <a.rojas@xxxxxxx> wrote:

Thanks for the answer. Yes, over the last few weeks I have had memory
consumption problems on the MDS nodes that led, or at least so it
seemed to me, to performance problems in CephFS. I have been varying,
for example:

mds_cache_memory_limit
mds_min_caps_per_client
mds_health_cache_threshold
mds_max_caps_per_client
mds_cache_reservation

But I did this without much knowledge, using a trial-and-error
procedure, i.e. observing how CephFS behaved when changing one of the
parameters. Although I have achieved some improvement, the procedure
does not convince me at all, and that's why I was asking whether there
is something more reliable ...
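
(A minimal sketch of how such experiments can at least be applied,
checked and undone centrally, assuming the monitor config database is
used rather than ceph.conf; the value below is only an arbitrary
example, not a recommendation:)

    # Apply an option to all MDS daemons
    ceph config set mds mds_max_caps_per_client 500000

    # Verify what is stored in the monitor config database
    ceph config get mds mds_max_caps_per_client

    # Drop the override and fall back to the built-in default
    ceph config rm mds mds_max_caps_per_client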




On 26/5/21 at 12:15, Dan van der Ster wrote:
Hi,

The mds_cache_memory_limit should be set to something relative to the
RAM size of the MDS -- maybe 50% is a good rule of thumb, because
there are a few cases where the RSS can exceed this limit. Your
experience will help guide what size you need (metadata pool IO
activity will be really high if the MDS cache is too small).
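
(As a rough sketch of that rule of thumb, assuming an MDS host with
64 GiB of RAM:)

    # ~50% of a 64 GiB MDS host: 32 GiB = 34359738368 bytes
    ceph config set mds mds_cache_memory_limit 34359738368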

Otherwise, in recent releases of N/O/P the defaults for those settings
you mentioned are quite good [1]; I would be surprised if they need
further tuning for 99% of users.
Is there any reason you want to start adjusting these params?

Best Regards,

Dan

[1] https://github.com/ceph/ceph/pull/38574

On Wed, May 26, 2021 at 11:58 AM Andres Rojas Guerrero <a.rojas@xxxxxxx> wrote:

Hi all, I have observed that the MDS Cache Configuration has 18 parameters:

mds_cache_memory_limit
mds_cache_reservation
mds_health_cache_threshold
mds_cache_trim_threshold
mds_cache_trim_decay_rate
mds_recall_max_caps
mds_recall_max_decay_threshold
mds_recall_max_decay_rate
mds_recall_global_max_decay_threshold
mds_recall_warning_threshold
mds_recall_warning_decay_rate
mds_session_cap_acquisition_throttle
mds_session_cap_acquisition_decay_rate
mds_session_max_caps_throttle_ratio
mds_cap_acquisition_throttle_retry_request_timeout
mds_session_cache_liveness_magnitude
mds_session_cache_liveness_decay_rate
mds_max_caps_per_client
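
One way to see the values currently in effect for all of these on a
running MDS (rather than only the documented defaults) might be, for
example:

    # On the MDS host; <name> is the MDS daemon name
    ceph daemon mds.<name> config show | grep -E 'mds_cache|mds_recall|mds_session|mds_max_caps|mds_cap_acquisition|mds_health_cache'

    # Description, type and default of a single option
    ceph config help mds_recall_max_caps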



I find the Ceph documentation in this section a bit cryptic and I have
tried to find some resources that talk about how to tune these
parameters, but without success.

Does anyone have experience in adjusting these parameters according to
the characteristics of the Ceph cluster itself, the hardware and the use
of MDS?

Regards!


--
*******************************************************
Andrés Rojas Guerrero
Unidad Sistemas Linux
Area Arquitectura Tecnológica
Secretaría General Adjunta de Informática
Consejo Superior de Investigaciones Científicas (CSIC)
Pinar 19
28006 - Madrid
Tel: +34 915680059 -- Ext. 990059
email: a.rojas@xxxxxxx
ID comunicate.csic.es: @50852720l:matrix.csic.es
*******************************************************


_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
