FS_DEGRADED indicates that your MDS restarted or stopped responding to
health beacons. Are your MDSs going OOM?

I see you have two active MDSs. Is your cluster more stable if you use
only a single active MDS?

-- Dan

On Wed, May 26, 2021 at 2:44 PM Andres Rojas Guerrero <a.rojas@xxxxxxx> wrote:
>
> OK, thanks, I will try to update Nautilus. But I really don't
> understand the problem; warnings appear apparently at random:
>
> [WRN] Health check failed: 1 MDSs report slow requests (MDS_SLOW_REQUEST)
>
> cluster [INF] Health check cleared: FS_DEGRADED (was: 1 filesystem is
> degraded)
>
> : cluster [DBG] mds.?
> [v2:10.100.190.39:6800/2624951349,v1:10.100.190.39:6801/2624951349]
> up:rejoin
> 2021-05-26 10:55:33.215102 mon.ceph2mon01 (mon.0) 700 : cluster [DBG]
> fsmap nxtclfs:2/2 {0=ceph2mon03=up:rejoin,1=ceph2mon01=up:active} 1
> up:standby
>
> These degrade the filesystem, and I have assumed that the problem is
> due to the memory consumption of the MDS process, which can reach
> around 80% or more of the total memory.
>
> On 26/5/21 at 13:21, Dan van der Ster wrote:
> > I've seen your other thread. Using 78GB of RAM when the memory limit
> > is set to 64GB is not highly unusual, and doesn't necessarily
> > indicate any problem. It *would* be a problem if the MDS memory
> > grows uncontrollably, however.
> >
> > Otherwise, check those new defaults for caps recall -- they were
> > released around 14.2.19 IIRC.
> >
> > -- Dan
> >
> > On Wed, May 26, 2021 at 12:46 PM Andres Rojas Guerrero <a.rojas@xxxxxxx> wrote:
> >>
> >> Thanks for the answer. Yes, during these last weeks I have had
> >> memory consumption problems in the MDS nodes that led, or at least
> >> it seemed to me, to performance problems in CephFS.
> >> I have been varying, for example:
> >>
> >> mds_cache_memory_limit
> >> mds_min_caps_per_client
> >> mds_health_cache_threshold
> >> mds_max_caps_per_client
> >> mds_cache_reservation
> >>
> >> But without much knowledge, and with a trial-and-error procedure,
> >> i.e. observing how CephFS behaved when changing one of the
> >> parameters. Although I have achieved some improvement, the procedure
> >> does not convince me at all, and that's why I was asking if there
> >> was something more reliable ...
> >>
> >> On 26/5/21 at 12:15, Dan van der Ster wrote:
> >>> Hi,
> >>>
> >>> The mds_cache_memory_limit should be set to something relative to
> >>> the RAM size of the MDS -- maybe 50% is a good rule of thumb,
> >>> because there are a few cases where the RSS can exceed this limit.
> >>> Your experience will help guide what size you need (metadata pool
> >>> IO activity will be really high if the MDS cache is too small).
> >>>
> >>> Otherwise, in recent releases of N/O/P the defaults for those
> >>> settings you mentioned are quite good [1]; I would be surprised if
> >>> they need further tuning for 99% of users.
> >>> Is there any reason you want to start adjusting these params?
> >>>
> >>> Best Regards,
> >>>
> >>> Dan
> >>>
> >>> [1] https://github.com/ceph/ceph/pull/38574
> >>>
> >>> On Wed, May 26, 2021 at 11:58 AM Andres Rojas Guerrero <a.rojas@xxxxxxx> wrote:
> >>>>
> >>>> Hi all, I have observed that the MDS Cache Configuration has 18
> >>>> parameters:
> >>>>
> >>>> mds_cache_memory_limit
> >>>> mds_cache_reservation
> >>>> mds_health_cache_threshold
> >>>> mds_cache_trim_threshold
> >>>> mds_cache_trim_decay_rate
> >>>> mds_recall_max_caps
> >>>> mds_recall_max_decay_threshold
> >>>> mds_recall_max_decay_rate
> >>>> mds_recall_global_max_decay_threshold
> >>>> mds_recall_warning_threshold
> >>>> mds_recall_warning_decay_rate
> >>>> mds_session_cap_acquisition_throttle
> >>>> mds_session_cap_acquisition_decay_rate
> >>>> mds_session_max_caps_throttle_ratio
> >>>> mds_cap_acquisition_throttle_retry_request_timeout
> >>>> mds_session_cache_liveness_magnitude
> >>>> mds_session_cache_liveness_decay_rate
> >>>> mds_max_caps_per_client
> >>>>
> >>>> I find the Ceph documentation in this section a bit cryptic, and I
> >>>> have tried to find some resources that talk about how to tune
> >>>> these parameters, but without success.
> >>>>
> >>>> Does anyone have experience in adjusting these parameters
> >>>> according to the characteristics of the Ceph cluster itself, the
> >>>> hardware and the use of the MDS?
> >>>>
> >>>> Regards!
> >>>> --
> >>>> *******************************************************
> >>>> Andrés Rojas Guerrero
> >>>> Unidad Sistemas Linux
> >>>> Area Arquitectura Tecnológica
> >>>> Secretaría General Adjunta de Informática
> >>>> Consejo Superior de Investigaciones Científicas (CSIC)
> >>>> Pinar 19
> >>>> 28006 - Madrid
> >>>> Tel: +34 915680059 -- Ext. 990059
> >>>> email: a.rojas@xxxxxxx
> >>>> ID comunicate.csic.es: @50852720l:matrix.csic.es
> >>>> *******************************************************
> >>>> _______________________________________________
> >>>> ceph-users mailing list -- ceph-users@xxxxxxx
> >>>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
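
For reference, the two concrete suggestions in the thread can be sketched
as the commands below. This is a sketch, not something run against the
cluster in question: the filesystem name "nxtclfs" is taken from the fsmap
log line, the MDS daemon name "ceph2mon01" from the same line, and the
68719476736-byte (64 GiB) value assumes a hypothetical 128 GiB MDS host
under Dan's ~50% rule of thumb.

```shell
# 1) Fall back to a single active MDS (on Nautilus and later, lowering
#    max_mds makes the cluster stop the extra ranks automatically):
ceph fs set nxtclfs max_mds 1
ceph fs status nxtclfs   # watch rank 1 stop and return to standby

# 2) Cap the MDS cache at roughly 50% of host RAM (value in bytes;
#    64 GiB here, assuming a 128 GiB host):
ceph config set mds mds_cache_memory_limit 68719476736

# 3) Compare actual cache usage against the limit on a given MDS:
ceph daemon mds.ceph2mon01 cache status
```

Remember that the RSS of the MDS process can exceed mds_cache_memory_limit
by some margin, which is why the limit is sized well below total RAM.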