I don't think scrubs can cause this. Do you have autoscaler enabled?
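Something like the following should show whether the autoscaler is touching the metadata pool (the pool name below is only a placeholder):

    # the autoscaler's view of each pool: target ratios and pending pg_num changes
    ceph osd pool autoscale-status

    # if it turns out to be involved, it can be disabled per pool
    ceph osd pool set <metadata-pool> pg_autoscale_mode off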
Quoting Lars Köppel <lars.koeppel@xxxxxxxxxx>:
Hi,
thank you for your response.
I don't think this thread covers my problem, because the OSDs for the
metadata pool fill up at different rates. So I would think this is not a
direct problem with the journal.
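For reference, the imbalance between those OSDs should be visible with something like this (no special options assumed):

    # per-OSD utilization, including the OMAP and metadata bytes
    ceph osd df tree

    # pool-level usage, to compare against what the individual OSDs report
    ceph df detail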
Because we had problems with the journal earlier, I changed some
settings (see below). I have already restarted all MDS daemons multiple
times, but nothing changed.
The health warnings regarding cache pressure normally resolve after a
short period of time, once the heavy load on the client ends. Sometimes
they stay a bit longer because an rsync is running and copying data onto
the cluster (rsync is not good at releasing its caps).
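To check which clients are actually holding caps, something along these lines should work (the MDS daemon name is a placeholder):

    # list client sessions on the active MDS; each entry shows how many caps it holds
    ceph tell mds.<name> session ls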
Could it be a problem if scrubs run in the background most of the time? Can
this block other tasks or generate new data itself?
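For what it's worth, the amount of scrubbing going on can be checked with something like:

    # one-line PG summary, including how many PGs are currently scrubbing / deep-scrubbing
    ceph pg stat

    # how many scrubs each OSD is allowed to run in parallel
    ceph config get osd osd_max_scrubs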
Best regards,
Lars
global  basic     mds_cache_memory_limit                  17179869184
global  advanced  mds_max_caps_per_client                 16384
global  advanced  mds_recall_global_max_decay_threshold   262144
global  advanced  mds_recall_max_decay_rate               1.000000
global  advanced  mds_recall_max_decay_threshold          262144
mds     advanced  mds_cache_trim_threshold                131072
mds     advanced  mds_heartbeat_grace                     120.000000
mds     advanced  mds_heartbeat_reset_grace               7400
mds     advanced  mds_tick_interval                       3.000000
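To double-check that the running MDS actually picked these values up, something like this should do (daemon name is a placeholder):

    # configuration as reported by the running daemon, which can differ from the mon config db
    ceph config show mds.<name> | grep -E 'mds_cache|mds_recall|mds_heartbeat'

    # current cache usage versus the configured limit
    ceph tell mds.<name> cache status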
Lars Köppel
Developer
Email: lars.koeppel@xxxxxxxxxx
Phone: +49 6221 5993580
ariadne.ai (Germany) GmbH
Häusserstraße 3, 69115 Heidelberg
Amtsgericht Mannheim, HRB 744040
Geschäftsführer: Dr. Fabian Svara
https://ariadne.ai
On Tue, Jun 11, 2024 at 2:05 PM Eugen Block <eblock@xxxxxx> wrote:
Hi,
can you check if this thread [1] applies to your situation? You don't
have multi-active MDS enabled, but maybe it's still some journal
trimming, or maybe misbehaving clients? In your first post there were
health warnings regarding cache pressure and cache size. Are those
resolved?
[1]
https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/7U27L27FHHPDYGA6VNNVWGLTXCGP7X23/#VOOV235D4TP5TEOJUWHF4AVXIOTHYQQE
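If it is journal trimming, it should be visible with something like this (the MDS name is a placeholder, and perf dump has to be run where the daemon's admin socket is reachable):

    # MDS_TRIM / "behind on trimming" warnings would show up here
    ceph health detail

    # journal event/segment counters of the MDS
    ceph daemon mds.<name> perf dump mds_log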
Quoting Lars Köppel <lars.koeppel@xxxxxxxxxx>:
> Hello everyone,
>
> A short update on this problem.
> The zapped OSD has been rebuilt and now holds 1.9 TiB (the expected size,
> ~50%). The other two OSDs are now at 2.8 and 3.2 TiB respectively. They
> jumped up and down a lot, but the larger one has now also reached
> 'nearfull' status. How is this possible? What is going on?
>
> Does anyone have a suggestion for how to fix this without zapping the OSD?
>
> Best regards,
> Lars
>
>
> Lars Köppel
> Developer
> Email: lars.koeppel@xxxxxxxxxx
> Phone: +49 6221 5993580
> ariadne.ai (Germany) GmbH
> Häusserstraße 3, 69115 Heidelberg
> Amtsgericht Mannheim, HRB 744040
> Geschäftsführer: Dr. Fabian Svara
> https://ariadne.ai
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx