Re: CephFS metadata pool size

Hi, some replies in this thread seem to be missing from the list. For example, I can't find any messages containing the information that could have led to this conclusion:

> * pg_num too low (defaults are too low)
> * pg_num not a power of 2
> * pg_num != number of OSDs in the pool
> * balancer not enabled

It makes it very hard for other users to follow threads or learn from them when part of the communication is private. This thread is not the first occurrence, and it seems to have become more frequent recently. Could posters please reply to the list instead of to individual users?

Thanks for your consideration.
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Anthony D'Atri <anthony.datri@xxxxxxxxx>
Sent: Wednesday, June 12, 2024 2:53 PM
To: Eugen Block
Cc: Lars Köppel; ceph-users@xxxxxxx
Subject: Re: CephFS metadata pool size

If you have:

* pg_num too low (defaults are too low)
* pg_num not a power of 2
* pg_num != number of OSDs in the pool
* balancer not enabled

any of those might result in imbalance.
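
A minimal way to check each of these from the CLI (the pool name is a placeholder for your metadata pool):

ceph osd pool get <metadata_pool> pg_num      # too low? not a power of 2?
ceph osd ls | wc -l                           # number of OSDs to compare against
ceph balancer status                          # is the balancer enabled and active?
ceph balancer mode upmap && ceph balancer on  # enable it if it is off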

> On Jun 12, 2024, at 07:33, Eugen Block <eblock@xxxxxx> wrote:
>
> I don't have any good explanation at this point. Can you share some more information like:
>
> ceph pg ls-by-pool <cephfs_metadata>
> ceph osd df (for the relevant OSDs)
> ceph df
>
> Thanks,
> Eugen
>
> Quoting Lars Köppel <lars.koeppel@xxxxxxxxxx>:
>
>> Since my last update, the largest OSD has grown by 0.4 TiB while the
>> smallest one has grown by only 0.1 TiB. How is this possible?
>>
>> Because the metadata pool reported only 900 MB of space left, I stopped
>> the hot-standby MDS. That gave me back 8 GB, but it filled up again
>> within the last two hours.
>> I think I will have to zap the next OSD, because the filesystem is
>> becoming read-only...
>>
>> How is it possible that an OSD holds over 1 TiB less data after a
>> rebuild? And how can the OSD sizes differ so much?
>>
>>
>> Lars Köppel
>> Developer
>> Email: lars.koeppel@xxxxxxxxxx
>> Phone: +49 6221 5993580
>> ariadne.ai (Germany) GmbH
>> Häusserstraße 3, 69115 Heidelberg
>> Amtsgericht Mannheim, HRB 744040
>> Geschäftsführer: Dr. Fabian Svara
>> https://ariadne.ai
>>
>>
>> On Tue, Jun 11, 2024 at 3:47 PM Lars Köppel <lars.koeppel@xxxxxxxxxx> wrote:
>>
>>> Only in warning mode. And there were no PG splits or merges in the
>>> last two months.
>>>
>>>
>>> On Tue, Jun 11, 2024 at 3:32 PM Eugen Block <eblock@xxxxxx> wrote:
>>>
>>>> I don't think scrubs can cause this. Do you have autoscaler enabled?
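>>>>
>>>> A quick way to check, in case it helps (this lists each pool's
>>>> autoscale mode and PG recommendations):
>>>>
>>>> ceph osd pool autoscale-status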
>>>>
>>>> Quoting Lars Köppel <lars.koeppel@xxxxxxxxxx>:
>>>>
>>>> > Hi,
>>>> >
>>>> > thank you for your response.
>>>> >
>>>> > I don't think this thread covers my problem, because the OSDs for the
>>>> > metadata pool fill up at different rates, so I don't think it is a
>>>> > direct problem with the journal.
>>>> > Because we had problems with the journal earlier, I changed some
>>>> > settings (see below). I have already restarted all MDS daemons multiple
>>>> > times, but nothing changed.
>>>> >
>>>> > The health warnings regarding cache pressure normally resolve after a
>>>> > short period of time, once the heavy load on the client ends. Sometimes
>>>> > they stay a bit longer because an rsync is running and copying data on
>>>> > the cluster (rsync is not good at releasing its caps).
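>>>> >
>>>> > (One way to see which clients hold the most caps, assuming a release
>>>> > where this tell command is available:
>>>> > ceph tell mds.<rank-or-name> session ls
>>>> > and look at the num_caps field per session.)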
>>>> >
>>>> > Could it be a problem if scrubs run most of the time in the background?
>>>> > Can this block any other tasks or generate new data itself?
>>>> >
>>>> > Best regards,
>>>> > Lars
>>>> >
>>>> >
>>>> > global  basic     mds_cache_memory_limit                 17179869184
>>>> > global  advanced  mds_max_caps_per_client                16384
>>>> > global  advanced  mds_recall_global_max_decay_threshold  262144
>>>> > global  advanced  mds_recall_max_decay_rate              1.000000
>>>> > global  advanced  mds_recall_max_decay_threshold         262144
>>>> > mds     advanced  mds_cache_trim_threshold               131072
>>>> > mds     advanced  mds_heartbeat_grace                    120.000000
>>>> > mds     advanced  mds_heartbeat_reset_grace              7400
>>>> > mds     advanced  mds_tick_interval                      3.000000
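>>>> >
>>>> > (For reference, these live in the central config store; they can be
>>>> > applied and reviewed with, e.g.:
>>>> > ceph config set mds mds_cache_trim_threshold 131072
>>>> > ceph config dump | grep mds )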
>>>> >
>>>> >
>>>> > On Tue, Jun 11, 2024 at 2:05 PM Eugen Block <eblock@xxxxxx> wrote:
>>>> >
>>>> >> Hi,
>>>> >>
>>>> >> can you check if this thread [1] applies to your situation? You don't
>>>> >> have multi-active MDS enabled, but maybe it's still some journal
>>>> >> trimming, or maybe misbehaving clients? In your first post there were
>>>> >> health warnings regarding cache pressure and cache size. Are those
>>>> >> resolved?
>>>> >>
>>>> >> [1]
>>>> >> https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/7U27L27FHHPDYGA6VNNVWGLTXCGP7X23/#VOOV235D4TP5TEOJUWHF4AVXIOTHYQQE
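>>>> >>
>>>> >> (If journal trimming is the suspect, one way to peek at the journal
>>>> >> state is the perf counters via the admin socket on the MDS host;
>>>> >> counter names vary by release:
>>>> >> ceph daemon mds.<name> perf dump mds_log )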
>>>> >>
>>>> >> Quoting Lars Köppel <lars.koeppel@xxxxxxxxxx>:
>>>> >>
>>>> >> > Hello everyone,
>>>> >> >
>>>> >> > short update to this problem.
>>>> >> > The zapped OSD has been rebuilt and now holds 1.9 TiB (the expected
>>>> >> > size, ~50%). The other two OSDs are now at 2.8 and 3.2 TiB,
>>>> >> > respectively. They jumped up and down a lot, but the larger one has
>>>> >> > now also reached 'nearfull' status. How is this possible? What is
>>>> >> > going on?
>>>> >> >
>>>> >> > Does anyone have a solution for fixing this without zapping the OSD?
>>>> >> >
>>>> >> > Best regards,
>>>> >> > Lars
>>>> >> >
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



