Re: CephFS metadata pool size

If you have:

* pg_num too low (the defaults are too low)
* pg_num that is not a power of 2
* pg_num != number of OSDs in the pool
* the balancer not enabled

then any of those can result in this kind of imbalance. See the checks sketched below.
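
A quick way to verify those points (assuming the metadata pool is named cephfs_metadata here; substitute your actual pool name):

ceph osd pool get cephfs_metadata pg_num   # current pg_num of the pool
ceph osd pool autoscale-status             # autoscaler's view of pg_num (if enabled)
ceph balancer status                       # whether the balancer is active and in which mode
ceph osd df tree                           # per-OSD utilization and PG counts

If the balancer turns out to be off, enabling it in upmap mode is usually the least disruptive way to even things out:

ceph balancer mode upmap
ceph balancer on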

> On Jun 12, 2024, at 07:33, Eugen Block <eblock@xxxxxx> wrote:
> 
> I don't have any good explanation at this point. Can you share some more information like:
> 
> ceph pg ls-by-pool <cephfs_metadata>
> ceph osd df (for the relevant OSDs)
> ceph df
> 
> Thanks,
> Eugen
> 
> Zitat von Lars Köppel <lars.koeppel@xxxxxxxxxx>:
> 
>> Since my last update the size of the largest OSD increased by 0.4 TiB while
>> the smallest one only increased by 0.1 TiB. How is this possible?
>> 
>> Because the metadata pool reported only 900 MB of space left, I stopped
>> the hot-standby MDS. This gave me 8 GB back, but that filled up again
>> within the last 2 hours.
>> I think I have to zap the next OSD because the filesystem is becoming
>> read-only...
>> 
>> How is it possible that an OSD has over 1 TiB less data on it after a
>> rebuild? And how is it possible to end up with such different OSD sizes?
>> 
>> 
>> [image: ariadne.ai Logo] Lars Köppel
>> Developer
>> Email: lars.koeppel@xxxxxxxxxx
>> Phone: +49 6221 5993580 <+4962215993580>
>> ariadne.ai (Germany) GmbH
>> Häusserstraße 3, 69115 Heidelberg
>> Amtsgericht Mannheim, HRB 744040
>> Geschäftsführer: Dr. Fabian Svara
>> https://ariadne.ai
>> 
>> 
>> On Tue, Jun 11, 2024 at 3:47 PM Lars Köppel <lars.koeppel@xxxxxxxxxx> wrote:
>> 
>>> Only in warning mode. And there have been no PG splits or merges in the
>>> last two months.
>>> 
>>> 
>>> [image: ariadne.ai Logo] Lars Köppel
>>> Developer
>>> Email: lars.koeppel@xxxxxxxxxx
>>> Phone: +49 6221 5993580 <+4962215993580>
>>> ariadne.ai (Germany) GmbH
>>> Häusserstraße 3, 69115 Heidelberg
>>> Amtsgericht Mannheim, HRB 744040
>>> Geschäftsführer: Dr. Fabian Svara
>>> https://ariadne.ai
>>> 
>>> 
>>> On Tue, Jun 11, 2024 at 3:32 PM Eugen Block <eblock@xxxxxx> wrote:
>>> 
>>>> I don't think scrubs can cause this. Do you have autoscaler enabled?
>>>> 
>>>> Zitat von Lars Köppel <lars.koeppel@xxxxxxxxxx>:
>>>> 
>>>> > Hi,
>>>> >
>>>> > thank you for your response.
>>>> >
>>>> > I don't think this thread covers my problem, because the OSDs for the
>>>> > metadata pool fill up at different rates. So I would think this is not
>>>> > a direct problem with the journal.
>>>> > Because we had earlier problems with the journal, I changed some
>>>> > settings (see below). I have already restarted all MDS daemons multiple
>>>> > times, but nothing changed.
>>>> >
>>>> > The health warnings regarding cache pressure normally resolve after a
>>>> > short period of time, once the heavy load on the client ends. Sometimes
>>>> > they persist a bit longer because an rsync is running and copying data
>>>> > on the cluster (rsync is not good at releasing the caps).
>>>> >
>>>> > Could it be a problem if scrubs run most of the time in the background?
>>>> > Can this block any other tasks or generate new data itself?
>>>> >
>>>> > Best regards,
>>>> > Lars
>>>> >
>>>> >
>>>> > global  basic     mds_cache_memory_limit                  17179869184
>>>> > global  advanced  mds_max_caps_per_client                 16384
>>>> > global  advanced  mds_recall_global_max_decay_threshold   262144
>>>> > global  advanced  mds_recall_max_decay_rate               1.000000
>>>> > global  advanced  mds_recall_max_decay_threshold          262144
>>>> > mds     advanced  mds_cache_trim_threshold                131072
>>>> > mds     advanced  mds_heartbeat_grace                     120.000000
>>>> > mds     advanced  mds_heartbeat_reset_grace               7400
>>>> > mds     advanced  mds_tick_interval                       3.000000
>>>> >
>>>> >
>>>> > [image: ariadne.ai Logo] Lars Köppel
>>>> > Developer
>>>> > Email: lars.koeppel@xxxxxxxxxx
>>>> > Phone: +49 6221 5993580 <+4962215993580>
>>>> > ariadne.ai (Germany) GmbH
>>>> > Häusserstraße 3, 69115 Heidelberg
>>>> > Amtsgericht Mannheim, HRB 744040
>>>> > Geschäftsführer: Dr. Fabian Svara
>>>> > https://ariadne.ai
>>>> >
>>>> >
>>>> > On Tue, Jun 11, 2024 at 2:05 PM Eugen Block <eblock@xxxxxx> wrote:
>>>> >
>>>> >> Hi,
>>>> >>
>>>> >> can you check if this thread [1] applies to your situation? You don't
>>>> >> have multi-active MDS enabled, but maybe it's still some journal
>>>> >> trimming, or maybe misbehaving clients? In your first post there were
>>>> >> health warnings regarding cache pressure and cache size. Are those
>>>> >> resolved?
>>>> >>
>>>> >> [1]
>>>> >> https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/7U27L27FHHPDYGA6VNNVWGLTXCGP7X23/#VOOV235D4TP5TEOJUWHF4AVXIOTHYQQE
>>>> >>
>>>> >> Zitat von Lars Köppel <lars.koeppel@xxxxxxxxxx>:
>>>> >>
>>>> >> > Hello everyone,
>>>> >> >
>>>> >> > A short update on this problem:
>>>> >> > The zapped OSD is rebuilt and it now has 1.9 TiB (the expected size,
>>>> >> > ~50%). The other 2 OSDs are now at 2.8 and 3.2 TiB respectively. They
>>>> >> > jumped up and down a lot, but the higher one has now also reached
>>>> >> > 'nearfull' status. How is this possible? What is going on?
>>>> >> >
>>>> >> > Does anyone know how to fix this without zapping the OSD?
>>>> >> >
>>>> >> > Best regards,
>>>> >> > Lars
>>>> >> >
>>>> >> >
>>>> >> > [image: ariadne.ai Logo] Lars Köppel
>>>> >> > Developer
>>>> >> > Email: lars.koeppel@xxxxxxxxxx
>>>> >> > Phone: +49 6221 5993580 <+4962215993580>
>>>> >> > ariadne.ai (Germany) GmbH
>>>> >> > Häusserstraße 3, 69115 Heidelberg
>>>> >> > Amtsgericht Mannheim, HRB 744040
>>>> >> > Geschäftsführer: Dr. Fabian Svara
>>>> >> > https://ariadne.ai
>>>> >> > _______________________________________________
>>>> >> > ceph-users mailing list -- ceph-users@xxxxxxx
>>>> >> > To unsubscribe send an email to ceph-users-leave@xxxxxxx
>>>> >>
>>>> >>
>>>> >> _______________________________________________
>>>> >> ceph-users mailing list -- ceph-users@xxxxxxx
>>>> >> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>>>> >>
>>>> 
>>>> 
>>>> 
>>>> 
> 
> 
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



