If you have:

* pg_num too low (defaults are too low)
* pg_num not a power of 2
* pg_num != number of OSDs in the pool
* balancer not enabled

any of those might result in imbalance. (A rough sketch of the commands to check each of these is appended below the quoted thread.)

> On Jun 12, 2024, at 07:33, Eugen Block <eblock@xxxxxx> wrote:
>
> I don't have any good explanation at this point. Can you share some more information, like:
>
> ceph pg ls-by-pool <cephfs_metadata>
> ceph osd df (for the relevant OSDs)
> ceph df
>
> Thanks,
> Eugen
>
> Zitat von Lars Köppel <lars.koeppel@xxxxxxxxxx>:
>
>> Since my last update the size of the largest OSD increased by 0.4 TiB, while
>> the smallest one only increased by 0.1 TiB. How is this possible?
>>
>> Because the metadata pool reported only 900 MB of space left, I stopped
>> the hot-standby MDS. This gave me 8 GB back, but it filled up again within
>> the last 2 h.
>> I think I have to zap the next OSD, because the filesystem is becoming
>> read-only...
>>
>> How is it possible that an OSD has over 1 TiB less data on it after a
>> rebuild? And how is it possible to have such different OSD sizes?
>>
>> Lars Köppel
>> Developer
>> Email: lars.koeppel@xxxxxxxxxx
>> Phone: +49 6221 5993580
>> ariadne.ai (Germany) GmbH
>> Häusserstraße 3, 69115 Heidelberg
>> Amtsgericht Mannheim, HRB 744040
>> Geschäftsführer: Dr. Fabian Svara
>> https://ariadne.ai
>>
>>
>> On Tue, Jun 11, 2024 at 3:47 PM Lars Köppel <lars.koeppel@xxxxxxxxxx> wrote:
>>
>>> Only in warning mode. And there were no PG splits or merges in the last
>>> 2 months.
>>>
>>>
>>> On Tue, Jun 11, 2024 at 3:32 PM Eugen Block <eblock@xxxxxx> wrote:
>>>
>>>> I don't think scrubs can cause this. Do you have the autoscaler enabled?
>>>>
>>>> Zitat von Lars Köppel <lars.koeppel@xxxxxxxxxx>:
>>>>
>>>> > Hi,
>>>> >
>>>> > thank you for your response.
>>>> >
>>>> > I don't think this thread covers my problem, because the OSDs for the
>>>> > metadata pool fill up at different rates. So I don't think this is a
>>>> > direct problem with the journal.
>>>> > Because we had earlier problems with the journal, I changed some
>>>> > settings (see below). I have already restarted all MDS daemons multiple
>>>> > times, but nothing changed.
>>>> >
>>>> > The health warnings regarding cache pressure normally resolve after a
>>>> > short period of time, when the heavy load on the client ends. Sometimes
>>>> > they stay a bit longer because an rsync is running and copying data on
>>>> > the cluster (rsync is not good at releasing the caps).
>>>> >
>>>> > Could it be a problem if scrubs run most of the time in the background?
>>>> > Can this block any other tasks or generate new data itself?
>>>> >
>>>> > Best regards,
>>>> > Lars
>>>> >
>>>> >
>>>> > global   basic     mds_cache_memory_limit                  17179869184
>>>> > global   advanced  mds_max_caps_per_client                 16384
>>>> > global   advanced  mds_recall_global_max_decay_threshold   262144
>>>> > global   advanced  mds_recall_max_decay_rate               1.000000
>>>> > global   advanced  mds_recall_max_decay_threshold          262144
>>>> > mds      advanced  mds_cache_trim_threshold                131072
>>>> > mds      advanced  mds_heartbeat_grace                     120.000000
>>>> > mds      advanced  mds_heartbeat_reset_grace               7400
>>>> > mds      advanced  mds_tick_interval                       3.000000
>>>> >
>>>> >
>>>> > On Tue, Jun 11, 2024 at 2:05 PM Eugen Block <eblock@xxxxxx> wrote:
>>>> >
>>>> >> Hi,
>>>> >>
>>>> >> can you check if this thread [1] applies to your situation? You don't
>>>> >> have multi-active MDS enabled, but maybe it's still some journal
>>>> >> trimming, or maybe misbehaving clients? In your first post there were
>>>> >> health warnings regarding cache pressure and cache size. Are those
>>>> >> resolved?
>>>> >>
>>>> >> [1]
>>>> >> https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/7U27L27FHHPDYGA6VNNVWGLTXCGP7X23/#VOOV235D4TP5TEOJUWHF4AVXIOTHYQQE
>>>> >>
>>>> >> Zitat von Lars Köppel <lars.koeppel@xxxxxxxxxx>:
>>>> >>
>>>> >> > Hello everyone,
>>>> >> >
>>>> >> > short update on this problem.
>>>> >> > The zapped OSD has been rebuilt and now holds 1.9 TiB (the expected
>>>> >> > size, ~50%).
>>>> >> > The other two OSDs are now at 2.8 and 3.2 TiB, respectively. They
>>>> >> > jumped up and down a lot, but the higher one has now also reached
>>>> >> > 'nearfull' status. How is this possible? What is going on?
>>>> >> >
>>>> >> > Does anyone have a solution for how to fix this without zapping
>>>> >> > the OSD?
>>>> >> >
>>>> >> > Best regards,
>>>> >> > Lars
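
To check the points from the list at the top, something along these lines should work. Note that cephfs_metadata is only a stand-in for the actual metadata pool name and the pg_num value is just an example, so treat this as a rough sketch rather than exact commands for this cluster:

  # current PG count and autoscaler view of the pool
  ceph osd pool get cephfs_metadata pg_num
  ceph osd pool get cephfs_metadata pgp_num
  ceph osd pool autoscale-status

  # is the balancer enabled, and in which mode?
  ceph balancer status

  # per-OSD utilization and PG counts (PGS column)
  ceph osd df tree

  # if pg_num really is too low, raise it to a suitable power of two, e.g.
  #   ceph osd pool set cephfs_metadata pg_num 32
  # and turn the balancer on in upmap mode:
  #   ceph balancer mode upmap
  #   ceph balancer on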
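
For reference, the MDS settings quoted further up look like output of "ceph config dump". If someone wants to reproduce them on a test cluster, they would roughly translate into "ceph config set" calls like the following (values copied from the dump above, not a tuning recommendation):

  ceph config set global mds_cache_memory_limit 17179869184
  ceph config set global mds_max_caps_per_client 16384
  ceph config set global mds_recall_global_max_decay_threshold 262144
  ceph config set global mds_recall_max_decay_rate 1.000000
  ceph config set global mds_recall_max_decay_threshold 262144
  ceph config set mds mds_cache_trim_threshold 131072
  ceph config set mds mds_heartbeat_grace 120.000000
  ceph config set mds mds_heartbeat_reset_grace 7400
  ceph config set mds mds_tick_interval 3.000000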