On Sat, 3 Feb 2018, Wido den Hollander wrote:
> Hi,
>
> I just wanted to inform people about the fact that Monitor databases can
> grow quite big when you have a large cluster which is performing a very
> long rebalance.
>
> I'm posting this on ceph-users and ceph-large as it applies to both, but
> you'll see this sooner on a cluster with a lot of OSDs.
>
> Some information:
>
> - Version: Luminous 12.2.2
> - Number of OSDs: 2175
> - Data used: ~2PB
>
> We are in the middle of migrating from FileStore to BlueStore and this is
> causing a lot of PGs to backfill at the moment:
>
>   33488 active+clean
>    4802 active+undersized+degraded+remapped+backfill_wait
>    1670 active+remapped+backfill_wait
>     263 active+undersized+degraded+remapped+backfilling
>     250 active+recovery_wait+degraded
>      54 active+recovery_wait+degraded+remapped
>      27 active+remapped+backfilling
>      13 active+recovery_wait+undersized+degraded+remapped
>       2 active+recovering+degraded
>
> This has been running for a few days now and it has caused this warning:
>
> MON_DISK_BIG mons srv-zmb03-05,srv-zmb04-05,srv-zmb05-05,srv-zmb06-05,
> srv-zmb07-05 are using a lot of disk space
>     mon.srv-zmb03-05 is 31666 MB >= mon_data_size_warn (15360 MB)
>     mon.srv-zmb04-05 is 31670 MB >= mon_data_size_warn (15360 MB)
>     mon.srv-zmb05-05 is 31670 MB >= mon_data_size_warn (15360 MB)
>     mon.srv-zmb06-05 is 31897 MB >= mon_data_size_warn (15360 MB)
>     mon.srv-zmb07-05 is 31891 MB >= mon_data_size_warn (15360 MB)
>
> This is to be expected, as the MONs do not trim their store while one or
> more PGs are not active+clean.
>
> In this case we expected it, and the MONs are each running on a 1TB Intel
> DC-series SSD to make sure we do not run out of space before the backfill
> finishes.
>
> The cluster is spread out over racks and in CRUSH we replicate over racks.
> Rack by rack we are wiping/destroying the OSDs and bringing them back as
> BlueStore OSDs, letting the backfill handle everything.
>
> In between we wait for the cluster to become HEALTH_OK (all PGs
> active+clean) so that the Monitors can trim their database before we start
> with the next rack.
>
> I just want to warn and inform people about this. Under normal
> circumstances a MON database isn't that big, but if you have a very long
> period of backfills/recoveries and also have a large number of OSDs, you
> will see the DB grow quite big.
>
> This has improved significantly with Jewel and Luminous, but it is still
> something to watch out for.
>
> Make sure your MONs have enough free space to handle this!

Yes! Just a side note that Joao has an elegant fix for this that allows the
mon to trim most of the space-consuming full osdmaps. It's still a work in
progress but is likely to get backported to Luminous.

sage
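
For anyone who wants to keep an eye on this during a long rebalance, a
minimal sketch of checking the store size and the warning threshold on a
Luminous cluster. It assumes the default monitor data path and that the mon
id matches the short hostname (as it appears to in this thread); adjust both
for your deployment:

    # Size of the monitor's RocksDB store:
    du -sh /var/lib/ceph/mon/ceph-$(hostname -s)/store.db

    # Free space on the filesystem holding the mon data directory:
    df -h /var/lib/ceph/mon/ceph-$(hostname -s)

    # The MON_DISK_BIG warning fires when the store exceeds
    # mon_data_size_warn (15 GiB by default, value in bytes). If the
    # growth is expected, the threshold can be raised at runtime:
    ceph tell mon.* injectargs '--mon_data_size_warn=34359738368'

Raising the threshold only silences the warning, of course; the store keeps
growing until all PGs are active+clean again.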
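
Similarly, a hedged sketch of the per-rack wait step described above:
confirm that every PG is active+clean again, and compact the monitor store
by hand if it does not shrink on its own once the mons have trimmed:

    # Wait for HEALTH_OK / all PGs active+clean before starting the
    # next rack:
    ceph status
    ceph pg stat

    # Once the cluster is healthy the mons trim old osdmaps themselves;
    # a manual compaction reclaims the freed space in RocksDB (this adds
    # some I/O load on the monitor while it runs):
    ceph tell mon.srv-zmb03-05 compact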