Hi Wido,

Are your mons using rocksdb or still leveldb? Are your mon stores trimming back to a small size after HEALTH_OK was restored?

One v12.2.2 cluster here just started showing the "is using a lot of disk space" warning on one of our mons. In fact, all three mons are now using >16GB. I tried compacting and resyncing an empty mon, but neither trims anything -- there really is 16GB of data in the mon store for this healthy cluster. (The mons on this cluster were using ~560MB before updating to Luminous back in December.)

Any thoughts?

Cheers, Dan

On Sat, Feb 3, 2018 at 4:50 PM, Wido den Hollander <wido@xxxxxxxx> wrote:
> Hi,
>
> I just wanted to inform people about the fact that Monitor databases
> can grow quite big when you have a large cluster which is performing
> a very long rebalance.
>
> I'm posting this on ceph-users and ceph-large as it applies to both,
> but you'll see this sooner on a cluster with a lot of OSDs.
>
> Some information:
>
> - Version: Luminous 12.2.2
> - Number of OSDs: 2175
> - Data used: ~2PB
>
> We are in the middle of migrating from FileStore to BlueStore and
> this is causing a lot of PGs to backfill at the moment:
>
>     33488 active+clean
>      4802 active+undersized+degraded+remapped+backfill_wait
>      1670 active+remapped+backfill_wait
>       263 active+undersized+degraded+remapped+backfilling
>       250 active+recovery_wait+degraded
>        54 active+recovery_wait+degraded+remapped
>        27 active+remapped+backfilling
>        13 active+recovery_wait+undersized+degraded+remapped
>         2 active+recovering+degraded
>
> This has been running for a few days now and it has caused this warning:
>
> MON_DISK_BIG mons srv-zmb03-05,srv-zmb04-05,srv-zmb05-05,srv-zmb06-05,srv-zmb07-05 are using a lot of disk space
>     mon.srv-zmb03-05 is 31666 MB >= mon_data_size_warn (15360 MB)
>     mon.srv-zmb04-05 is 31670 MB >= mon_data_size_warn (15360 MB)
>     mon.srv-zmb05-05 is 31670 MB >= mon_data_size_warn (15360 MB)
>     mon.srv-zmb06-05 is 31897 MB >= mon_data_size_warn (15360 MB)
>     mon.srv-zmb07-05 is 31891 MB >= mon_data_size_warn (15360 MB)
>
> This is to be expected, as MONs do not trim their store while one or
> more PGs are not active+clean.
>
> In this case we expected this, and the MONs are each running on a 1TB
> Intel DC-series SSD to make sure we do not run out of space before
> the backfill finishes.
>
> The cluster is spread out over racks, and in CRUSH we replicate over
> racks. Rack by rack we are wiping/destroying the OSDs, bringing them
> back as BlueStore OSDs, and letting the backfill handle everything.
>
> In between we wait for the cluster to become HEALTH_OK (all PGs
> active+clean) so that the Monitors can trim their database before we
> start with the next rack.
>
> I just want to warn and inform people about this. Under normal
> circumstances a MON database isn't that big, but if you have a very
> long period of backfills/recoveries and also have a large number of
> OSDs, you'll see the DB grow quite big.
>
> This has improved significantly in Jewel and Luminous, but it is
> still something to watch out for.
>
> Make sure your MONs have enough free space to handle this!
>
> Wido
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
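For anyone following along at home, here is a rough sketch of the checks and knobs discussed in this thread. The mon ID "a" and the default data paths are placeholders -- adjust for your own deployment -- and this is an outline, not a tested recipe; in particular, run the offline compaction only with the mon daemon stopped:

```shell
# Check whether a mon is on rocksdb or leveldb
# (Luminous mons record the backend in this marker file)
cat /var/lib/ceph/mon/ceph-a/kv_backend

# See how big the mon store actually is on disk
du -sh /var/lib/ceph/mon/ceph-a/store.db

# Ask a running mon to compact its store online
ceph tell mon.a compact

# Or compact offline, with the mon stopped (rocksdb backend shown)
ceph-kvstore-tool rocksdb /var/lib/ceph/mon/ceph-a/store.db compact

# If the growth is expected (e.g. during a long rebalance), raise the
# warning threshold; mon_data_size_warn is in bytes (default 15 GiB)
ceph tell mon.* injectargs '--mon_data_size_warn=32212254720'
```

Compaction can also be triggered at every mon start by setting `mon_compact_on_start = true` in the [mon] section of ceph.conf. But as Wido notes above, none of this reclaims the space held by untrimmed maps: while any PG is not active+clean, the mons simply cannot trim, and the store stays big until the cluster returns to HEALTH_OK.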