OK, we did get "pgremapper" installed and are starting to use it. We got some warnings when trying to rebalance an entire host bucket, saying that some of our PGs have mismatched lengths, followed by "nothing to do". We are switching to individual "drain" operations now to see if that helps.

________________________________
From: Stefan Kooman <stefan@xxxxxx>
Sent: Wednesday, September 7, 2022 11:34 AM
To: Wyll Ingersoll <wyllys.ingersoll@xxxxxxxxxxxxxx>; Gregory Farnum <gfarnum@xxxxxxxxxx>
Cc: ceph-users@xxxxxxx <ceph-users@xxxxxxx>
Subject: Re: Re: data usage growing despite data being written

On 9/7/22 16:38, Wyll Ingersoll wrote:
> I'm sure we probably have, but I'm not sure what else to do. We are desperate to get data off of these 99%+ OSDs and the cluster by itself isn't doing it.
>
> The crushmap appears OK. We have replicated pools and a large EC pool; all are using host-based failure domains. The new OSDs on the newly added hosts are slowly filling, just not as much as we expected.
>
> We have far too many OSDs at 99%+ and they continue to fill up. How do we remove the excess OSDMap data? Is it even possible?
>
> If we shouldn't be migrating PGs and we cannot remove data, what are our options to get it to balance again and stop filling up with OSDMaps and other internal Ceph data?

Have you tried the tools I mentioned before in the other thread? To drain PGs from one OSD to another, you can use the pgremapper tool, for example [1].

Good to know: the *fullest* OSD determines how full the cluster is, and how much space you have available. If OSDs get fuller and fuller, then it might appear that data is being written, but in reality it's not. So you can "fill" up a cluster without writing data to it.

Gr. Stefan

[1]: https://github.com/digitalocean/pgremapper/#drain

P.s. I guess you meant to say "despite *no* data being written", right?
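
To make the point above about the fullest OSD concrete, here is a minimal Python sketch. It is a simplified model, not Ceph's actual MAX AVAIL calculation: it assumes new writes land on OSDs in proportion to their CRUSH weight and that writes stop once any OSD reaches the full ratio (0.95 is the Ceph default). The `Osd` class, the function names, and the sample numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Osd:
    name: str
    size_gib: float
    used_gib: float
    weight: float  # CRUSH weight; new writes are assumed proportional to it


def writable_raw_gib(osds, full_ratio=0.95):
    """Raw GiB that can still be written before the first OSD hits full_ratio.

    Simplified model (an assumption, not Ceph's real algorithm): every OSD
    receives a share of new data equal to weight / total_weight.
    """
    active = [o for o in osds if o.weight > 0]  # zero-weight OSDs get no data
    total_weight = sum(o.weight for o in active)
    limits = []
    for o in active:
        headroom = max(0.0, full_ratio * o.size_gib - o.used_gib)
        # Cluster-wide writes at which this particular OSD becomes full.
        limits.append(headroom * total_weight / o.weight)
    return min(limits)


def total_free_gib(osds):
    """Naive free space: the sum of unused capacity over all OSDs."""
    return sum(o.size_gib - o.used_gib for o in osds)


# Hypothetical cluster: one nearly full OSD and two with plenty of room.
osds = [
    Osd("osd.0", size_gib=1000, used_gib=940, weight=1.0),
    Osd("osd.1", size_gib=1000, used_gib=500, weight=1.0),
    Osd("osd.2", size_gib=1000, used_gib=400, weight=1.0),
]

print("naive free space (GiB):", total_free_gib(osds))
print("writable before first OSD is full (GiB):", writable_raw_gib(osds))
```

In this model the nearly full osd.0 caps the whole cluster at a small fraction of the naive free space, which is why moving PGs off the fullest OSDs frees up usable capacity even though no data is deleted.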
_______________________________________________ ceph-users mailing list -- ceph-users@xxxxxxx To unsubscribe send an email to ceph-users-leave@xxxxxxx