I really don't think those options will impact anything; what's likely
going on is that because things are dirty, they need to keep a long
history around to do their peering. But I haven't had to deal with
that in a while, so maybe I'm missing something.

On Wed, Sep 7, 2022 at 8:36 AM Wyll Ingersoll
<wyllys.ingersoll@xxxxxxxxxxxxxx> wrote:
>
>
> Can we tweak the osdmap pruning parameters to be more aggressive about trimming those osdmaps? Would that reduce data on the OSDs or only in the MON DB?
> Looking at mon_min_osdmap_epochs (500) and mon_osdmap_full_prune_min (10000).
>
> Is there a way to find out how many osdmaps are currently being kept?
> ________________________________
> From: Gregory Farnum <gfarnum@xxxxxxxxxx>
> Sent: Wednesday, September 7, 2022 10:58 AM
> To: Wyll Ingersoll <wyllys.ingersoll@xxxxxxxxxxxxxx>
> Cc: ceph-users@xxxxxxx <ceph-users@xxxxxxx>
> Subject: Re: data usage growing despite no data being written
>
> On Wed, Sep 7, 2022 at 7:38 AM Wyll Ingersoll
> <wyllys.ingersoll@xxxxxxxxxxxxxx> wrote:
> >
> > I'm sure we probably have, but I'm not sure what else to do. We are desperate to get data off of these 99%+ OSDs and the cluster by itself isn't doing it.
> >
> > The crushmap appears OK. We have replicated pools and a large EC pool, all using host-based failure domains. The new OSDs on the newly added hosts are slowly filling, just not as much as we expected.
> >
> > We have far too many OSDs at 99%+ and they continue to fill up. How do we remove the excess OSDMap data? Is that even possible?
> >
> > If we shouldn't be migrating PGs and we cannot remove data, what are our options to get the cluster to balance again and stop filling up with OSDMaps and other internal Ceph data?
>
> Well, you can turn things off, figure out the proper mapping, and use
> the ceph-objectstore-tool to migrate PGs to their proper destinations
> (letting the cluster clean up the excess copies if you can afford to —
> deleting things is always scary).
> But I haven't had to help recover a death-looping cluster in around a
> decade, so that's about all the options I can offer up.
> -Greg
>
> >
> > thanks!
> >
> >
> >
> > ________________________________
> > From: Gregory Farnum <gfarnum@xxxxxxxxxx>
> > Sent: Wednesday, September 7, 2022 10:01 AM
> > To: Wyll Ingersoll <wyllys.ingersoll@xxxxxxxxxxxxxx>
> > Cc: ceph-users@xxxxxxx <ceph-users@xxxxxxx>
> > Subject: Re: data usage growing despite no data being written
> >
> > On Tue, Sep 6, 2022 at 2:08 PM Wyll Ingersoll
> > <wyllys.ingersoll@xxxxxxxxxxxxxx> wrote:
> > >
> > >
> > > Our cluster has not had any data written to it externally in several weeks, yet the overall data usage has been growing.
> > > Is this due to heavy recovery activity? If so, what can be done (if anything) to reduce the data generated during recovery?
> > >
> > > We've been trying to move PGs away from high-usage OSDs (many over 99%), but it's like playing whack-a-mole: the cluster keeps sending new data to already overly full OSDs, making further recovery nearly impossible.
> >
> > I may be missing something, but I think you've really slowed things
> > down by continually migrating PGs around while the cluster is already
> > unhealthy. It forces a lot of new OSDMap generation and general churn
> > (which itself slows down data movement).
> >
> > I'd also examine your crush map carefully, since it sounded like you'd
> > added some new hosts and they weren't getting the data you expected
> > them to. Perhaps there's some kind of imbalance (e.g., they aren't in
> > racks, and selecting those is part of your crush rule?).
> > -Greg
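For the CRUSH check suggested above, a minimal sketch of the usual
inspection commands (standard ceph CLI; the PG id 2.7f is illustrative):

    # Confirm the new hosts sit under the buckets the rule selects,
    # e.g. inside racks if the rule's chooseleaf step uses racks:
    ceph osd tree
    # Per-host and per-OSD utilization, to see whether the new OSDs
    # are actually taking data:
    ceph osd df tree
    # Dump the rules and check the failure-domain bucket type each
    # one selects:
    ceph osd crush rule dump
    # Spot-check where one PG currently maps (PG id is illustrative):
    ceph pg map 2.7f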
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
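Two further sketches for the open questions upthread. First, finding
out how many osdmaps are currently being kept: each OSD exposes the
epoch range it still stores through its admin socket, and the mon
report carries the committed range. A sketch only: the OSD id and jq
filters are illustrative, the daemon command must run on the host
holding that OSD, and the report field names assume a recent release:

    # Oldest and newest osdmap epochs held by one OSD:
    ceph daemon osd.0 status | jq '{oldest_map, newest_map}'
    # Committed osdmap range according to the monitors:
    ceph report | jq '.osdmap_first_committed, .osdmap_last_committed'
    # Read back the pruning knobs discussed above:
    ceph config get mon mon_min_osdmap_epochs
    ceph config get mon mon_osdmap_full_prune_min

Second, the ceph-objectstore-tool route Greg describes amounts to
exporting a PG from a stopped OSD and importing it on the (also
stopped) destination. Again a sketch, not a recipe: the OSD ids, PG
id, and scratch path are all illustrative:

    # Source and destination OSDs must be down while the tool runs.
    systemctl stop ceph-osd@12
    # List the PGs present on the source OSD:
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 --op list-pgs
    # Export one PG to a scratch file:
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 \
        --pgid 2.7f --op export --file /mnt/scratch/2.7f.export
    # Import it on the destination OSD, then restart both:
    systemctl stop ceph-osd@34
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-34 \
        --op import --file /mnt/scratch/2.7f.export
    systemctl start ceph-osd@12 ceph-osd@34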