On Tue, Sep 6, 2022 at 2:08 PM Wyll Ingersoll <wyllys.ingersoll@xxxxxxxxxxxxxx> wrote:
>
> Our cluster has not had any data written to it externally in several weeks, yet the overall data usage has been growing.
> Is this due to heavy recovery activity? If so, what can be done (if anything) to reduce the data generated during recovery?
>
> We've been trying to move PGs away from high-usage OSDs (many over 99%), but it's like playing whack-a-mole; the cluster keeps sending new data to already overly full OSDs, making further recovery nearly impossible.

I may be missing something, but I think you've really slowed things down by continually migrating PGs around while the cluster is already unhealthy. It forces a lot of new OSDMap generation and general churn (which itself slows down data movement).

I'd also examine your crush map carefully, since it sounded like you'd added some new hosts and they weren't getting the data you expected them to. Perhaps there's some kind of imbalance (e.g., they aren't in racks, and selecting racks is part of your crush rule?).

-Greg

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
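As a sketch of how one might check the crush-map point above (standard Ceph CLI inspection commands; the specific bucket names and rack layout are assumptions about your cluster, not facts from the thread):

```shell
# Show the CRUSH hierarchy: confirm the newly added hosts appear
# under the bucket (e.g., a rack) you expect them to be in.
ceph osd tree

# Dump the CRUSH rules: look at the "take" and "chooseleaf" steps
# to see whether the rule selects a bucket type (such as "rack")
# that the new hosts are not actually members of.
ceph osd crush rule dump

# Per-OSD utilization alongside the hierarchy, to spot which
# nearly-full OSDs are still receiving data.
ceph osd df tree
```

If a rule does chooseleaf across racks and the new hosts sit directly under the root instead of inside a rack bucket, CRUSH will never place data on them; `ceph osd crush move <host> rack=<rack>` is the usual fix, but verify the hierarchy first before moving anything on an unhealthy cluster.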