On Tue, Sep 6, 2022 at 2:08 PM Wyll Ingersoll <wyllys.ingersoll@xxxxxxxxxxxxxx> wrote:
>
> Our cluster has not had any data written to it externally in several weeks, yet the overall data usage has been growing.
> Is this due to heavy recovery activity? If so, what can be done (if anything) to reduce the data generated during recovery?
>
> We've been trying to move PGs away from high-usage OSDs (many over 99%), but it's like playing whack-a-mole; the cluster keeps sending new data to already overly full OSDs, making further recovery nearly impossible.

I may be missing something, but I think you've really slowed things down by continually migrating PGs around while the cluster is already unhealthy. It forces a lot of new OSDMap generation and general churn (which itself slows down data movement).

I'd also examine your crush map carefully, since it sounded like you'd added some new hosts and they weren't getting the data you expected them to. Perhaps there's some kind of imbalance (e.g., they aren't in racks, and selecting racks is part of your crush rule?).

-Greg

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
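As a sketch of how one might check the crush-map point above (standard Ceph CLI inspection commands; the specific bucket names and rack layout are assumptions about your cluster, not facts from the thread):

```shell
# Show the CRUSH hierarchy: confirm the newly added hosts appear
# under the bucket (e.g., a rack) you expect them to be in.
ceph osd tree

# Dump the CRUSH rules: look at the "take" and "chooseleaf" steps
# to see whether the rule selects a bucket type (such as "rack")
# that the new hosts are not actually members of.
ceph osd crush rule dump

# Per-OSD utilization alongside the hierarchy, to spot which
# nearly-full OSDs are still receiving data.
ceph osd df tree
```

If a rule does chooseleaf across racks and the new hosts sit directly under the root instead of inside a rack bucket, CRUSH will never place data on them; `ceph osd crush move <host> rack=<rack>` is the usual fix, but verify the hierarchy first before moving anything on an unhealthy cluster.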