Re: data usage growing despite data being written


 



I'm sure we probably have, but I'm not sure what else to do. We are desperate to get data off these 99%+ full OSDs, and the cluster isn't doing it on its own.

The CRUSH map appears OK. We have replicated pools and a large EC pool, all using host-based failure domains. The new OSDs on the newly added hosts are slowly filling, just not as much as we expected.

We have far too many OSDs at 99%+ and they continue to fill up. How do we remove the excess OSDMap data? Is it even possible?
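For anyone tracking the same symptom, here is a minimal sketch of listing the fullest OSDs from the JSON form of `ceph osd df`. The field names (`nodes`, `name`, `utilization`, `pgs`) are what I would expect in that output and should be checked against your release; the helper names and the 95% threshold are made up for illustration.

```python
import json
import subprocess


def load_osd_df():
    """Run `ceph osd df` and return its JSON output as a dict.

    Assumes the `ceph` CLI is on PATH and the caller has a usable keyring.
    """
    raw = subprocess.run(
        ["ceph", "osd", "df", "--format", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(raw)


def overfull_osds(osd_df, threshold=95.0):
    """Return (name, utilization %, PG count) for OSDs at or above
    `threshold`, fullest first.

    Assumes each entry in osd_df["nodes"] carries "name", "utilization"
    and "pgs" keys; entries without a utilization value are skipped.
    """
    hot = [
        (n["name"], n["utilization"], n.get("pgs", 0))
        for n in osd_df.get("nodes", [])
        if n.get("utilization") is not None and n["utilization"] >= threshold
    ]
    return sorted(hot, key=lambda item: item[1], reverse=True)
```

On a cluster node this would be called as `overfull_osds(load_osd_df())`. It only reports utilization; it does not move or delete anything.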

If we shouldn't be migrating PGs and we cannot remove data, what are our options to get it to balance again and stop filling up with OSDMaps and other internal Ceph data?

thanks!



________________________________
From: Gregory Farnum <gfarnum@xxxxxxxxxx>
Sent: Wednesday, September 7, 2022 10:01 AM
To: Wyll Ingersoll <wyllys.ingersoll@xxxxxxxxxxxxxx>
Cc: ceph-users@xxxxxxx <ceph-users@xxxxxxx>
Subject: Re:  data usage growing despite data being written

On Tue, Sep 6, 2022 at 2:08 PM Wyll Ingersoll
<wyllys.ingersoll@xxxxxxxxxxxxxx> wrote:
>
>
> Our cluster has not had any data written to it externally in several weeks, yet the overall data usage has been growing.
> Is this due to heavy recovery activity?  If so, what can be done (if anything) to reduce the data generated during recovery?
>
> We've been trying to move PGs away from high-usage OSDs (many over 99%), but it's like playing whack-a-mole: the cluster keeps sending new data to already overly full OSDs, making further recovery nearly impossible.

I may be missing something, but I think you've really slowed things
down by continually migrating PGs around while the cluster is already
unhealthy. It forces a lot of new OSDMap generation and general churn
(which itself slows down data movement.)

I'd also examine your crush map carefully, since it sounded like you'd
added some new hosts and they weren't getting the data you expected
them to. Perhaps there's some kind of imbalance (e.g., they aren't in
racks, and selecting those is part of your crush rule?).
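One way to check for that kind of mismatch is sketched below: given the JSON from `ceph osd tree --format json`, list the hosts that have no ancestor bucket of a given type. The field names (`nodes`, `id`, `name`, `type`, `children`) are assumptions about that output, the function name is hypothetical, and "rack" is only the example bucket type from the paragraph above; substitute whatever type your CRUSH rule actually selects.

```python
def hosts_missing_ancestor(tree, ancestor_type="rack"):
    """Return sorted names of host buckets with no ancestor of
    `ancestor_type` in the CRUSH hierarchy.

    Assumes tree["nodes"] is a flat list of buckets and OSDs, where each
    bucket lists the ids of its direct children under "children".
    """
    nodes = {n["id"]: n for n in tree.get("nodes", [])}

    # Build a child -> parent map from each bucket's children list.
    parent = {}
    for node in nodes.values():
        for child_id in node.get("children", []):
            parent[child_id] = node["id"]

    missing = []
    for node in nodes.values():
        if node.get("type") != "host":
            continue
        # Walk upward until we find the wanted bucket type or run out of parents.
        current = parent.get(node["id"])
        while current is not None and nodes[current].get("type") != ancestor_type:
            current = parent.get(current)
        if current is None:
            missing.append(node["name"])
    return sorted(missing)
```

An empty result means every host sits under a bucket of the requested type; any names returned are hosts a rule selecting that type would never reach.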
-Greg

>
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>



