Re: weird outage of ceph

Hi Simon,

On Fri, Aug 16, 2024, 11:14 Simon Oosthoek <simon.oosthoek@xxxxxxxxx> wrote:

> Hi
>
> We had a really weird outage of ceph today and I wonder how it came about.
> The problem seems to have started around midnight. I still need to check
> whether it was already at the extent I found it in this morning, or whether
> it grew more gradually, but when I found it, several OSD servers had most
> or all OSD processes down, to the point where our EC 8+3 buckets didn't
> work anymore.
>
> Restarting the servers or the services turned out to be the way to quickly
> recover from this.
>
> I see some of our OSDs are coming close to (but not quite) 80-85% full.
> I've seen an overfull error lead to cascading and catastrophic failures
> many times, and I suspect this may have been one of them.
>
> Which brings me to another question: why is our balancer doing so badly at
> balancing the OSDs? It's configured in upmap mode and should work well
> with the number of PGs per OSD we have, but it is letting some OSDs reach
> 80% full while others are not yet at 50% (we're just over 61% full in
> total).
>
> The current health status is:
> HEALTH_WARN Low space hindering backfill (add storage if this doesn't
> resolve itself): 1 pg backfill_toofull
> [WRN] PG_BACKFILL_FULL: Low space hindering backfill (add storage if this
> doesn't resolve itself): 1 pg backfill_toofull
>    pg 30.3fc is active+remapped+backfill_wait+backfill_toofull, acting
> [66,105,124,113,89,132,206,242,179]
>
> I've started reweighting again, because the balancer is not doing its job
> in our cluster for some reason...
>
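On the balancer question: it's worth checking what the module itself
reports before reweighting; any leftover override reweights (more on that
below) limit what upmap can do. For example:

    # is the balancer on, and in which mode?
    ceph balancer status

    # score of the current PG distribution (lower is better)
    ceph balancer eval
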
Do you mean ceph osd reweight or ceph osd crush reweight? These two do
different things, and if you use either, only use the latter. The osd
reweight only moves data temporarily within a bucket (node); it is mainly
meant for taking OSDs in/out, and it skews the algorithm for placing PGs,
which in turn also affects the balancer.
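
To illustrate the difference (osd.12 and the weight values below are just
placeholders, adjust to your cluster):

    # show CRUSH weight, override REWEIGHT and utilization per OSD
    ceph osd df tree

    # override weight (0..1), temporary; reset to 1.0 so the balancer
    # gets the full picture again
    ceph osd reweight 12 1.0

    # CRUSH weight, permanent, normally the disk capacity in TiB
    ceph osd crush reweight osd.12 7.27680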

You can use the pg-upmap-items command to manually upmap PGs. The tool Alex
recommended does this under the hood.
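
For the backfill_toofull PG from your status output, something like this
would move one shard off a full OSD (the target osd.150 is just an
example, pick one with free space in the same failure domain):

    # remap the shard of pg 30.3fc that sits on osd.66 to osd.150
    ceph osd pg-upmap-items 30.3fc 66 150

    # or let osdmaptool calculate a set of upmaps from the current osdmap
    ceph osd getmap -o om
    osdmaptool om --upmap upmaps.sh --upmap-pool <pool>
    # review upmaps.sh before running it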

Cheers,
Alwin
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


