Hi Simon,

On Fri, Aug 16, 2024, 11:14 Simon Oosthoek <simon.oosthoek@xxxxxxxxx> wrote:
> Hi
>
> We had a really weird outage of ceph today and I wonder how it came about.
> The problem seems to have started around midnight. I still need to check
> whether it was already at the extent I found it in this morning or whether
> it grew more gradually, but when I found it, several osd servers had most
> or all osd processes down, to the point where our EC 8+3 buckets didn't
> work anymore.
>
> Restarting the servers or the services turned out to be the way to
> quickly recover from this.
>
> I see some of our OSDs are coming close to (but not quite) 80-85% full.
> There have been many times when I've seen an overfull error lead to
> cascading and catastrophic failures, and I suspect this may have been one
> of them.
>
> Which brings me to another question: why is our balancer doing so badly
> at balancing the OSDs? It's configured with upmap mode and it should work
> great with the number of PGs per OSD we have, but it is letting some OSDs
> reach 80% full while others are not yet 50% full (we're just over 61%
> full in total).
>
> The current health status is:
> HEALTH_WARN Low space hindering backfill (add storage if this doesn't
> resolve itself): 1 pg backfill_toofull
> [WRN] PG_BACKFILL_FULL: Low space hindering backfill (add storage if this
> doesn't resolve itself): 1 pg backfill_toofull
>     pg 30.3fc is active+remapped+backfill_wait+backfill_toofull, acting
> [66,105,124,113,89,132,206,242,179]
>
> I've started reweighting again, because the balancer is not doing its job
> in our cluster for some reason...

Do you mean ceph osd reweight or ceph osd crush reweight? Those two do
different things, and if anything, use only the latter. The osd reweight
only moves data temporarily within a bucket (node); it is mainly meant for
marking OSDs in/out, and it skews the algorithm for placing PGs, which
should also affect the balancer.

You can use the pg-upmap-items command to manually upmap PGs. The tool
Alex recommended does this under the hood.

Cheers,
Alwin
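P.S. In case it helps, a rough sketch of what those commands look like on
the CLI. The syntax is the standard Ceph tooling, but the target OSD (150)
and the weight values below are made-up examples for illustration; only PG
30.3fc and osd.66 come from your health output:

    # temporary 0.0-1.0 override; mainly for draining / in-out, and it
    # skews PG placement (and with it the upmap balancer)
    ceph osd reweight 66 0.9

    # permanent CRUSH weight, normally the OSD's capacity in TiB
    # (7.3 here is just a placeholder value)
    ceph osd crush reweight osd.66 7.3

    # manual upmap: map PG 30.3fc's shard away from osd.66 onto osd.150
    # (upmap needs: ceph osd set-require-min-compat-client luminous)
    ceph osd pg-upmap-items 30.3fc 66 150

    # check what the balancer thinks and how full each OSD is
    ceph balancer status
    ceph osd df tree

The crush reweight changes the weight CRUSH uses everywhere, so data moves
cluster-wide; the plain reweight is only an override on top of that, which
is why it tends to fight the upmap balancer rather than help it.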